1
Mi N, Li Z, Zhang X, Gao Y, Wang Y, Liu S, Wang S. Identification of potential immunotherapeutic targets and prognostic biomarkers in Graves' disease using weighted gene co-expression network analysis. Heliyon 2024; 10:e27175. [PMID: 38468967 PMCID: PMC10926144 DOI: 10.1016/j.heliyon.2024.e27175] [Received: 07/21/2023] [Revised: 12/11/2023] [Accepted: 02/26/2024] [Indexed: 03/13/2024] Open
Abstract
Graves' disease (GD) is an autoimmune disorder characterized by hyperthyroidism resulting from autoantibody-induced stimulation of the thyroid gland. Despite recent advancements in understanding GD's pathogenesis, the molecular processes driving disease progression and treatment response remain poorly understood. In this study, we aimed to identify crucial immunogenic factors associated with GD prognosis and immunotherapeutic response. To achieve this, we implemented a comprehensive screening strategy that combined computational immunogenicity-potential scoring with multi-parametric cluster analysis to assess the immunomodulatory genes in GD-related subtypes involving stromal and immune cells. Utilizing weighted gene co-expression network analysis (WGCNA), we identified co-expressed gene modules linked to cellular senescence and immune infiltration in CD4+ and CD8+ GD samples. Additionally, gene set enrichment analysis enabled the identification of hallmark pathways distinguishing high- and low-immune subtypes. Our WGCNA analysis revealed 21 gene co-expression modules comprising 1,541 genes associated with immune infiltration components in various stages of GD, including T cells, M1 and M2 macrophages, NK cells, and Tregs. These genes primarily participated in T cell proliferation through purinergic signaling pathways, particularly neuroactive ligand-receptor interactions, and DNA binding transcription factor activity. Three genes, namely PRSS1, HCRTR1, and P2RY4, exhibited robustness in GD patients across multiple stages and were involved in immune cell infiltration during the late stage of GD (p < 0.05). Importantly, HCRTR1 and P2RY4 emerged as potential prognostic signatures for predicting overall survival in high-immunocore GD patients (p < 0.05). Overall, our study provides novel insights into the molecular mechanisms driving GD progression and highlights potential key immunogens for further investigation. 
These findings underscore the significance of immune infiltration-related cellular senescence in GD therapy and present promising targets for the development of new immunotherapeutic strategies.
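The co-expression step named in the abstract above can be illustrated with the soft-threshold adjacency commonly used in WGCNA, where the adjacency between two genes is the absolute Pearson correlation raised to a power. This is a minimal sketch, not the authors' pipeline: the gene names, expression values, and the power `beta=6` are assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (beta is an assumed value)."""
    adj = {}
    for g in expr:
        for h in expr:
            adj[(g, h)] = 1.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
    return adj

# Hypothetical expression profiles across four samples (not data from the study).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
adj = soft_threshold_adjacency(expr)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why modules are then detected on this adjacency rather than on the raw correlation matrix.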
Affiliation(s)
- Nianrong Mi
  - Department of General Practice, Central Hospital Affiliated to Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Zhe Li
  - Department of Health Management Center, Central Hospital Affiliated to Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Xueling Zhang
  - Department of Integrated Chinese and Western Medicine, Central Hospital Affiliated to Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Yingjing Gao
  - Department of Endocrinology, Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Yanan Wang
  - Department of Endocrinology, Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Siyan Liu
  - Department of Endocrinology, Shandong First Medical University, Jinan, Shandong Province, 250013, China
- Shaolian Wang
  - Department of Integrated Chinese and Western Medicine, Central Hospital Affiliated to Shandong First Medical University, Jinan, Shandong Province, 250013, China
2
Mokhtari M, Khoshbakht S, Ziyaei K, Akbari ME, Moravveji SS. New classifications for quantum bioinformatics: Q-bioinformatics, QCt-bioinformatics, QCg-bioinformatics, and QCr-bioinformatics. Brief Bioinform 2024; 25:bbae074. [PMID: 38446742 PMCID: PMC10939336 DOI: 10.1093/bib/bbae074] [Received: 07/21/2023] [Revised: 11/14/2023] [Accepted: 02/07/2024] [Indexed: 03/08/2024] Open
Abstract
Bioinformatics has revolutionized biology and medicine by using computational methods to analyze and interpret biological data. Quantum mechanics has recently emerged as a promising tool for the analysis of biological systems, leading to the development of quantum bioinformatics. This new field employs the principles of quantum mechanics, quantum algorithms, and quantum computing to solve complex problems in molecular biology, drug design, and protein folding. However, the intersection of bioinformatics, biology, and quantum mechanics presents unique challenges. One significant challenge is the possibility of confusion among scientists between quantum bioinformatics and quantum biology, which have similar goals and concepts. Additionally, the diverse calculations in each field make it difficult to establish boundaries and identify purely quantum effects from other factors that may affect biological processes. This review provides an overview of the concepts of quantum biology and quantum mechanics and their intersection in quantum bioinformatics. We examine the challenges and unique features of this field and propose a classification of quantum bioinformatics to promote interdisciplinary collaboration and accelerate progress. By unlocking the full potential of quantum bioinformatics, this review aims to contribute to our understanding of quantum mechanics in biological systems.
Affiliation(s)
- Majid Mokhtari
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran
- Samane Khoshbakht
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran
  - Duke Molecular Physiology Institute, Duke University School of Medicine-Cardiology, Durham, NC, 27701, USA
- Kobra Ziyaei
  - Department of Fisheries, Faculty of Natural Resources, University of Tehran, Karaj, Iran
- Sayyed Sajjad Moravveji
  - Department of Bioinformatics, Kish International Campus, University of Tehran, Kish Island, Iran
3
Chen Y, Zhang Z, Hu X, Zhang Y. Epigenetic characterization of sarcopenia-associated genes based on machine learning and network screening. Eur J Med Res 2024; 29:54. [PMID: 38229116 PMCID: PMC10790491 DOI: 10.1186/s40001-023-01603-8] [Received: 08/17/2023] [Accepted: 12/17/2023] [Indexed: 01/18/2024] Open
Abstract
This study aimed to screen characteristic genes related to sarcopenia using bioinformatics and machine learning, and to verify the accuracy of these genes in the diagnosis of sarcopenia. Sarcopenia-related data sets were downloaded from the GEO public database and merged; differential genes were identified with the R limma package, a protein interaction network was built with the STRING database, and GO and GSEA analyses were performed to understand the functions and molecular signaling pathways that may be affected by the differential genes. The characteristic genes were then screened with the LASSO and SVM-RFE machine learning algorithms, ROC curves were plotted for the characteristic genes, and the AUC values were obtained. Ten differential genes were obtained from the data set, including 7 upregulated genes and 3 downregulated genes. Eight characteristic genes were screened by the machine learning algorithms, and the AUC values of the characteristic genes exceeded 0.7. In patients with sarcopenia, the expression of the TPPP3, C1QA, LGR5, MYH8, and CDKN1A genes is upregulated, and the expression of the SLC38A1, SERPINA5, and HOXB2 genes is downregulated. The above genes have high accuracy in the diagnosis of sarcopenia. The research results provide new ideas for the diagnosis and mechanism research of sarcopenia.
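The AUC criterion mentioned in the abstract above can be illustrated with a rank-based computation: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control. This is a minimal sketch under stated assumptions; `roc_auc` is an illustrative helper, and the expression values and labels are hypothetical, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical expression of one candidate gene; 1 = sarcopenia, 0 = control.
expression = [2.1, 3.4, 1.0, 4.2, 1.8, 0.9]
is_case = [0, 1, 0, 1, 1, 0]
auc = roc_auc(expression, is_case)
```

For a gene that is downregulated in cases, the same function applies after negating the scores, so that higher values still indicate the positive class.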
Affiliation(s)
- Yong Chen
  - Key Laboratory of Renal Diseases Occurrence and Intervention of Hubei Province, Medical College, Hubei Polytechnic University, Huangshi, 435003, China
- Zhenyu Zhang
  - Shenzhen Qihuang Guoyi Hanfang Innovation Research Center, Shenzhen, 518046, China
- Xiaolan Hu
  - Huangshi Central Hospital, Affiliated Hospital of Hubei Polytechnic University, Huangshi, 435099, China
- Yang Zhang
  - Pingshan District People's Hospital of Shenzhen, Pingshan General Hospital of Southern Medical University, No. 19 Renmin Street, Pingshan Street, Pingshan District, Shenzhen, 518118, Guangdong, China
4
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon JA, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman WW, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
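The idea of trimming low information content flanking positions, as described in the abstract above, can be sketched as follows. This is an illustration only and not the JASPAR trimming algorithm itself: the uniform background, the `min_ic=0.5` threshold, the helper names, and the example matrix are all assumptions.

```python
import math

def column_ic(counts):
    """Information content (bits) of one PFM column [A, C, G, T], uniform background assumed."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty column")
    ic = 2.0  # maximum for a four-letter alphabet
    for c in counts:
        if c > 0:
            p = c / total
            ic += p * math.log2(p)
    return ic

def trim_flanks(pfm, min_ic=0.5):
    """Drop leading and trailing columns whose IC is below min_ic (assumed threshold).

    Internal low-information columns are kept, since only the flanks are trimmed.
    """
    ics = [column_ic(col) for col in pfm]
    start, end = 0, len(pfm)
    while start < end and ics[start] < min_ic:
        start += 1
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pfm[start:end]

# Hypothetical profile: uninformative flanks around a core with one weak internal position.
pfm = [
    [25, 25, 25, 25],
    [100, 0, 0, 0],
    [25, 25, 25, 25],
    [0, 0, 100, 0],
    [30, 20, 25, 25],
]
trimmed = trim_flanks(pfm)
```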
Affiliation(s)
- Ieva Rauluseviciute
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Rafael Riudavets-Puig
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Romain Blanc-Mathieu
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Katalin Ferenc
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Vipin Kumar
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Roza Berhanu Lemma
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Jérémy Lucas
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jeanne Chèneby
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Damir Baranasic
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
- Aziz Khan
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Sveinung Gundersen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Morten Johansen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Boris Lenhard
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Albin Sandelin
  - Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- François Parcy
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Anthony Mathelier
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
5
Luo X, Wang R, Zhang X, Wen X, Deng S, Xie W. Identification CCL2,CXCR2,S100A9 of the immune-related gene markers and immune infiltration characteristics of inflammatory bowel disease and heart failure via bioinformatics analysis and machine learning. Front Cardiovasc Med 2023; 10:1268675. [PMID: 38034382 PMCID: PMC10687362 DOI: 10.3389/fcvm.2023.1268675] [Received: 08/04/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023] Open
Abstract
Background Recently, heart failure (HF) and inflammatory bowel disease (IBD) have been considered to be related diseases with increasing incidence rates; both diseases are related to immunity. This study aims to analyze and identify immune-related gene (IRG) markers of HF and IBD through bioinformatics and machine learning (ML) methods and to explore their immune infiltration characteristics. Methods This study used gene expression data (GSE120895, GSE21610, GSE4183) from the Gene Expression Omnibus (GEO) database to screen differentially expressed genes (DEGs) and compare them with IRGs from the ImmPort database to obtain differentially expressed immune-related genes (DIRGs). Functional enrichment analysis of DIRGs was performed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Subsequently, three machine learning models and protein-protein interactions (PPIs) were established to identify diagnostic biomarkers. Receiver operating characteristic (ROC) curves were applied to evaluate the diagnostic value of the candidate biomarkers in the validation set (GSE1145, GSE36807), and their correlations with immune cells were obtained through the Spearman algorithm. Finally, the CIBERSORT algorithm was used to evaluate the immune cell infiltration of the two diseases. Results Thirty-four DIRGs were screened, and GO and KEGG analysis results showed that these genes are mainly related to inflammatory and immune responses. CCL2, CXCR2 and S100A9 were identified as biomarkers. The immune correlation results indicated that, in both diseases, CCL2 is positively correlated with mast cell activation, CXCR2 is positively correlated with neutrophils and S100A9 is positively correlated with neutrophils and mast cell activation. Analysis of immune characteristics showed that M2 macrophages, M0 macrophages and neutrophils were present in both diseases.
Conclusions CCL2, CXCR2 and S100A9 are promising candidates as immunogenetic biomarkers for diagnosing comorbidities of HF and IBD. M2 macrophages, M0 macrophages, and neutrophil-mediated inflammation and immune regulation play important roles in the development of HF and IBD and may become diagnostic and therapeutic targets.
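The Spearman correlation between a biomarker and an immune cell fraction, as used in the abstract above, is the Pearson correlation of the ranks of the two variables. The following is a minimal sketch with hypothetical values (not data from the study); `average_ranks` and `spearman` are illustrative helper names.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

# Hypothetical gene expression and estimated neutrophil fraction in five samples.
gene_expression = [1.2, 2.5, 3.1, 4.8, 5.0]
neutrophil_fraction = [0.05, 0.10, 0.08, 0.20, 0.30]
rho = spearman(gene_expression, neutrophil_fraction)
```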
Affiliation(s)
- Xu Luo
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Rui Wang
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Xin Zhang
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Xin Wen
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Siwei Deng
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Wen Xie
  - College of Clinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  - Department of Cardiology, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
6
Ha AD, Aylward FO. Automated classification of giant virus genomes using a random forest model built on trademark protein families. bioRxiv 2023:2023.11.10.566645. [PMID: 38014039 PMCID: PMC10680617 DOI: 10.1101/2023.11.10.566645] [Indexed: 11/29/2023]
Abstract
Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1,531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments with a cross-validation accuracy of 99.6% to the order level and 97.3% to the family level. We found that no individual GVOGs or genome features significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the necessity of a fixed set of marker genes for taxonomic assigning purposes. Our classification models were validated with an independent test set of 823 giant virus genomes with varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% to the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
7
Gomes RAL, Zerbini FM. ConCreT, a 2D convolutional neural network for taxonomic classification applied to viruses in the phylum Cressdnaviricota. J Virol Methods 2023; 320:114789. [PMID: 37536450 DOI: 10.1016/j.jviromet.2023.114789] [Received: 05/03/2023] [Revised: 07/19/2023] [Accepted: 07/31/2023] [Indexed: 08/05/2023]
Abstract
Taxonomic assignments allow scientists to communicate better with each other. In virology, taxonomy is continually improving towards a more precise and comprehensive framework. With the huge numbers of new viruses being described in metagenomic studies, automated taxonomy tools are urgently needed. A number of such tools have been proposed, and those applying machine learning (ML), mainly in the deep learning branch, stand out with accurate results. Still, there is a demand for tools that are less computationally intensive and that can classify viruses down to the ranks of genus and species. Cressdnaviruses are good subjects for testing such tools, due to their small, circular genomes and the existence of several families and genera with a highly imbalanced number of species. We developed a 2D convolutional neural network for virus taxonomy and tested it for classification of viruses from the phylum Cressdnaviricota. We obtained >98 % accuracy in the final pipeline tested, which we named ConCreT (Convolutional Neural Network for Cressdnavirus Taxonomy). The mixture of augmentation for more imbalanced groups with no augmentation for more balanced ones achieved the best score in the final test.
Affiliation(s)
- Ruither A L Gomes
  - Dep. de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil; National Institute for Science and Technology on Plant-Pest Interactions, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
- F Murilo Zerbini
  - Dep. de Fitopatologia, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil; National Institute for Science and Technology on Plant-Pest Interactions, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil
8
Tremmel R, Pirmann S, Zhou Y, Lauschke VM. Translating pharmacogenomic sequencing data into drug response predictions-How to interpret variants of unknown significance. Br J Clin Pharmacol 2023. [PMID: 37759374 DOI: 10.1111/bcp.15915] [Received: 09/07/2023] [Revised: 09/20/2023] [Accepted: 09/22/2023] [Indexed: 09/29/2023] Open
Abstract
The rapid development of sequencing technologies during the past 20 years has provided a variety of methods and tools to interrogate human genomic variations at the population level. Pharmacogenes are well known to be highly polymorphic and a plethora of pharmacogenomic variants has been identified in population sequencing data. However, so far only a small number of these variants have been functionally characterized regarding their impact on drug efficacy and toxicity and the significance of the vast majority remains unknown. It is therefore of high importance to develop tools and frameworks to accurately infer the effects of pharmacogenomic variants and, eventually, aggregate the effect of individual variations into personalized drug response predictions. To address this challenge, we here first describe the technological advances, including sequencing methods and accompanying bioinformatic processing pipelines that have enabled reliable variant identification. Subsequently, we highlight advances in computational algorithms for pharmacogenomic variant interpretation and discuss the added value of emerging strategies, such as machine learning and the integrative use of omics techniques that have the potential to further contribute to the refinement of personalized pharmacological response predictions. Lastly, we provide an overview of experimental and clinical approaches to validate in silico predictions. We conclude that the iterative feedback between computational predictions and experimental validations is likely to rapidly improve the accuracy of pharmacogenomic prediction models, which might soon allow for an incorporation of the entire pharmacogenetic profile into personalized response predictions.
Affiliation(s)
- Roman Tremmel
  - Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
  - University of Tübingen, Tübingen, Germany
- Sebastian Pirmann
  - Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Yitian Zhou
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Volker M Lauschke
  - Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany
  - University of Tübingen, Tübingen, Germany
  - Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
9
Raslan MA, Raslan SA, Shehata EM, Mahmoud AS, Sabri NA. Advances in the Applications of Bioinformatics and Chemoinformatics. Pharmaceuticals (Basel) 2023; 16:1050. [PMID: 37513961 PMCID: PMC10384252 DOI: 10.3390/ph16071050] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023] Open
Abstract
Chemoinformatics involves integrating the principles of physical chemistry with computer-based and information science methodologies, commonly referred to as "in silico techniques", in order to address a wide range of descriptive and prescriptive chemistry issues, including applications to biology, drug discovery, and related molecular areas. The incorporation of machine learning has been considered of high importance in the field of drug design, enabling the extraction of chemical data from enormous compound databases to develop drugs endowed with significant biological features. The present review discusses the field of chemoinformatics and proposes the use of virtual chemical libraries in virtual screening methods to increase the probability of discovering novel hit chemicals. The virtual libraries address the need to increase the quality of the compounds as well as discover promising ones. In addition, various applications of bioinformatics in disease classification, diagnosis, and identification of multidrug-resistant organisms are discussed. The use of ensemble models and brute-force feature selection methodology has resulted in high accuracy rates for heart disease and COVID-19 diagnosis, along with the role of special formulations for targeting meningitis and Alzheimer's disease. The review also presents the correlation between genomic variations and disease states such as obesity and chronic progressive external ophthalmoplegia, the investigation of the antibacterial activity of pyrazole and benzimidazole-based compounds against resistant microorganisms, and applications of chemoinformatics for the prediction of drug properties and toxicity.
Affiliation(s)
- Amr S Mahmoud
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Ain Shams University, Cairo P.O. Box 11566, Egypt
- Nagwa A Sabri
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Ain Shams University, Cairo P.O. Box 11566, Egypt
10
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology (Basel) 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. The analogous nature of genome sequences to language texts has enabled the application of techniques that have exhibited success in fields ranging from natural language processing to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The focus of this review is on the critical evaluation of these techniques, discussing their advantages and limitations in the context of genome data analysis. With the swift pace of development in deep learning methodologies, it becomes vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of the recent advancements and elucidating the state-of-the-art applications in the field. Furthermore, this review paper serves to highlight potential areas of future investigation by critically evaluating studies from 2019 to 2023, thereby acting as a stepping-stone for further research endeavors.
Affiliation(s)
- Sanghyuk Roy Choi
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
11
Aydin SG, Bilge HŞ. FPGA Implementation of Image Registration Using Accelerated CNN. Sensors (Basel) 2023; 23:6590. [PMID: 37514883 PMCID: PMC10386551 DOI: 10.3390/s23146590] [Received: 06/26/2023] [Revised: 07/17/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
BACKGROUND: Accurate and fast image registration (IR) is critical during surgical interventions where the ultrasound (US) modality is used for image-guided intervention. Convolutional neural network (CNN)-based IR methods have resulted in applications that respond faster than traditional iterative IR methods. However, general-purpose processors are unable to operate at the maximum speed possible for real-time CNN algorithms. Due to its reconfigurable structure and low power consumption, the field programmable gate array (FPGA) has gained prominence for accelerating the inference phase of CNN applications. METHODS: This study proposes an FPGA-based ultrasound IR CNN (FUIR-CNN) to regress three rigid registration parameters from image pairs. To speed up the estimation process, the proposed design makes use of fixed-point data and parallel operations carried out by unrolling and pipelining techniques. Experiments were performed on three US datasets in real time using the xc7z020, and the xcku5p was also used during implementation. RESULTS: Under a 200 MHz clock frequency, the FUIR-CNN completed the inference phase 139 times faster than the software-based network, with only a negligible drop in regression performance. CONCLUSIONS: Comprehensive experimental results demonstrate that the proposed end-to-end FPGA-based accelerated CNN achieves negligible loss, fast estimation of the registration parameters, lower power consumption than the CPU, and the potential for real-time medical imaging.
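The fixed-point arithmetic mentioned in the methods can be illustrated with a minimal Python sketch. The 16-bit word with 8 fractional bits and the helper names `to_fixed` and `fixed_dot` are assumptions chosen for illustration; the abstract does not state the word length used in FUIR-CNN.

```python
def to_fixed(x, frac_bits=8, total_bits=16):
    """Round x to a signed fixed-point integer, saturating at the word limits.

    The 16-bit word and 8 fractional bits are illustrative assumptions.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fixed_dot(weights, inputs, frac_bits=8):
    """Multiply-accumulate in integers, then rescale once at the end."""
    acc = sum(
        to_fixed(w, frac_bits) * to_fixed(x, frac_bits)
        for w, x in zip(weights, inputs)
    )
    # Each product carries 2 * frac_bits fractional bits.
    return acc / (1 << (2 * frac_bits))


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 4.0]
float_result = sum(w * x for w, x in zip(weights, inputs))
fixed_result = fixed_dot(weights, inputs)
```

Because the example values are exact in this format, the integer path reproduces the floating-point result; values that are not multiples of 2^-8 would show a small rounding error, which is the accuracy cost traded for cheaper hardware arithmetic.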
Affiliation(s)
- Seda Guzel Aydin
  - Department of Electrical and Electronics Engineering, Bingol University, Bingol 12000, Turkey
- Hasan Şakir Bilge
  - Biomedical Calibration and Research Center (BIYOKAM), Gazi University, Ankara 06560, Turkey

12
Szulc NA, Mackiewicz Z, Bujnicki JM, Stefaniak F. Structural interaction fingerprints and machine learning for predicting and explaining binding of small molecule ligands to RNA. Brief Bioinform 2023; 24:bbad187. [PMID: 37204195 DOI: 10.1093/bib/bbad187] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/27/2022] [Revised: 04/07/2023] [Accepted: 04/25/2023] [Indexed: 05/20/2023] Open
Abstract
Ribonucleic acids (RNAs) play crucial roles in living organisms. Some of them, such as bacterial ribosomes and precursor messenger RNA, are targets of small molecule drugs, whereas others, e.g. bacterial riboswitches or viral RNA motifs, are considered potential therapeutic targets. Thus, the continuous discovery of new functional RNAs increases the demand for compounds targeting them and for methods of analyzing RNA-small molecule interactions. We recently developed fingeRNAt, a software tool for detecting non-covalent bonds formed within complexes of nucleic acids with different types of ligands. The program detects several non-covalent interactions and encodes them as a structural interaction fingerprint (SIFt). Here, we present the application of SIFts accompanied by machine learning methods for binding prediction of small molecules to RNA. We show that SIFt-based models outperform the classic, general-purpose scoring functions in virtual screening. We also employed Explainable Artificial Intelligence (XAI) methods, namely SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations and others, to help understand the decision-making process behind the predictive models. We conducted a case study in which we applied XAI to a predictive model of ligand binding to human immunodeficiency virus type 1 trans-activation response element RNA to distinguish between residues and interaction types important for binding. We also used XAI to indicate whether an interaction has a positive or negative effect on binding prediction and to quantify its impact. Our results obtained using all XAI methods were consistent with the literature data, demonstrating the utility and importance of XAI in medicinal chemistry and bioinformatics.
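As a rough illustration of how interactions can be encoded as a fingerprint, the sketch below builds one bit per residue and interaction type. The interaction type names, the residue labels, and the function `encode_sift` are hypothetical and do not reproduce fingeRNAt's actual output format.

```python
# Interaction types are illustrative placeholders, not fingeRNAt's real columns.
INTERACTION_TYPES = ("hydrogen_bond", "pi_stacking", "ionic")


def encode_sift(residues, contacts):
    """Return one bit per (residue, interaction type) pair, in a fixed order.

    `contacts` is a set of (residue, interaction_type) pairs detected in a
    single RNA-ligand complex.
    """
    return [
        int((res, kind) in contacts)
        for res in residues
        for kind in INTERACTION_TYPES
    ]


residues = ["G26", "A27"]  # hypothetical residue labels
contacts = {("G26", "hydrogen_bond"), ("A27", "pi_stacking")}
fingerprint = encode_sift(residues, contacts)
```

Each complex then yields a fixed-length binary vector, which is the kind of feature row a machine learning classifier can consume and whose individual bits an explanation method can score.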
Affiliation(s)
- Natalia A Szulc
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland
- Zuzanna Mackiewicz
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland
  - Laboratory of RNA Biology - ERA Chairs Group, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland
- Janusz M Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland
- Filip Stefaniak
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str, 02-109 Warsaw, Poland

13
Jin B, Cheng X, Fei G, Sang S, Zhong C. Identification of diagnostic biomarkers in Alzheimer's disease by integrated bioinformatic analysis and machine learning strategies. Front Aging Neurosci 2023; 15:1169620. [PMID: 37434738 PMCID: PMC10331604 DOI: 10.3389/fnagi.2023.1169620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 06/08/2023] [Indexed: 07/13/2023] Open
Abstract
Background: Alzheimer's disease (AD) is the most prevalent form of dementia, and is becoming one of the most burdensome and lethal diseases. Better biomarkers for diagnosing AD and tracking disease progression are needed. Methods: Integrated bioinformatic analysis combined with machine-learning strategies was applied to explore crucial functional pathways and identify diagnostic biomarkers of AD. Four datasets (GSE5281, GSE131617, GSE48350, and GSE84422) with samples of AD frontal cortex were integrated as experimental datasets, and another two datasets (GSE33000 and GSE44772) with samples of AD frontal cortex were used for validation analyses. Functional correlation enrichment analyses were conducted based on Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and the Reactome database to reveal AD-associated biological functions and key pathways. Four models were employed to screen the potential diagnostic biomarkers: one bioinformatic analysis, weighted gene co-expression network analysis (WGCNA), and three machine-learning algorithms, namely least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), and random forest (RF) analysis. Correlation analysis was performed to explore the relationship of the identified biomarkers with CDR scores and Braak staging. Results: The pathways of the immune response and oxidative stress were identified as playing a crucial role during AD. Thioredoxin interacting protein (TXNIP), early growth response 1 (EGR1), and insulin-like growth factor binding protein 5 (IGFBP5) were screened as diagnostic markers of AD. The diagnostic efficacy of TXNIP, EGR1, and IGFBP5 was validated with corresponding AUCs of 0.857, 0.888, and 0.856 in dataset GSE33000 and 0.867, 0.909, and 0.841 in dataset GSE44770. The AUCs of the combination of these three biomarkers as a diagnostic tool for AD were 0.954 and 0.938 in the two verification datasets. Conclusion: The pathways of immune response and oxidative stress can play a crucial role in the pathogenesis of AD. TXNIP, EGR1, and IGFBP5 are useful biomarkers for diagnosing AD, and their mRNA levels may reflect the development of the disease through their correlation with CDR scores and Braak staging.
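The AUC values reported above summarize how well a marker's score ranks cases above controls. A minimal sketch of that computation follows, using the pairwise-ranking definition of the area under the ROC curve; the labels and scores are made-up numbers, not data from the study, and `roc_auc` is a name chosen for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = AD sample, 0 = control; scores = marker expression.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here eight of the nine case-control pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.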
Affiliation(s)
- Boru Jin
  - Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
  - Collaborative Innovation Center for Brain Science, Fudan University, Shanghai, China
- Xiaoqin Cheng
  - Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
  - Collaborative Innovation Center for Brain Science, Fudan University, Shanghai, China
- Guoqiang Fei
  - Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
  - Collaborative Innovation Center for Brain Science, Fudan University, Shanghai, China
- Shaoming Sang
  - Shanghai Raising Pharmaceutical Technology Co., Ltd., Shanghai, China
- Chunjiu Zhong
  - Department of Neurology, Zhongshan Hospital, Fudan University, Shanghai, China
  - Collaborative Innovation Center for Brain Science, Fudan University, Shanghai, China

14
Guzman NA, Guzman DE, Blanc T. Advancements in portable instruments based on affinity-capture-migration and affinity-capture-separation for use in clinical testing and life science applications. J Chromatogr A 2023; 1704:464109. [PMID: 37315445 DOI: 10.1016/j.chroma.2023.464109] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Revised: 05/23/2023] [Accepted: 05/25/2023] [Indexed: 06/16/2023]
Abstract
The shift from testing at centralized diagnostic laboratories to remote locations is being driven by the development of point-of-care (POC) instruments and represents a transformative moment in medicine. POC instruments address the need for rapid results that can inform faster therapeutic decisions and interventions. These instruments are especially valuable in the field, such as in an ambulance, or in remote and rural locations. The development of telehealth, enabled by advancements in digital technologies like smartphones and cloud computing, is also aiding in this evolution, allowing medical professionals to provide care remotely, potentially reducing healthcare costs and improving patient longevity. One notable POC device is the lateral flow immunoassay (LFIA), which played a major role in addressing the COVID-19 pandemic due to its ease of use, rapid analysis time, and low cost. However, LFIA tests exhibit relatively low analytical sensitivity and provide semi-quantitative information, indicating either a positive, negative, or inconclusive result, which can be attributed to its one-dimensional format. Immunoaffinity capillary electrophoresis (IACE), on the other hand, offers a two-dimensional format that includes an affinity-capture step of one or more matrix constituents followed by release and electrophoretic separation. The method provides greater analytical sensitivity, and quantitative information, thereby reducing the rate of false positives, false negatives, and inconclusive results. Combining LFIA and IACE technologies can thus provide an effective and economical solution for screening, confirming results, and monitoring patient progress, representing a key strategy in advancing diagnostics in healthcare.
Affiliation(s)
- Norberto A Guzman
  - Princeton Biochemicals, Inc., Princeton, NJ 08543, United States of America
- Daniel E Guzman
  - Princeton Biochemicals, Inc., Princeton, NJ 08543, United States of America; Columbia University Irving Medical Center, New York, NY 10032, United States of America
- Timothy Blanc
  - Eli Lilly and Company, Branchburg, NJ 08876, United States of America

15
Hashizume T, Ozawa Y, Ying BW. Employing active learning in the optimization of culture medium for mammalian cells. NPJ Syst Biol Appl 2023; 9:20. [PMID: 37253825 DOI: 10.1038/s41540-023-00284-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Accepted: 05/18/2023] [Indexed: 06/01/2023] Open
Abstract
Medium optimization is a crucial step during cell culture for biopharmaceutics and regenerative medicine; however, this step remains challenging, as both media and cells are highly complex systems. Here, we addressed this issue by employing active learning. Specifically, we introduced machine learning to cell culture experiments to optimize culture medium. The cell line HeLa-S3 and the gradient-boosting decision tree algorithm were used to find optimized media as pilot studies. To acquire the training data, cell culture was performed in a large variety of medium combinations. The cellular NAD(P)H abundance, represented as A450, was used to indicate the goodness of culture media. In active learning, regular and time-saving modes were developed using culture data at 168 h and 96 h, respectively. Both modes successfully fine-tuned 29 components to generate a medium for improved cell culture. Intriguingly, the two modes provided different predictions for the concentrations of vitamins and amino acids, and a significant decrease was commonly predicted for fetal bovine serum (FBS) compared to the commercial medium. In addition, active learning-assisted medium optimization significantly increased the cellular concentration of NAD(P)H, an active chemical with a constant abundance in living cells. Our study demonstrated the efficiency and practicality of active learning for medium optimization and provided valuable information for employing machine learning technology in cell biology experiments.
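The active learning cycle described above (measure, fit a model, predict untested media, test the most promising ones, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions: the study used a gradient-boosting decision tree, whereas this sketch substitutes a simple nearest-neighbour average to stay dependency-free, and `measure` is a hypothetical stand-in for the wet-lab A450 readout with two components instead of 29.

```python
import random


def measure(medium):
    """Hypothetical stand-in for a culture readout; peaks at (0.6, 0.3)."""
    return 1.0 - (medium[0] - 0.6) ** 2 - (medium[1] - 0.3) ** 2


def predict(train, candidate, k=3):
    """Surrogate model: mean readout of the k nearest measured media."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], candidate)),
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


rng = random.Random(0)
# Initial training data: five randomly composed media and their readouts.
train = [(m, measure(m)) for m in [(rng.random(), rng.random()) for _ in range(5)]]
initial_best = max(score for _, score in train)

for _ in range(4):  # four active learning rounds
    candidates = [(rng.random(), rng.random()) for _ in range(200)]
    # Pick the three candidates the surrogate expects to perform best.
    picks = sorted(candidates, key=lambda c: predict(train, c), reverse=True)[:3]
    # "Run the experiment" on the picks and add the results to the training set.
    train.extend([(m, measure(m)) for m in picks])

best_medium, best_score = max(train, key=lambda item: item[1])
```

Each round spends experimental effort only on media the current model ranks highly, which is what lets the approach tune many components with a limited number of cultures.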
Affiliation(s)
- Takamasa Hashizume
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
- Yuki Ozawa
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
- Bei-Wen Ying
  - School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan

16
Rescifina A. Progress of the "Molecular Informatics" Section in 2022. Int J Mol Sci 2023; 24:ijms24119442. [PMID: 37298393 DOI: 10.3390/ijms24119442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 05/19/2023] [Indexed: 06/12/2023] Open
Abstract
This is the first Editorial of the "Molecular Informatics" Section (MIS) of the International Journal of Molecular Sciences (IJMS), which was created towards the end of 2018 (the first article was submitted on 27 September 2018) and has experienced significant growth from 2018 to now [...].
Affiliation(s)
- Antonio Rescifina
  - Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, 95125 Catania, Italy

17
Varshney N, Mishra AK. Deep Learning in Phosphoproteomics: Methods and Application in Cancer Drug Discovery. Proteomes 2023; 11:proteomes11020016. [PMID: 37218921 DOI: 10.3390/proteomes11020016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/13/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/24/2023] Open
Abstract
Protein phosphorylation is a key post-translational modification (PTM) that is a central regulatory mechanism of many cellular signaling pathways. Several protein kinases and phosphatases precisely control this biochemical process. Defects in the functions of these proteins have been implicated in many diseases, including cancer. Mass spectrometry (MS)-based analysis of biological samples provides in-depth coverage of the phosphoproteome. The large amount of MS data available in public repositories has brought big data to the field of phosphoproteomics. To address the challenges associated with handling large data and to increase confidence in phosphorylation site prediction, the development of computational algorithms and machine learning-based approaches has gained momentum in recent years. Together, the emergence of experimental methods with high resolution and sensitivity and of data mining algorithms has provided robust analytical platforms for quantitative proteomics. In this review, we compile a comprehensive collection of bioinformatic resources used for the prediction of phosphorylation sites, and their potential therapeutic applications in the context of cancer.
Affiliation(s)
- Neha Varshney
  - Division of Biological Sciences, Department of Cellular and Molecular Medicine, University of California, San Diego, CA 93093, USA
  - Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA
- Abhinava K Mishra
  - Molecular, Cellular and Developmental Biology Department, University of California, Santa Barbara, CA 93106, USA

18
Elhadary M, Elsabagh AA, Ferih K, Elsayed B, Elshoeibi AM, Kaddoura R, Akiki S, Ahmed K, Yassin M. Applications of Machine Learning in Chronic Myeloid Leukemia. Diagnostics (Basel) 2023; 13:diagnostics13071330. [PMID: 37046547 PMCID: PMC10093579 DOI: 10.3390/diagnostics13071330] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/01/2023] [Revised: 03/11/2023] [Accepted: 03/15/2023] [Indexed: 04/14/2023] Open
Abstract
Chronic myeloid leukemia (CML) is a myeloproliferative neoplasm characterized by dysregulated growth and the proliferation of myeloid cells in the bone marrow caused by the BCR-ABL1 fusion gene. Clinically, CML demonstrates an increased production of mature and maturing granulocytes, mainly neutrophils. When a patient is suspected to have CML, peripheral blood smears and bone marrow biopsies may be manually examined by a hematologist. However, confirmatory testing for the BCR-ABL1 gene is still needed to confirm the diagnosis. Despite tyrosine kinase inhibitors (TKIs) being the mainstay of treatment for patients with CML, different agents should be used in different patients given their stage of disease and comorbidities. Moreover, some patients do not respond well to certain agents and some need more aggressive courses of therapy. Given the innovations and development that machine learning (ML) and artificial intelligence (AI) have undergone over the years, multiple models and algorithms have been put forward to help in the assessment and treatment of CML. In this review, we summarize the recent studies utilizing ML algorithms in patients with CML. The search was conducted on the PubMed/Medline and Embase databases and yielded 66 full-text articles and abstracts, out of which 11 studies were included after screening against the inclusion criteria. The studies included show potential for the clinical implementation of ML models in the diagnosis, risk assessment, and treatment processes of patients with CML.
Affiliation(s)
- Mohamed Elhadary
  - College of Medicine, QU Health, Qatar University, Doha 2713, Qatar
- Khaled Ferih
  - College of Medicine, QU Health, Qatar University, Doha 2713, Qatar
- Basel Elsayed
  - College of Medicine, QU Health, Qatar University, Doha 2713, Qatar
- Rasha Kaddoura
  - Pharmacy Department, Heart Hospital, Hamad Medical Corporation (HMC), Doha 3050, Qatar
- Susanna Akiki
  - Diagnostic Genomic Division, Hamad Medical Corporation (HMC), Doha 3050, Qatar
- Khalid Ahmed
  - Department of Hematology, National Center for Cancer Care and Research (NCCCR), Hamad Medical Corporation (HMC), Doha 3050, Qatar
- Mohamed Yassin
  - Hematology Section, Medical Oncology, National Center for Cancer Care and Research (NCCCR), Hamad Medical Corporation (HMC), Doha 3050, Qatar

19
Sun Z, Lin J, Zhang T, Sun X, Wang T, Duan J, Yao K. Combining bioinformatics and machine learning to identify common mechanisms and biomarkers of chronic obstructive pulmonary disease and atrial fibrillation. Front Cardiovasc Med 2023; 10:1121102. [PMID: 37057099 PMCID: PMC10086368 DOI: 10.3389/fcvm.2023.1121102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 03/14/2023] [Indexed: 03/30/2023] Open
Abstract
Background: Patients with chronic obstructive pulmonary disease (COPD) often present with atrial fibrillation (AF), but the pathophysiological mechanisms common to the two are unclear. This study aimed to investigate the common biological mechanisms of COPD and AF and to search for important biomarkers through bioinformatic analysis of public RNA sequencing databases. Methods: Four datasets of COPD and AF were downloaded from the Gene Expression Omnibus (GEO) database. The overlapping genes common to both diseases were screened by WGCNA, followed by protein-protein interaction network construction and functional enrichment analysis to elucidate the common mechanisms of COPD and AF. Machine learning algorithms were also used to identify key biomarkers. Co-expression analysis, “transcription factor (TF)-mRNA-microRNA (miRNA)” regulatory networks and drug prediction were performed for key biomarkers. Finally, immune cell infiltration analysis was performed to further evaluate the immune cell changes in the COPD dataset and the correlation between key biomarkers and immune cells. Results: A total of 133 overlapping genes for COPD and AF were obtained, and the enrichment was mainly focused on pathways associated with the inflammatory immune response. A key biomarker, cyclin dependent kinase 8 (CDK8), was identified through screening by machine learning algorithms and validated in the validation dataset. Twenty potential drugs capable of targeting CDK8 were obtained. Immune cell infiltration analysis revealed dysregulation of multiple immune cell types in COPD. Correlation analysis showed that CDK8 expression was significantly associated with CD8+ T cells, resting dendritic cells, M2 macrophages, and monocytes. Conclusions: This study highlights the role of the inflammatory immune response in COPD combined with AF. The prominent link between CDK8 and the inflammatory immune response, and its characteristic of not affecting the basal expression level of nuclear factor kappa B (NF-kB), make it a promising possible therapeutic target for COPD combined with AF.
Affiliation(s)
- Ziyi Sun
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  - Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Jianguo Lin
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  - Graduate School, China Academy of Chinese Medical Sciences, Beijing, China
- Tianya Zhang
  - Graduate School, Hebei University of Chinese Medicine, Shijiazhuang, China
- Xiaoning Sun
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  - Graduate School, China Academy of Chinese Medical Sciences, Beijing, China
- Tianlin Wang
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  - Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Jinlong Duan
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Kuiwu Yao
  - Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  - Eye Hospital China Academy of Chinese Medical Sciences, China Academy of Chinese Medical Sciences, Beijing, China
  - Correspondence: Kuiwu Yao

20
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:cancers15071958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
  - The Wistar Institute, Philadelphia, PA 19104, USA
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Correspondence:

21
Deyneko IV. Guidelines on the performance evaluation of motif recognition methods in bioinformatics. Front Genet 2023; 14:1135320. [PMID: 36824436 PMCID: PMC9941176 DOI: 10.3389/fgene.2023.1135320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 01/19/2023] [Indexed: 02/09/2023] Open

22
Li B, Li H, Zhang L, Ren T, Meng J. Expression analysis of human glioma susceptibility gene and P53 in human glioma and its clinical significance based on bioinformatics. Ann Transl Med 2023; 11:53. [PMID: 36819578 PMCID: PMC9929792 DOI: 10.21037/atm-22-5646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 12/07/2022] [Indexed: 01/18/2023]
Abstract
Background: The exact mechanism of glioblastoma multiforme (GBM) remains unclear. This study aimed to clarify the expression of P53 in glioma and its molecular mechanism, and to explore P53 as a potential therapeutic target for glioma and its clinical application value, so as to provide a new theoretical basis for the treatment of glioma. Methods: First, a dataset was established to analyze the expression of P53 in different stages of glioma and its relationship with prognosis, using The Cancer Genome Atlas (TCGA) database, RNA-seq data, and survival data of glioma and normal control samples in Gene Expression Profiling Interactive Analysis (GEPIA). The genes co-expressed with P53 were screened out, their differential expression between the glioma and normal control groups was analyzed, and their functions were examined by enrichment analysis. The TGGA database was used for data verification and analysis. The correlation between P53 expression and clinicopathological parameters was analyzed. Kaplan-Meier survival analysis was used to analyze the relationship between P53 expression and overall survival (OS) and progression-free survival (PFS) of glioma patients, and Cox regression analysis was used to identify the independent factors affecting OS and PFS of glioma patients. Results: The TCGA data analysis showed that the expression level of P53 differed across glioma grades: the differences between grade II and grade III, grade III and grade IV, and grade II and grade IV were all significant (P<0.05). The results of P53 gene-related survival analysis showed that KNL1 high expression and low expression were significantly different in OS, and the high expression group was associated with poor prognosis (P<0.05). Conclusions: P53 expression can be an effective biological indicator of poor prognosis in glioma.
Affiliation(s)
- Baiyu Li
  - Department of Neurology Care Ward, Gansu Provincial Hospital, Lanzhou, China
- Hang Li
  - Department of Geriatrics, Chengdu Eighth People's Hospital (Geriatric Hospital of Chengdu Medical College), Chengdu, China
- Linghui Zhang
  - Department of Internal Medicine, Department of Clinical Medicine, Shijiazhuang Medical College, Shijiazhuang, China
- Taowen Ren
  - Department of Neurology Care Ward, Gansu Provincial Hospital, Lanzhou, China
- Jie Meng
  - Department of Psychiatry, The Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China

23
Lu Z, Xu J, Cao B, Jin C. Screening and identification of susceptibility genes for osteosarcoma based on bioinformatics analysis. Ann Transl Med 2023; 11:87. [PMID: 36819543 PMCID: PMC9929789 DOI: 10.21037/atm-22-6369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 01/10/2023] [Indexed: 01/30/2023]
Abstract
Background: Genes play an important role in malignant tumors. However, there is still insufficient research on genetic variations in osteosarcoma (OS) patients. Therefore, we aimed to analyze the gene expression profile of OS using bioinformatics and to explore the pathogenesis of OS at the molecular level. Methods: The gene chip dataset of OS samples was downloaded from the Gene Expression Omnibus (GEO) database for screening differentially expressed genes (DEGs). The R package clusterProfiler was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for DEGs. The central node proteins of the protein interaction network were analyzed with the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, Cytoscape, and its plug-ins cytoHubba and NetworkAnalyzer to find the key genes. Results: A total of 631 DEGs were obtained, including 362 upregulated genes and 269 downregulated genes. DEGs were mainly involved in the regulation of leukocyte chemotaxis and migration, vascular development, and other biological processes (BPs); mediation of receptor ligand activity, growth factor binding, growth factor activity, integrin binding, and other molecular functions (MFs); and were enriched in the extracellular matrix (ECM). Conclusions: DEGs in the ECM and growth factors play a key role in the development of OS. The leukocyte transendothelial migration pathway and the PI3K-AKT pathway are closely related to OS, and the related molecular mechanism is worthy of further study.
Affiliation(s)
- Zhengyu Lu
  - Department of Orthopedics, Taizhou First People’s Hospital, Huangyan Hospital of Wenzhou Medical University, Taizhou, China
- Jin Xu
  - Department of Orthopedics, Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, China
- Binhao Cao
  - Department of Orthopedics, Taizhou First People’s Hospital, Huangyan Hospital of Wenzhou Medical University, Taizhou, China
- Chongqiang Jin
  - Department of Orthopedics, Taizhou First People’s Hospital, Huangyan Hospital of Wenzhou Medical University, Taizhou, China
24
Feng H, Jin D, Li J, Li Y, Zou Q, Liu T. Matrix reconstruction with reliable neighbors for predicting potential MiRNA-disease associations. Brief Bioinform 2023; 24:6960615. [PMID: 36567252 DOI: 10.1093/bib/bbac571] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/12/2022] [Revised: 10/16/2022] [Accepted: 11/23/2022] [Indexed: 12/27/2022]
Abstract
Numerous experimental studies have indicated that alteration and dysregulation in microRNAs (miRNAs) are associated with serious diseases. Identifying disease-related miRNAs is therefore an essential and challenging task in bioinformatics research. Computational methods are an efficient and economical alternative to conventional biomedical studies and can reveal underlying miRNA-disease associations for subsequent experimental confirmation with reasonable confidence. Despite the success of existing computational approaches, most of them only rely on the known miRNA-disease associations to predict associations without adding other data to increase the prediction accuracy, and they are affected by issues of data sparsity. In this paper, we present MRRN, a model that combines matrix reconstruction with node reliability to predict probable miRNA-disease associations. In MRRN, the most reliable neighbors of miRNA and disease are used to update the original miRNA-disease association matrix, which significantly reduces data sparsity. Unknown miRNA-disease associations are reconstructed by aggregating the most reliable first-order neighbors to increase prediction accuracy by representing the local and global structure of the heterogeneous network. Five-fold cross-validation of MRRN produced an area under the curve (AUC) of 0.9355 and area under the precision-recall curve (AUPR) of 0.2646, values that were greater than those produced by comparable models. Two different types of case studies using three diseases were conducted to demonstrate the accuracy of MRRN, and all top 30 predicted miRNAs were verified.
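The two evaluation metrics reported above can be computed from a list of labels and prediction scores. The following is a minimal sketch, not the MRRN evaluation code: AUC is computed as the fraction of positive-negative pairs ranked correctly, and AUPR is approximated by average precision, which is a common but not the only estimator. The function names and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (score ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy data: 1 marks a known association, 0 an unobserved pair
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(round(auc, 4), round(aupr, 4))
```

AUC weighs every positive-negative pair equally, whereas average precision depends on how many negatives are ranked above each positive, so the two can differ widely when positives are rare.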
Affiliation(s)
- Hailin Feng
  - School of mathematics and computer science, Zhejiang A&F University, No.666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Dongdong Jin
  - School of mathematics and computer science, Zhejiang A&F University, No.666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Jian Li
  - School of mathematics and computer science, Zhejiang A&F University, No.666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Yane Li
  - School of mathematics and computer science, Zhejiang A&F University, No.666 Wusu Street, Lin'an District, 311300, Hangzhou, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006, Xiyuan Avenue, West District, high tech Zone, 611731, Chengdu, China
- Tongcun Liu
  - School of mathematics and computer science, Zhejiang A&F University, No.666 Wusu Street, Lin'an District, 311300, Hangzhou, China
25
Tu DY, Cao J, Zhou J, Su BB, Wang SY, Jiang GQ, Jin SJ, Zhang C, Peng R, Bai DS. Identification of the mitophagy-related diagnostic biomarkers in hepatocellular carcinoma based on machine learning algorithm and construction of prognostic model. Front Oncol 2023; 13:1132559. [PMID: 36937391 PMCID: PMC10014545 DOI: 10.3389/fonc.2023.1132559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 02/15/2023] [Indexed: 03/05/2023]
Abstract
Background and aims A growing number of recent studies indicate that mitophagy plays a vital role in the genesis of cancer. However, research on the predictive potential and clinical importance of mitophagy-related genes (MRGs) in hepatocellular carcinoma (HCC) is currently lacking. This study aimed to uncover and analyze mitophagy-related diagnostic biomarkers in HCC using machine learning (ML), and to investigate their biological role, immune infiltration, and clinical significance. Methods Using least absolute shrinkage and selection operator (LASSO) regression and the support vector machine-recursive feature elimination (SVM-RFE) algorithm, six mitophagy genes (ATG12, CSNK2B, MTERF3, TOMM20, TOMM22, and TOMM40) were identified from twenty-nine mitophagy genes. Next, non-negative matrix factorization (NMF) was used to separate the HCC patients into clusters A and B based on the six mitophagy genes. Multiple analyses provided evidence that clusters A and B were associated with the tumor immune microenvironment (TIME), clinicopathological features, and prognosis. Then, based on the differentially expressed genes (DEGs) between cluster A and cluster B, a mitophagy prognostic model (riskScore) was constructed, comprising ten mitophagy-related genes (G6PD, KIF20A, SLC1A5, TPX2, ANXA10, TRNP1, ADH4, CYP2C9, CFHR3, and SPP1). Results This study uncovered and analyzed mitophagy-related diagnostic biomarkers in HCC using ML and investigated their biological role, immune infiltration, and clinical significance. Based on these biomarkers, we constructed a prognostic model (riskScore). Furthermore, we discovered that the riskScore was associated with somatic mutation, TIME, chemotherapy efficacy, and the effectiveness of TACE and immunotherapy in HCC patients.
Conclusion Mitophagy may play an important role in the development of HCC, and further research on this issue is necessary. Furthermore, the riskScore performed well as a standalone prognostic marker in terms of accuracy and stability. It can provide some guidance for the diagnosis and treatment of HCC patients.
Affiliation(s)
- Chi Zhang
- Rui Peng
- Dou-sheng Bai
- *Correspondence: Dou-sheng Bai; Rui Peng; Chi Zhang
26
Chen ZF, Wu LZ, Chen ZT, Su LJ, Fu CJ. The potential mechanisms of neuroblastoma in children based on bioinformatics big data. Transl Pediatr 2022; 11:1908-1919. [PMID: 36643678 PMCID: PMC9834953 DOI: 10.21037/tp-22-504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND In recent years, microRNAs (miRNAs) have become a research hotspot because of their links to the occurrence and development of a variety of malignant tumors, but few studies have addressed neuroblastoma. In this study, the differentially expressed miRNAs in neuroblastoma were identified and analyzed using bioinformatics, and their biological functions and related signaling pathways were examined. METHODS The neuroblastoma miRNA chip GSE121513 was obtained from the Gene Expression Omnibus (GEO) database and the data of 95 neuroblastoma samples and normal fetal adrenal neuroblastoma samples were analyzed to screen the differential miRNAs. The target genes of the differentially expressed miRNAs were predicted using |log fold change (FC)| ≥4. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed, and a protein-protein interaction network was constructed to identify the core target genes. RESULTS A total of 91 differentially expressed miRNAs were identified (P<0.05, |logFC| ≥1), including 52 upregulated and 39 downregulated miRNAs. The target genes of the differential miRNAs (P<0.05, |logFC| ≥4) were predicted, and 602 target genes were obtained. Functional analysis showed that these genes were mainly located in the extracellular matrix region of proteins, and were involved in the negative regulation of cytoplasmic translation, mRNA 3'-untranslated region (UTR) binding, and binding to nucleic acid to inhibit the activity of translation factors. They were also involved in RNA degradation, adhesion pathways, and the phosphatidylinositol-3-kinase (PI3K)-Akt signaling pathway. Ten key target genes were identified via protein interaction network screening. CONCLUSIONS Differential miRNAs that may be related to the occurrence of neuroblastoma were screened.
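The two screening thresholds quoted in this abstract (P<0.05 with |logFC| ≥1 for the differential miRNAs, and |logFC| ≥4 for the subset used in target prediction) amount to a simple filter over a results table. The sketch below assumes each record is a (name, logFC, P value) tuple; the function name and the miRNA labels are hypothetical, and the numbers are made up for illustration rather than taken from GSE121513.

```python
def screen_differential(records, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split records passing P < p_cutoff and |logFC| >= lfc_cutoff by direction."""
    up, down = [], []
    for name, log_fc, p_value in records:
        if p_value < p_cutoff and abs(log_fc) >= lfc_cutoff:
            (up if log_fc > 0 else down).append(name)
    return up, down


# Hypothetical (name, logFC, P value) rows
records = [
    ("miR-a", 2.3, 0.001),    # passes the |logFC| >= 1 screen, upregulated
    ("miR-b", -1.0, 0.04),    # passes exactly at the boundary, downregulated
    ("miR-c", 4.5, 0.20),     # large change but not significant
    ("miR-d", 0.6, 0.001),    # significant but change too small
    ("miR-e", -4.2, 0.0005),  # passes both the >= 1 and >= 4 screens
]

up, down = screen_differential(records)
strict_up, strict_down = screen_differential(records, lfc_cutoff=4.0)
print(up, down, strict_up, strict_down)
```

Whether the P values should be adjusted for multiple testing before filtering is not stated in the abstract, so the sketch filters on the value as given.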
Affiliation(s)
- Ze-Fu Chen
  - Department of Pediatrics, Hainan General Hospital, Haikou, China
- Li-Zhen Wu
  - Department of Operating Theater, Hainan General Hospital, Haikou, China
- Ze-Ting Chen
  - Department of Pediatric Surgery, Hainan General Hospital, Haikou, China
- Liang-Ju Su
  - Department of Pediatric Surgery, Hainan General Hospital, Haikou, China
- Ce-Jun Fu
  - Department of Pediatric Surgery, Hainan General Hospital, Haikou, China
27
Moris D, Henao R, Hensman H, Stempora L, Chasse S, Schobel S, Dente CJ, Kirk AD, Elster E. Multidimensional machine learning models predicting outcomes after trauma. Surgery 2022; 172:1851-1859. [PMID: 36116976 DOI: 10.1016/j.surg.2022.08.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 08/01/2022] [Accepted: 08/04/2022] [Indexed: 01/07/2023]
Abstract
BACKGROUND An emerging body of literature supports the role of individualized prognostic tools to guide the management of patients after trauma. The aim of this study was to develop advanced modeling tools from multidimensional data sources, including immunological analytes and clinical and administrative data, to predict outcomes in trauma patients. METHODS This was a prospective study of trauma patients at Level 1 centers from 2015 to 2019. Clinical, flow cytometry, and serum cytokine data were collected within 48 hours of admission. Sparse logistic regression models were developed, jointly selecting predictors and estimating the risk of ventilator-associated pneumonia, acute kidney injury, complicated disposition (death, rehabilitation, or nursing facility), and return to the operating room. Model parameters (regularization controlling model sparsity) and performance estimation were obtained via nested leave-one-out cross-validation. RESULTS A total of 179 patients were included. The incidences of ventilator-associated pneumonia, acute kidney injury, complicated disposition, and return to the operating room were 17.7%, 28.8%, 22.5%, and 12.3%, respectively. Regarding extensive resource use, 30.7% of patients had prolonged intensive care unit stay, 73.2% had prolonged length of stay, and 23.5% had need for prolonged ventilatory support. The models were developed and cross-validated for ventilator-associated pneumonia, acute kidney injury, complicated dispositions, and return to the operating room, yielding predictive areas under the curve from 0.70 to 0.91. Each model derived its optimal predictive value by combining clinical, administrative, and immunological analyte data. CONCLUSION Clinical, immunological, and administrative data can be combined to predict post-traumatic outcomes and resource use. Multidimensional machine learning modeling can identify trauma patients with complicated clinical trajectories and high resource needs.
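The nested leave-one-out cross-validation described in the methods separates hyperparameter selection (inner loop) from performance estimation (outer loop). The sketch below shows only that control flow. As a loudly stated assumption, it swaps the study's sparse logistic regression for a one-dimensional k-nearest-neighbour classifier, with k standing in for the regularization strength; all function names and the toy data are illustrative.

```python
def knn_predict(train, x, k):
    """Majority vote of the k nearest 1-D training points (ties go to class 1)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes >= len(nearest) else 0


def loo_accuracy(data, k):
    """Plain leave-one-out accuracy for a fixed k."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)


def nested_loo(data, candidate_ks=(1, 3, 5)):
    """Outer loop estimates performance; inner loop picks k without the held-out point."""
    outer_hits, chosen = 0, []
    for i, (x, y) in enumerate(data):
        outer_train = data[:i] + data[i + 1:]
        # Inner leave-one-out runs only on outer_train, so the held-out
        # sample never influences the choice of hyperparameter
        best_k = max(candidate_ks, key=lambda k: loo_accuracy(outer_train, k))
        chosen.append(best_k)
        outer_hits += knn_predict(outer_train, x, best_k) == y
    return outer_hits / len(data), chosen


# Toy, well-separated data: (feature value, outcome label)
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
        (10.0, 1), (11.0, 1), (12.0, 1), (13.0, 1)]
accuracy, chosen_ks = nested_loo(data)
print(accuracy, chosen_ks)
```

The point of the nesting is that the outer accuracy is not biased by tuning: each held-out patient is predicted by a model whose hyperparameter was chosen without seeing that patient.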
Affiliation(s)
- Hannah Hensman
  - DecisionQ, Arlington, VA; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD
- Linda Stempora
  - Medical Center, Duke University Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD
- Scott Chasse
  - Medical Center, Duke University Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD
- Seth Schobel
  - Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD; Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD
- Allan D Kirk
  - Medical Center, Duke University Durham, NC; Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD
- Eric Elster
  - Surgical Critical Care Initiative, Department of Surgery, Uniformed Services University of the Health Sciences; Bethesda, MD; Walter Reed National Military Medical Center, Bethesda, MD
28
Ge S, Xu C, Li Y, Zhang Y, Li N, Wang F, Ding L, Niu J, Shi Z. Identification of the Diagnostic Biomarker VIPR1 in Hepatocellular Carcinoma Based on Machine Learning Algorithm. Journal of Oncology 2022; 2022:1-13. [PMID: 36157238 PMCID: PMC9499748 DOI: 10.1155/2022/2469592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 12/24/2022]
Abstract
The purpose of this study was to identify the potential diagnostic biomarkers in hepatocellular carcinoma (HCC) by machine learning (ML) and to explore the significance of immune cell infiltration in HCC. Microarray datasets of HCC patients were downloaded from GEO. Differentially expressed genes (DEGs) were screened from five datasets (GSE57957, GSE84402, GSE112790, GSE113996, and GSE121248), totalling 125 normal liver tissues and 326 HCC tissues. To find diagnostic indicators of HCC, the LASSO regression and SVM-RFE algorithms were utilized. The prognostic value of VIPR1 was analyzed. Finally, the difference in immune cell infiltration between HCC tissues and normal liver tissues was evaluated by the CIBERSORT algorithm. In this study, a total of 232 DEGs were identified in 125 normal liver tissues and 326 HCC tissues. Eleven diagnostic markers were identified by the LASSO regression and SVM-RFE algorithms. The FCN2, ECM1, VIPR1, IGFALS, and ASPG genes, with AUC>0.85, were regarded as candidate biomarkers with high diagnostic value, and the above results were verified in GSE36376. Survival analyses showed that VIPR1 and IGFALS were significantly correlated with OS, while ASPG, ECM1, and FCN2 showed no statistically significant correlation with OS. Multivariate assays indicated that the VIPR1 gene could be used as an independent prognostic factor for HCC, while FCN2, ECM1, IGFALS, and ASPG could not. Immune cell infiltration analyses showed that the expression of VIPR1 in HCC was positively correlated with the levels of several immune cells. Overall, the VIPR1 gene can be used as a diagnostic feature marker of HCC and may be a potential target for the diagnosis and treatment of liver cancer in the future.
29
Abstract
In recent years, deep learning has emerged as a highly active research field, achieving great success in various machine learning areas, including image processing, speech recognition, and natural language processing, and now rapidly becoming a dominant tool in biomedicine [...].
Affiliation(s)
- Mingon Kang
  - Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Jung Hun Oh
  - Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
  - Correspondence:
30
Abstract
Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a powerful way to build complex sequence-to-phenotype models. Unfortunately, while it can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
Affiliation(s)
- Bahrad A. Sokhansanj
  - Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
- Gail L. Rosen
  - Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
31
Kieft K, Anantharaman K. Virus genomics: what is being overlooked? Curr Opin Virol 2022; 53:101200. [DOI: 10.1016/j.coviro.2022.101200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2021] [Revised: 12/21/2021] [Accepted: 01/03/2022] [Indexed: 01/05/2023]
32
Khalili E, Ramazi S, Ghanati F, Kouchaki S. Predicting protein phosphorylation sites in soybean using interpretable deep tabular learning network. Brief Bioinform 2022; 23:bbac015. [PMID: 35152280 DOI: 10.1093/bib/bbac015] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/28/2021] [Revised: 12/17/2021] [Accepted: 01/12/2022] [Indexed: 12/17/2023]
Abstract
Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant functionality due to its impact on signaling, gene expression, enzyme kinetics, protein stability and interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital as abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM prediction suffer from high computational cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance interpretable deep tabular learning network (TabNet), to improve the prediction of protein p-sites in soybean. Moreover, we use a hybrid feature set of sequence-based features, physicochemical properties and position-specific scoring matrices to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean for the first time. The experimentally verified p-sites data of soybean proteins are collected from the eukaryotic phosphorylation sites database and database post-translational modification. We then remove the redundant set of positive and negative samples by dropping protein sequences with >40% similarity. The developed techniques achieve >70% accuracy. The results demonstrate that the TabNet model is the best-performing classifier using hybrid features and a window size of 13, yielding 78.96% sensitivity and 77.24% specificity. The results indicate that the TabNet method has advantages in terms of performance and interpretability. The proposed technique can automatically analyze the data without any measurement errors and any human intervention. Furthermore, it can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
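The "window size of 13" mentioned above refers to the sequence context taken around each candidate residue. A minimal sketch of that step follows, under assumptions the abstract does not confirm: the window is centred on the candidate S, T, or Y (six flanking residues on each side), and positions beyond the protein ends are padded with "X". The function name and the example sequence are hypothetical.

```python
def extract_windows(sequence, window=13, targets="STY", pad="X"):
    """Return (1-based position, residue, window) for each candidate residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the candidate sits in the centre")
    half = window // 2
    # Pad both ends so residues near the termini still get full-length windows
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in the original is index i + half in the padded string,
            # so the slice below is centred on the candidate residue
            windows.append((i + 1, residue, padded[i:i + window]))
    return windows


# Hypothetical short peptide with one serine and one threonine
windows = extract_windows("MKSAAT")
for position, residue, context in windows:
    print(position, residue, context)
```

Each window would then be encoded (for example with the sequence-based, physicochemical, and PSSM features the abstract lists) before being passed to a classifier; that encoding is outside this sketch.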
Affiliation(s)
- Elham Khalili
  - Department of Plant Science, Faculty of Science, Tarbiat Modares University, Tehran, Iran
- Shahin Ramazi
  - Department of Biophysics, Faculty of Biological Science, Tarbiat Modares University, Tehran, Iran
- Faezeh Ghanati
  - Department of Plant Science, Faculty of Science, Tarbiat Modares University, Tehran, Iran
- Samaneh Kouchaki
  - Department of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences, Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, UK
33
Liu Z, Han N, Su T, Ji Y, Bao H, Zhou S, Luo S, Wang H, Liu J, Wang HJ. Interpretable machine learning to identify important predictors of birth weight: A prospective cohort study. Front Pediatr 2022; 10:899954. [PMID: 36440327 PMCID: PMC9691849 DOI: 10.3389/fped.2022.899954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 10/24/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Predicting birth weight and identifying its risk factors are clinically important. This study aims to use interpretable machine learning to predict birth weight and identify important predictors. METHODS This prospective cohort study was conducted in Tongzhou Maternal and Child Health Care Hospital of Beijing, China, recruiting pregnant women between June 2018 and February 2019. We used 24 features to predict infant birth weight, including gestational age, mother's age, parity, history of macrosomia delivery, pre-pregnancy body mass index (BMI), height, father's BMI, lifestyle (diet, physical activity, smoking), and biomarker (fasting glucose and lipids) features. The study outcome was infant birth weight. We used 8 supervised learning models, including 4 individual estimators [linear regression, ridge regression, lasso regression, and support vector machine regression (SVR)] and 4 ensemble estimators (random forest, AdaBoost, gradient boosted trees, and voting ensemble for regression), to predict birth weight. Model accuracy was measured by root mean squared error (RMSE) of 10-fold cross validation on the training set and RMSE of prediction on the test set. We used the permutation importance algorithm to understand the predictions from the models and what affected them. RESULT This study included 4,754 mother-child dyads. RMSEs were lower in voting ensemble for regression, linear regression, and SVR than in random forest, AdaBoost, and gradient boosted trees. The 5 most important predictors for infant birth weight were gestational age, fetal sex, preterm birth, mother's height, and pre-pregnancy BMI. After adding ultrasound-measured indicators of fetal growth into predictors, mother's height and pre-pregnancy BMI remained the most important predictors in predicting the outcome. CONCLUSION Mother's height and pre-pregnancy BMI were identified as important predictors for infant birth weight. Interpretable machine learning is a promising tool in the prediction of birth weight.
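Permutation importance, as used in this study, scores a feature by how much the model's error grows when that feature's column is shuffled. The sketch below measures the increase in RMSE for a fixed prediction function. Everything in it is an illustrative assumption: the toy linear model, its coefficients, the three unnamed features, and the number of repeats are not taken from the paper.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean RMSE increase after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: uses features 0 and 1, ignores feature 2."""
    return 3000.0 + 150.0 * row[0] + 10.0 * row[1]


X = [[float(i), float((i * 7) % 5), float((i * 3) % 4)] for i in range(12)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)
```

A feature the model ignores gets an importance of zero, while a feature with a large coefficient gets a large one. Strongly correlated features can share or mask importance, which is worth keeping in mind when reading rankings such as the one reported here.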
Affiliation(s)
- Zheng Liu
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Na Han
  - Tongzhou Maternal and Child Health Care Hospital of Beijing, Beijing, China
- Tao Su
  - Tongzhou Maternal and Child Health Care Hospital of Beijing, Beijing, China
- Yuelong Ji
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Heling Bao
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Shuang Zhou
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Shusheng Luo
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Hui Wang
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
- Jue Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Hai-Jun Wang
  - Department of Maternal and Child Health, School of Public Health, Peking University, National Health Commission Key Laboratory of Reproductive Health, Beijing, China
34
Hammad A, Elshaer M, Tang X. Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning. Math Biosci Eng 2021; 18:8997-9015. [PMID: 34814332 DOI: 10.3934/mbe.2021443] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform for studying the etiology of CRC for this purpose. Therefore, the current study aimed to perform integrated bioinformatics and machine learning analyses to explore novel biomarkers for CRC prognosis. In this study, we acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database. The GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, which were then subjected to support vector machine (SVM), receiver operating characteristic (ROC) curve, and survival analyses to explore their diagnostic value. Meanwhile, TCGA transcriptomics data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and the pathology data presented in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. Thereafter, PPI network analysis identified a total of 10 significant hub genes. The ROC curve was used to predict the potential application of biomarkers in CRC diagnosis, with an area under the ROC curve (AUC) of these genes exceeding 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic values of these hub genes were confirmed by survival analyses using different CRC patient cohorts.
Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
Affiliation(s)
- Ahmed Hammad
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
  - Radiation Biology Department, National Center for Radiation Research and Technology, Egyptian Atomic Energy Authority, Cairo 13759, Egypt
- Mohamed Elshaer
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
  - Labeled Compounds Department, Hot Labs Center, Egyptian Atomic Energy Authority, Cairo 13759, Egypt
- Xiuwen Tang
  - Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China
35
Lecca P. Machine Learning for Causal Inference in Biological Networks: Perspectives of This Challenge. Front Bioinform 2021; 1:746712. [PMID: 36303798 PMCID: PMC9581010 DOI: 10.3389/fbinf.2021.746712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/24/2021] [Accepted: 09/08/2021] [Indexed: 11/13/2022]
Abstract
Most machine learning-based methods predict outcomes rather than understanding causality. Machine learning methods have proved efficient at finding correlations in data but poor at determining causation. This issue severely limits the applicability of machine learning methods to inferring the causal relationships between the entities of a biological network and, more generally, of any dynamical system that is representable as a network, such as a system of medical intervention strategies and clinical outcomes. For those who want to use the results of network inference not only to understand the mechanisms underlying the dynamics, but also to understand how the network reacts to external stimuli (e.g. environmental factors, therapeutic treatments), tools that can understand the causal relationships between data are in high demand. Given the increasing popularity of machine learning techniques in computational biology and the recent literature proposing the use of machine learning techniques for the inference of biological networks, we would like to present the challenges that mathematics and computer science research faces in generalising machine learning to an approach capable of understanding causal relationships, and the prospects that achieving this will open up for the medical application domains of systems biology, the main paradigm of which is precisely network biology at any physical scale.