1
|
Jiang D, Wu X, Deng Y, Yang X, Wang Z, Tang Y, He L, He X. Single-Cell Profiling Reveals Conserved Differentiation and Partial EMT Programs Orchestrating Ecosystem-Level Antagonisms in Head and Neck Cancer. J Cell Mol Med 2025; 29:e70575. [PMID: 40318012 PMCID: PMC12049153 DOI: 10.1111/jcmm.70575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 03/27/2025] [Accepted: 04/19/2025] [Indexed: 05/07/2025] Open
Abstract
Head and neck squamous cell carcinoma (HNSC) exhibits profound intratumoral heterogeneity, driven by dynamic interactions between malignant cells and the tumour microenvironment (TME). Using consensus non-negative matrix factorisation (cNMF) on multi-site HNSC single-cell transcriptomes, we resolved conserved meta-programs that define cellular ecosystems. Six major epithelial programmes emerged, including a differentiation-associated programme (Epi_Diff) correlated with SPDEF activity and favourable patient prognosis, and an invasive programme (Epi_pEMT) potentially controlled by TEAD4-mediated ECM remodelling, exhibiting partial EMT markers (VIM, TGFB1). Compartment-specific crosstalk analysis revealed that Epi_pEMT cells may coordinate with mCAF1 fibroblasts and TAM(SPP1) through COL1A1-CD44 and SPP1-CD44 signalling, suggesting potential formation of a pro-invasive niche. Conversely, Epi_Diff cells may interact with NK/T cells through CEACAM5-CD8A and CCL5-ACKR2, which may contribute to inhibiting immune infiltration. Multi-compartment correlation analysis revealed three ecosystem-level patterns: (1) an inverse association between Epi_Diff and Epi_pEMT (Spearman R = -0.43); (2) a negative correlation between mCAF1 abundance and cCAF frequency (R = -0.48); (3) TAM(SPP1) dominance inversely correlating with both TAM(C1Q) (R = -0.43) and NK/T infiltration (R = -0.36). These axes suggest a potential hierarchical ecology framework in which lineage-specific polarisation and inter-compartment synergies may collectively govern disease progression.
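The ecosystem-level patterns in the abstract above are reported as Spearman correlations between compartment abundances. As a minimal sketch of what such a coefficient measures, the following computes Spearman's rho as the Pearson correlation of ranks. The function names and the per-sample fractions are hypothetical illustrations, not data or code from the study.

```python
def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-sample fractions of two epithelial programmes (made-up numbers).
epi_diff = [0.40, 0.35, 0.10, 0.25, 0.05]
epi_pemt = [0.05, 0.10, 0.30, 0.20, 0.45]
rho = spearman(epi_diff, epi_pemt)
```

In this toy input the two programmes are ranked in exactly opposite order, so rho is -1; real cohorts would give intermediate values such as the R = -0.43 quoted above.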
Affiliation(s)
- Donghui Jiang
- Department of Otolaryngology & Head and Neck Surgery, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Xiaoguang Wu
- Department of Otolaryngology & Head and Neck Surgery, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Yuanyuan Deng
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Xi Yang
- Department of Otolaryngology & Head and Neck Surgery, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Zhiqiang Wang
- Department of Radiation Oncology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Yong Tang
- Department of Otolaryngology & Head and Neck Surgery, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Li He
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
| | - Xiaoguang He
- Department of Otolaryngology & Head and Neck Surgery, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
|
2
|
Li Y, Wang J, Miao Y, Dunk MM, Maioli S, Fang Z, Zhang Q, Xu W. Association of Plasma Fatty Acid Profile With Trajectory of Multimorbidity and Mortality: A Community-Based Longitudinal Study. J Gerontol A Biol Sci Med Sci 2025; 80:glaf031. [PMID: 39954290 DOI: 10.1093/gerona/glaf031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Indexed: 02/17/2025] Open
Abstract
BACKGROUND Plasma fatty acids have been linked to various chronic diseases and mortality, but the extent to which fatty acids are associated with the trajectory of multimorbidity remains unclear. We investigated the association of fatty acid profile with multimorbidity trajectories and event-free survival. METHODS Within the UK Biobank, 138,685 chronic disease-free participants were followed for up to 16 years. Seventeen plasma fatty acids were measured by nuclear magnetic resonance. A comprehensive healthy fatty acid score (HFAS) was constructed using LASSO regression. Incidence of chronic diseases and death were ascertained through linkages to medical and death records. Event-free survival was defined as survival without chronic diseases or death. Data were analyzed using a linear mixed-effects model, Cox regression, and Laplace regression. RESULTS High HFAS was associated with lower risk of chronic diseases/death (hazard ratio [HR]: 0.907, 95% confidence interval [CI]: 0.888-0.925) and prolonged event-free survival time by 0.636 (95% CI: 0.500-0.774) years compared with low HFAS. High HFAS was also associated with a slower accumulation trajectory of multimorbidity (β: -0.042, 95% CI: -0.045 to -0.038). There was a significant multiplicative interaction between moderate-to-high HFAS and healthy lifestyle on chronic disease/death (p for interaction = .002) and multimorbidity accumulation trajectories (p for interaction < .001). CONCLUSIONS A healthier plasma fatty acid metabolic profile is associated with a slower accumulation of multimorbidity and prolonged event-free survival time. A healthy lifestyle may strengthen the protective association of HFAS with the risk of chronic diseases/death and the accumulation trajectory of multimorbidity.
Affiliation(s)
- Yang Li
- Department of Toxicology and Health Inspection and Quarantine, School of Public Health, Tianjin Medical University, Tianjin, China
| | - Jiao Wang
- National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- Department of Neurobiology, Care Science and Society, Karolinska Institutet, Stockholm, Sweden
| | - Yuyang Miao
- Department of Geriatrics, Tianjin Medical University General Hospital; Tianjin Key Laboratory of Elderly Health; Tianjin Geriatrics Institute; Tianjin, China
| | - Michelle M Dunk
- Department of Neurobiology, Care Science and Society, Karolinska Institutet, Stockholm, Sweden
| | - Silvia Maioli
- Department of Neurobiology, Care Sciences and Society, Division of Neurogeriatrics, Center for Alzheimer Research, Karolinska Institutet, Stockholm, Sweden
| | - Zhongze Fang
- Department of Toxicology and Health Inspection and Quarantine, School of Public Health, Tianjin Medical University, Tianjin, China
| | - Qiang Zhang
- Department of Geriatrics, Tianjin Medical University General Hospital; Tianjin Key Laboratory of Elderly Health; Tianjin Geriatrics Institute; Tianjin, China
| | - Weili Xu
- Department of Neurobiology, Care Science and Society, Karolinska Institutet, Stockholm, Sweden
- Department of Geriatrics, Tianjin Medical University General Hospital; Tianjin Key Laboratory of Elderly Health; Tianjin Geriatrics Institute; Tianjin, China
|
3
|
Wang D, Gui S, Pu J, Zhong X, Yan L, Li Z, Tao X, Yang D, Zhou H, Qiao R, Zhang H, Cheng X, Ren Y, Chen W, Chen X, Tao W, Chen Y, Chen X, Liu Y, Xie P. PsycGM: a comprehensive database for associations between gut microbiota and psychiatric disorders. Mol Psychiatry 2025:10.1038/s41380-025-03000-5. [PMID: 40185904 DOI: 10.1038/s41380-025-03000-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 03/03/2025] [Accepted: 03/26/2025] [Indexed: 04/07/2025]
Abstract
Psychiatric disorders pose substantial global burdens on public health, yet therapeutic options remain limited. Recently, gut microbiota is in the spotlight of new research on psychiatric disorders, as emerging discoveries have highlighted the importance of gut microbiome in the regulation of central nervous system via mediating the gut-brain-axis bidirectional communication. While metagenomics studies have accumulated for psychiatric disorders, few systematic efforts were dedicated to integrating these high-throughput data across diverse phenotypes, interventions, geographical regions, and biological species. To present a panoramic view of global data and provide a comprehensive resource for investigating the gut microbiota dysbiosis in psychiatric disorders, we developed the PsycGM, a manually curated and well-annotated database that provides the literature-supported associations between gut microbiota and psychiatric disorders or intervention measures. In total, PsycGM incorporated 559 studies from 31 countries worldwide, encompassing research involving humans, rats, mice, and non-human primates. PsycGM documented 8907 curated associations between 1514 gut microbial taxa and 11 psychiatric disorders, as well as 4050 associations between 869 taxa and 232 microbiota-based and non-microbiota-based interventions. Moreover, PsycGM provided a user-friendly web interface with comprehensive information, enabling browsing, retrieving and downloading of all entries. In the application of PsycGM, we panoramically depicted the intestinal microecological imbalance in depression. Additionally, we identified 9 microbial taxa consistently altered in patients with depression, with the most common dysregulations observed for Parabacteroides, Alistipes, and Faecalibacterium; in animal models of depression, consistent changes were observed in 21 microbial taxa, most frequently reported as Helicobacter, Lactobacillus, Roseburia, and the ratio of Firmicutes/Bacteroidetes. 
PsycGM is a comprehensive resource for future investigations on the role of gut microbiota in mental and brain health, and for therapeutic target innovations based on modifications of gut microbiota. PsycGM is freely accessed at http://psycgmomics.info .
Affiliation(s)
- Dongfang Wang
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Jinfeng Laboratory, Chongqing, 401329, China
- Chongqing Institute for Brain and Intelligence, Chongqing, 400064, China
| | - Siwen Gui
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Jinfeng Laboratory, Chongqing, 401329, China
- Chongqing Institute for Brain and Intelligence, Chongqing, 400064, China
| | - Juncai Pu
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiaogang Zhong
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Jinfeng Laboratory, Chongqing, 401329, China
- Chongqing Institute for Brain and Intelligence, Chongqing, 400064, China
| | - Li Yan
- School of Medical Information, Chongqing Medical University, Chongqing, 400042, China
| | - Zhuocan Li
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiangkun Tao
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Dan Yang
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Haipeng Zhou
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Renjie Qiao
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Hanping Zhang
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiangyu Cheng
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yi Ren
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Weiyi Chen
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiaopeng Chen
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Wei Tao
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yue Chen
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiang Chen
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yiyun Liu
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
- Jinfeng Laboratory, Chongqing, 401329, China.
- Chongqing Institute for Brain and Intelligence, Chongqing, 400064, China.
| | - Peng Xie
- Department of Neurology, NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
- Jinfeng Laboratory, Chongqing, 401329, China.
- Chongqing Institute for Brain and Intelligence, Chongqing, 400064, China.
|
4
|
Xia Y, Zhang Y, Liu D, Zhu YH, Wang Z, Song J, Yu DJ. BLAM6A-Merge: Leveraging Attention Mechanisms and Feature Fusion Strategies to Improve the Identification of RNA N6-Methyladenosine Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1803-1815. [PMID: 38913512 DOI: 10.1109/tcbb.2024.3418490] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/26/2024]
Abstract
RNA N6-methyladenosine is a prevalent and abundant type of RNA modification that exerts significant influence on diverse biological processes. To date, numerous computational approaches have been developed for predicting methylation, but most of them ignore the correlations between different encoding strategies and fail to explore the adaptability of various attention mechanisms for methylation identification. To solve these issues, we proposed an innovative framework for predicting RNA m6A modification sites, termed BLAM6A-Merge. Specifically, it utilized a multimodal feature fusion strategy to combine the classification results of four features and the Blastn tool. In addition, different attention mechanisms were employed to extract higher-level features from specific features after the screening process. Extensive experiments on 12 benchmarking datasets demonstrated that BLAM6A-Merge achieved superior performance (average AUC: 0.849 for the full transcript mode and 0.784 for the mature mRNA mode). Notably, the Blastn tool was employed for the first time in the identification of methylation sites.
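The performance figures above are averages of the area under the ROC curve (AUC). As a hedged illustration of how an AUC value can be obtained from predicted scores, the sketch below uses the pairwise definition: the fraction of positive/negative pairs in which the positive site receives the higher score, with ties counted as one half. The labels and scores are invented for illustration and have no relation to the BLAM6A-Merge benchmarks.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = methylated site) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
site_auc = auc(labels, scores)
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The quadratic loop is fine for a small example; a rank-based formula would be preferable for large datasets.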
|
5
|
Pu J, Yu Y, Liu Y, Wang D, Gui S, Zhong X, Chen W, Chen X, Chen Y, Chen X, Qiao R, Jiang Y, Zhang H, Fan L, Ren Y, Chen X, Wang H, Xie P. ProMENDA: an updated resource for proteomic and metabolomic characterization in depression. Transl Psychiatry 2024; 14:229. [PMID: 38816410 PMCID: PMC11139925 DOI: 10.1038/s41398-024-02948-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] Open
Abstract
Depression is a prevalent mental disorder with a complex biological mechanism. Following the rapid development of systems biology technology, a growing number of studies have applied proteomics and metabolomics to explore the molecular profiles of depression. However, a standardized resource facilitating the identification and annotation of the available knowledge from these scattered studies associated with depression is currently lacking. This study presents ProMENDA, an upgraded resource that provides a platform for manual annotation of candidate proteins and metabolites linked to depression. Following the establishment of the protein dataset and the update of the metabolite dataset, the ProMENDA database was developed as a major extension of its initial release. A multi-faceted annotation scheme was employed to provide comprehensive knowledge of the molecules and studies. A new web interface was also developed to improve the user experience. The ProMENDA database now contains 43,366 molecular entries, comprising 20,847 protein entries and 22,519 metabolite entries, which were manually curated from 1370 human, rat, mouse, and non-human primate studies. This represents a significant increase (more than 7-fold) in molecular entries compared to the initial release. To demonstrate the usage of ProMENDA, a case study identifying consistently reported proteins and metabolites in the brains of animal models of depression was presented. Overall, ProMENDA is a comprehensive resource that offers a panoramic view of proteomic and metabolomic knowledge in depression. ProMENDA is freely available at https://menda.cqmu.edu.cn .
Affiliation(s)
- Juncai Pu
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yue Yu
- Department of Health Sciences Research, Mayo Clinic, MN, 55901, USA
| | - Yiyun Liu
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Dongfang Wang
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Siwen Gui
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiaogang Zhong
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Weiyi Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiaopeng Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yue Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiang Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Renjie Qiao
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yanyi Jiang
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Hanping Zhang
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Li Fan
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yi Ren
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xiangyu Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Haiyang Wang
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Peng Xie
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
- The Jinfeng Laboratory, Chongqing, 401336, China.
- Chongqing Institute for Brain and Intelligence, Chongqing, 400072, China.
|
6
|
Yu S, Liao B, Zhu W, Peng D, Wu F. Accurate prediction and key protein sequence feature identification of cyclins. Brief Funct Genomics 2023; 22:411-419. [PMID: 37118891 DOI: 10.1093/bfgp/elad014] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/14/2023] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/30/2023] Open
Abstract
Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding their function. However, due to the low similarity between cyclin protein sequences, a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. The features were then ranked and selected with maximum relevance-maximum distance (MRMD) 1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset in terms of both dimensionality and performance. Therefore, our work provided a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.
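The abstract above assesses its model by 10-fold cross-validation. A minimal sketch of how the train/test index splits for such a procedure can be generated is shown below. The function name, the seed, and the sample count of 25 are assumptions made for illustration; the study's actual splitting code is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical dataset of 25 sequences, split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Each of the ten iterations trains on nine folds and evaluates on the held-out fold, and the reported accuracy would be aggregated over all ten held-out folds.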
Affiliation(s)
- Shaoyou Yu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Dejun Peng
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Fangxiang Wu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
|
7
|
Yang H, Liu Y, Yang Y, Li D, Wang Z. InDEP: an interpretable machine learning approach to predict cancer driver genes from multi-omics data. Brief Bioinform 2023; 24:bbad318. [PMID: 37649392 DOI: 10.1093/bib/bbad318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 06/14/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023] Open
Abstract
Cancer driver genes are critical in driving tumor cell growth, and precisely identifying these genes is crucial for advancing our understanding of cancer pathogenesis and developing targeted cancer drugs. Although current methods for discovering cancer driver genes mainly rely on integrating multi-omics data, many existing models are overly complex, and their results are difficult to interpret accurately. This study aims to address this issue by introducing InDEP, an interpretable machine learning framework based on cascade forests. InDEP is designed with easy-to-interpret features, cascade forests based on decision trees and a KernelSHAP module that enables fine-grained post-hoc interpretation. Integrating multi-omics data, InDEP can identify essential features of classified driver genes at both the gene and cancer-type levels. The framework accurately identifies driver genes, discovers new patterns that characterize genes as driver genes and refines the cancer driver gene catalog. In comparison with state-of-the-art methods, InDEP proved to be more accurate on the test set and identified reliable candidate driver genes. Mutational features were the primary contributors to InDEP's identification of driver genes, with other omics features also contributing. At the gene level, the framework concluded that substitution-type mutations were the main reason most genes were identified as driver genes. InDEP's ability to identify reliable candidate driver genes opens up new avenues for precision oncology and for discovering new biomedical knowledge. This framework can help advance cancer research by providing an interpretable method for identifying cancer driver genes and their contribution to cancer pathogenesis, facilitating the development of targeted cancer drugs.
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Yawen Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Yijing Yang
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, Illinois, United States of America
| | - Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
|
8
|
Yan F, Liu Y, Zhang T, Shen Y. Identifying TNF and IL6 as potential hub genes and targeted drugs associated with scleritis: A bio-informative report. Front Immunol 2023; 14:1098140. [PMID: 37063831 PMCID: PMC10102337 DOI: 10.3389/fimmu.2023.1098140] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Accepted: 03/22/2023] [Indexed: 04/03/2023] Open
Abstract
BACKGROUND Scleritis is a serious inflammatory eye disease that can lead to blindness. The etiology and pathogenesis of scleritis remain unclear, and increasing evidence indicates that some specific genes and proteins are involved. This study aimed to identify pivotal genes and drug targets for scleritis, thus providing new directions for the treatment of this disease. METHODS We screened candidate genes and proteins associated with scleritis by text-mining the PubMed database using Python, and assessed their functions by using the DAVID database. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to identify the functional enrichment of these genes and proteins. Then, the hub genes were identified with CytoHubba and assessed by protein-protein interaction (PPI) network analysis. Serum from patients with active scleritis and from healthy subjects was used to validate the hub genes. Finally, the DGIdb database was used to predict targeted drugs for the hub genes for treating scleritis. RESULTS A total of 56 genes and proteins were found to be linked to scleritis, and 65 significantly altered pathways were identified in the KEGG analysis (FDR < 0.05). Most of the top five pathways involved the categories "Rheumatoid arthritis", "Inflammatory bowel disease", "Type I diabetes mellitus", and "Graft-versus-host disease". TNF and IL6 were considered to be the top 2 hub genes through CytoHubba. Based on our serum samples, the hub genes are expressed at high levels in active scleritis. Five scleritis-targeting drugs were found among 88 identified drugs. CONCLUSIONS This study provides key genes and drug targets related to scleritis through bioinformatics analysis. TNF and IL6 are considered key mediators and possible drug targets of scleritis. Five drug candidates may play an important role in the diagnosis and treatment of scleritis in the future, which warrants further experimental and clinical study.
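The methods above mention text-mining PubMed with Python to screen candidate genes. The abstract does not describe the procedure, so the following is only a simplified sketch of one possible step: counting how many abstracts mention each gene symbol as a whole word. The function name, the symbol list, and the toy abstract strings are all hypothetical; retrieval of real records from PubMed is not shown.

```python
import re
from collections import Counter


def count_gene_mentions(abstracts, gene_symbols):
    """Count, per gene symbol, the number of abstracts mentioning it."""
    counts = Counter()
    for text in abstracts:
        for gene in gene_symbols:
            # Word boundaries avoid matching a symbol inside a longer token.
            if re.search(r"\b" + re.escape(gene) + r"\b", text):
                counts[gene] += 1
    return counts


# Toy placeholder strings, not real PubMed abstracts.
abstracts = [
    "Toy abstract 1 mentions TNF and IL6 together.",
    "Toy abstract 2 mentions anti-TNF only.",
    "Toy abstract 3 mentions IL6 only.",
]
mentions = count_gene_mentions(abstracts, ["TNF", "IL6", "IL17A"])
```

A real pipeline would also need to handle gene aliases and ambiguous symbols, which this sketch deliberately ignores.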
Affiliation(s)
- Feiyue Yan
- Eye Center, Renmin Hospital of Wuhan University, Wuhan, China
- Frontier Science Center of Immunology and Metabolism, Medical Research Institute, Wuhan University, Wuhan, China
- Yizong Liu
- Eye Center, Renmin Hospital of Wuhan University, Wuhan, China
- Tianlu Zhang
- Eye Center, Renmin Hospital of Wuhan University, Wuhan, China
- Yin Shen
- Eye Center, Renmin Hospital of Wuhan University, Wuhan, China
- Frontier Science Center of Immunology and Metabolism, Medical Research Institute, Wuhan University, Wuhan, China
- *Correspondence: Yin Shen,
|
9
|
Feng J, Wu S, Yang H, Ai C, Qiao J, Xu J, Guo F. Microbe-bridged disease-metabolite associations identification by heterogeneous graph fusion. Brief Bioinform 2022; 23:6720417. [PMID: 36168719 DOI: 10.1093/bib/bbac423] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/28/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Metabolomics has developed rapidly in recent years, and metabolism-related databases are gradually being constructed. More and more studies are being carried out on diverse microbes, metabolites and diseases. However, the logic of the various associations among microbes, metabolites and diseases is poorly understood in the biomedicine of the gut microbial system. The collection and analysis of relevant microbial bioinformation play an important role in revealing microbe-metabolite-disease associations. Therefore, a dataset that integrates multiple relationships and a method based on complex heterogeneous graphs need to be developed. RESULTS In this study, we integrated several databases and extracted a variety of association data among microbes, metabolites and diseases. After obtaining the three interconnected bilateral association datasets (microbe-metabolite, metabolite-disease and disease-microbe), we built a heterogeneous graph to describe the association data. In our model, microbes were used as a bridge between diseases and metabolites. To fuse the information of the disease-microbe-metabolite graph, we used a bipartite graph attention network on the disease-microbe and metabolite-microbe bipartite graphs. The experimental results show that our model performs well in the prediction of various disease-metabolite associations. Case studies of type 2 diabetes mellitus, Parkinson's disease, inflammatory bowel disease and liver cirrhosis indicate that our proposed methodology is valuable for the mining of other associations and the prediction of biomarkers for different human diseases. Availability and implementation: https://github.com/Selenefreeze/DiMiMe.git.
Affiliation(s)
- Jitong Feng
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shengbo Wu
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Zhejiang Shaoxing Research Institute of Tianjin University, Shaoxing, China
- Hongpeng Yang
- School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
- Chengwei Ai
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jianjun Qiao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Zhejiang Shaoxing Research Institute of Tianjin University, Shaoxing, China
- Junhai Xu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
|
10
|
Banimfreg BH, Shamayleh A, Alshraideh H. Survey for Computer-Aided Tools and Databases in Metabolomics. Metabolites 2022; 12:metabo12101002. [PMID: 36295904 PMCID: PMC9610953 DOI: 10.3390/metabo12101002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/10/2022] [Revised: 10/08/2022] [Accepted: 10/12/2022] [Indexed: 11/14/2022] Open
Abstract
Metabolomics has advanced from innovation and functional genomics tools and is currently a basis in the big data-led precision medicine era. Metabolomics is promising in the pharmaceutical field and clinical research. However, due to the complexity and high throughput data generated from such experiments, data mining and analysis are significant challenges for researchers in the field. Therefore, several efforts were made to develop a complete workflow that helps researchers analyze data. This paper introduces a review of the state-of-the-art computer-aided tools and databases in metabolomics established in recent years. The paper provides computational tools and resources based on functionality and accessibility and provides hyperlinks to web pages to download or use. This review aims to present the latest computer-aided tools, databases, and resources to the metabolomics community in one place.
|
11
|
Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 89] [Impact Index Per Article: 29.7] [Indexed: 11/22/2022]
Abstract
Background:
DNA N6-methyladenine plays an important role in the restriction-modification system to isolate invasion from adventive DNA. The shortcomings of the high time-consumption and high costs of experimental methods have been exposed, and some computational methods have emerged. The support vector machine theory has received extensive attention in the bioinformatics field due to its solid theoretical foundation and many good characteristics.
Objective:
General machine learning methods include an important feature-extraction step. This study omits that step and replaces it with an easy-to-obtain sequence distance matrix to obtain better results.
Method:
First, sequence alignment was used to obtain the similarity matrix. Then a novel transformation turned the similarity matrix into a distance matrix. Next, the similarity-distance matrix was made positive semi-definite so that it could be used as the kernel matrix. Finally, the LIBSVM software was applied to solve the support vector machine.
Results:
Five-fold cross-validation of this model on rice and mouse data achieved excellent accuracy rates of 92.04% and 96.51%, respectively. This shows that the DB-SVM method has obvious advantages compared with traditional machine learning methods. Meanwhile, this model achieved accuracies of 0.943, 0.982 and 0.818, Matthews correlation coefficients of 0.944, 0.982 and 0.838, and F1 scores of 0.942, 0.982 and 0.840 for the rice, M. musculus and cross-species genome datasets, respectively.
Conclusion:
These outcomes show that this model outperforms iIM-CNN and csDMA, the latest methods for predicting DNA 6mA modification.
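The Method steps above (similarity matrix, distance matrix, positive semi-definite kernel) can be illustrated with a minimal sketch. The abstract does not give the actual transformation or the repair procedure, so everything here is an assumption: the mapping `d = 1 - s`, the Gaussian kernel with a hypothetical `gamma`, and a diagonal shift sized by Gershgorin's circle theorem are stand-ins, not the authors' method.

```python
import math


def similarity_to_distance(sim):
    """Map similarities in [0, 1] to distances with d = 1 - s (assumed transform)."""
    return [[1.0 - s for s in row] for row in sim]


def distance_to_kernel(dist, gamma=1.0):
    """Gaussian-style kernel K_ij = exp(-gamma * d_ij^2); gamma is a hypothetical parameter."""
    return [[math.exp(-gamma * d * d) for d in row] for row in dist]


def shift_to_psd(kernel):
    """Add a diagonal shift that is sufficient (not minimal) for positive semi-definiteness.

    For a symmetric matrix, Gershgorin's theorem bounds every eigenvalue below by
    min_i (K_ii - sum_{j != i} |K_ij|). Shifting the diagonal by the negative part
    of that bound pushes all eigenvalues to be >= 0.
    """
    n = len(kernel)
    bound = min(
        kernel[i][i] - sum(abs(kernel[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )
    shift = max(0.0, -bound)
    return [
        [kernel[i][j] + (shift if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric similarity matrix for three sequences (illustrative values only).
sim = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
psd_kernel = shift_to_psd(distance_to_kernel(similarity_to_distance(sim)))
```

A matrix like `psd_kernel` could then be passed to an SVM solver that accepts precomputed kernels. Clipping negative eigenvalues is a common alternative to the diagonal shift, but it requires an eigendecomposition.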
Affiliation(s)
- Haoyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen 361005, China
- Chenggang Song
- Beidahuang Industry Group General Hospital, Harbin 150001, China
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
|
12
|
Dash P, Mohapatra SR, Pati S. Metabolomics of Multimorbidity: Could It Be the Quo Vadis? Front Mol Biosci 2022; 9:848971. [PMID: 35359598 PMCID: PMC8962190 DOI: 10.3389/fmolb.2022.848971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Accepted: 01/28/2022] [Indexed: 11/16/2022] Open
Abstract
Multimorbidity, the simultaneous presence of two or more chronic diseases, affects health care to a great extent. Its association with higher health care cost, greater disability, and poor quality of life makes it a major public health risk. A particular concern is that management of a multimorbid condition is complicated by the fact that multiple types of treatment may be required to treat different diseases at the same time, and the interaction between some of the therapies can be detrimental. Understanding the causal factors of simultaneously occurring disease conditions and investigating the connected pathways involved in the whole process may resolve this complication. When different disease conditions present in an individual share common responsible factors, treatment strategies targeting those common causes should reduce the chance of multimorbidity developing because of those factors. Metabolomics, which can uncover the underlying metabolites/molecules of a medical condition, is believed to be an effective technique for identifying biomarkers and devising effective treatment strategies for multiple diseases. We hypothesize that understanding the metabolic profile may shed light on targeting the common culprit for different/similar chronic diseases, ultimately making the treatment strategy more effective with a combinatorial effect.
Affiliation(s)
- Pujarini Dash
- Regional Medical Research Centre, Bhubaneswar, India
- Soumya R. Mohapatra
- Department of Research and Development, Kalinga Institute of Medical Sciences, KIIT Deemed to Be University, Bhubaneswar, India
- School of Biotechnology, Kalinga Institute of Industrial Technology (KIIT), Deemed to Be University, Bhubaneswar, India
- Sanghamitra Pati
- Regional Medical Research Centre, Bhubaneswar, India
- *Correspondence: Sanghamitra Pati,
|
13
|
Lei X, Tie J, Pan Y. Inferring Metabolite-Disease Association Using Graph Convolutional Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:688-698. [PMID: 33705323 DOI: 10.1109/tcbb.2021.3065562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Biological experiments are time-consuming and laborious, so developing an effective computational model can help address these problems. Most computational models rely on biological similarity and network-based methods that cannot take into account the topological structure of metabolite-disease association graphs. We propose a novel method based on graph convolutional networks, named MDAGCN, to infer potential metabolite-disease associations. We first calculated three kinds of metabolite similarity and three kinds of disease similarity. The final disease and metabolite similarities were obtained by integrating the three kinds of similarity for each and filtering out noisy similarity values. Then the metabolite similarity network, the disease similarity network and the known metabolite-disease association network were used to construct a heterogeneous network. Finally, this information-rich heterogeneous network was fed into the graph convolutional networks to obtain new node features through aggregation of node information, so as to infer the potential associations between metabolites and diseases. Experimental results show that MDAGCN achieves more reliable results in cross-validation and case studies than other existing methods.
|
14
|
Sun W, Du D, Fu T, Han Y, Li P, Ju H. Alterations of the Gut Microbiota in Patients With Severe Chronic Heart Failure. Front Microbiol 2022; 12:813289. [PMID: 35173696 PMCID: PMC8843083 DOI: 10.3389/fmicb.2021.813289] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/11/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
Chronic heart failure (CHF) is the final outcome of almost all forms of cardiovascular disease and remains the main cause of mortality worldwide. Accumulating evidence points to a role for the gut microbial community in cardiovascular disease, but few studies have unveiled the alterations of the gut microbiota in severe CHF patients or suggested further directions. To address this gap, fecal samples from 29 CHF patients diagnosed with NYHA Class III-IV and 30 healthy controls were collected and analyzed using bacterial 16S rRNA gene sequencing. There were many significant differences between the two groups. Firstly, the phylum Firmicutes was remarkably decreased in severe CHF patients, and, unexpectedly, the phylum Proteobacteria rather than the phylum Bacteroides was the second most abundant phylum in severe CHF patients. Secondly, α-diversity indices such as the chao1, PD-whole-tree and Shannon indices were significantly decreased in the severe CHF group versus the control group, and β-diversity differed notably between the two groups. Thirdly, our results revealed a remarkable decrease in the abundance of short-chain fatty acid (SCFA)-producing bacteria, including the genera Ruminococcaceae UCG-004, Ruminococcaceae UCG-002, Lachnospiraceae FCS020 group and Dialister, and an increased abundance of Enterococcus and Enterococcaceae, with an increased production of lactic acid. Finally, the alteration of the gut microbiota was presumably associated with functions including cell cycle control, cell division, chromosome partitioning, amino acid transport and metabolism, and carbohydrate transport and metabolism through the SCFA pathway. Our findings provide direction and theoretical knowledge for the regulation of gut flora in the treatment of severe CHF.
Affiliation(s)
- Weiju Sun
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Debing Du
- Beidahuang Industry Group General Hospital, Harbin, China
- Tongze Fu
- Harbin Medical University, Harbin, China
- Ying Han
- Department of Cardiology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Peng Li
- National Center for Biomedical Analysis, Beijing, China
- Hong Ju
- Heilongjiang Vocational College of Biology Science and Technology, Harbin, China
|
15
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
|
16
|
Convalescing the Process of Ranking Metabolites for Diseases using Subcellular Localization. Arabian Journal for Science and Engineering 2022. [DOI: 10.1007/s13369-021-06023-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
17
|
Zhang Z, Gong Y, Gao B, Li H, Gao W, Zhao Y, Dong B. SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles. Front Genet 2022; 12:809001. [PMID: 34987554 PMCID: PMC8721734 DOI: 10.3389/fgene.2021.809001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 11/15/2021] [Indexed: 12/20/2022] Open
Abstract
Soluble N-ethylmaleimide sensitive factor activating protein receptor (SNARE) proteins are a large family of transmembrane proteins located in organelles and vesicles. The important roles of SNARE proteins include initiating the vesicle fusion process and activating and fusing proteins as they undergo exocytosis activity, and SNARE proteins are also vital for the transport regulation of membrane proteins and non-regulatory vesicles. Therefore, establishing a method to efficiently identify SNARE proteins is of great significance. However, the identification accuracy of existing methods such as SNARE CNN is not satisfactory. In our study, we developed a method based on a support vector machine (SVM) that can effectively recognize SNARE proteins. We used the position-specific scoring matrix (PSSM) method to extract features of SNARE protein sequences, used the support vector machine recursive feature elimination with correlation bias reduction (SVM-RFE-CBR) algorithm to rank the importance of features, and then screened out the optimal subset of feature data based on the sorted results. We input the feature data into the model when building it, used 10-fold cross-validation for training, and tested model performance on an independent dataset. In independent tests, our method identified SNARE proteins with a sensitivity of 68%, specificity of 94%, accuracy of 92%, area under the curve (AUC) of 84%, and Matthews correlation coefficient (MCC) of 0.48. The experimental results show that our method scores well on the common evaluation indicators, indicating that it performs better than other existing classification methods in identifying SNARE proteins.
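The evaluation indicators named in this abstract (sensitivity, specificity, accuracy, MCC) all derive from one confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical toy values, not data from the paper, and the function name `classification_metrics` is introduced here only for illustration.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for an imbalanced test set (10 positives, 90 negatives).
metrics = classification_metrics(tp=8, fn=2, tn=85, fp=5)
```

On imbalanced data such as this toy set, accuracy can sit well above MCC because the many true negatives dominate accuracy, which is one reason both figures are commonly reported together.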
Affiliation(s)
- Zixiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yue Gong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Hongfei Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Benzhi Dong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
|
18
|
Ma Y, Ma Y. Hypergraph-based logistic matrix factorization for metabolite-disease interaction prediction. Bioinformatics 2022; 38:435-443. [PMID: 34499104 DOI: 10.1093/bioinformatics/btab652] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/08/2021] [Revised: 08/08/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Function-related metabolites, the terminal products of cell regulation, show a close association with complex diseases. The identification of disease-related metabolites is critical to the diagnosis, prevention and treatment of diseases. However, most existing computational approaches build networks by calculating pairwise relationships, which is inappropriate for mining higher-order relationships. RESULTS In this study, we present a novel approach based on hypergraph-based logistic matrix factorization, HGLMF, to predict potential interactions between metabolites and diseases. First, the molecular structures and gene associations of metabolites and the hierarchical structures and GO functional annotations of diseases were extracted to build various similarity measures of metabolites and diseases. Next, the kernel neighborhood similarity of metabolites (or diseases) was calculated according to the completed interaction network. Then, multiple networks of metabolites and of diseases were fused, respectively, and the hypergraph structures of metabolites and diseases were built. Finally, a hypergraph-based logistic matrix factorization was proposed to predict potential metabolite-disease interactions. In computational experiments, HGLMF accurately predicted metabolite-disease interactions and performed better than other state-of-the-art methods. Moreover, HGLMF could be used to make predictions for new metabolites (or diseases). As the case studies suggest, the proposed method could discover novel disease-related metabolites, which have been confirmed in existing studies. AVAILABILITY AND IMPLEMENTATION The code and dataset are available at: https://github.com/Mayingjun20179/HGLMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingjun Ma
- School of Applied Mathematics, Xiamen University of Technology, Xiamen 361024, China
- Yuanyuan Ma
- School of Computer & Information Engineering, Anyang Normal University, Anyang 455000, China
|
19
|
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022] Open
Abstract
Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Because most drug sensitivity prediction models use only gene expression data, the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity are neglected. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for the drug sensitivity prediction task. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. The research significance of drug sensitivity prediction and the existing problems are also discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiao Lv
- Beidahuang Industry Group General Hospital, Harbin, China
- Lei Shi
- Department of Spine Surgery Changzheng Hospital, Naval Medical University, Shanghai, China
|
20
|
Guo Y, Hou L, Zhu W, Wang P. Prediction of Hormone-Binding Proteins Based on K-mer Feature Representation and Naive Bayes. Front Genet 2021; 12:797641. [PMID: 34887905 PMCID: PMC8650314 DOI: 10.3389/fgene.2021.797641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Accepted: 11/05/2021] [Indexed: 11/29/2022] Open
Abstract
Hormone binding protein (HBP) is a soluble carrier protein that interacts selectively with different types of hormones and has various effects on the body's life activities. HBPs play an important role in the growth process of organisms, but their specific role is still unclear. Therefore, correctly identifying HBPs is the first step towards understanding and studying their biological function. However, due to their high cost and long experimental period, it is difficult for traditional biochemical experiments to correctly identify HBPs from an increasing number of proteins, so the real characterization of HBPs has become a challenging task for researchers. To measure the effectiveness of HBPs, an accurate and reliable prediction model for their identification is desirable. In this paper, we construct the prediction model HBP_NB. First, HBPs data were collected from the UniProt database, and a dataset was established. Then, based on the established high-quality dataset, the k-mer (K = 3) feature representation method was used to extract features. Second, the feature selection algorithm was used to reduce the dimensionality of the extracted features and select the appropriate optimal feature set. Finally, the selected features are input into Naive Bayes to construct the prediction model, and the model is evaluated by using 10-fold cross-validation. The final results were 95.45% accuracy, 94.17% sensitivity and 96.73% specificity. These results indicate that our model is feasible and effective.
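The k-mer (K = 3) feature representation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HBP_NB implementation: it assumes the 20 standard amino acids, overlapping windows, and normalisation to frequencies, and the names `AMINO_ACIDS` and `kmer_features` are introduced here only for the example.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=3):
    """Return the frequency of every possible k-mer as a fixed-length vector.

    With k = 3 the vector has 20**3 = 8000 entries, ordered lexicographically
    over AMINO_ACIDS. Windows containing non-standard residues are ignored.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy peptide (illustrative only): windows are ACD, CDE, DEF, EFG.
features = kmer_features("ACDEFG")
```

A vector like `features` is what a feature selection step would prune before a classifier such as Naive Bayes is trained. The 8000-dimensional raw space is one reason dimensionality reduction is useful here.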
Affiliation(s)
- Yuxin Guo
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Liping Hou
- Beidahuang Industry Group General Hospital, Harbin, China
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Peng Wang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
|
21
|
Xu H, Zhao B, Zhong W, Teng P, Qiao H. Identification of miRNA Signature Associated With Erectile Dysfunction in Type 2 Diabetes Mellitus by Support Vector Machine-Recursive Feature Elimination. Front Genet 2021; 12:762136. [PMID: 34707644 PMCID: PMC8542849 DOI: 10.3389/fgene.2021.762136] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/21/2021] [Accepted: 09/22/2021] [Indexed: 01/10/2023] Open
Abstract
Diabetes mellitus erectile dysfunction (DMED) is one of the most common complications of diabetes mellitus (DM) and seriously affects the self-esteem and quality of life of diabetics. MicroRNAs (miRNAs) are endogenous non-coding RNAs whose expression levels can affect multiple cellular processes. Many studies have demonstrated that miRNA plays a role in the occurrence and development of DMED. However, the exact mechanism of this process is unclear. Hence, we applied miRNA sequencing to blood samples from 10 DMED patients and 10 DM controls to study the mechanisms of miRNA interactions in DMED patients. Firstly, we identified four characteristic miRNAs as a signature by the SVM-RFE method (hsa-let-7e-5p, hsa-miR-30d-5p, hsa-miR-199b-5p, and hsa-miR-342-3p), called DMEDSig-4. Subsequently, we correlated DMEDSig-4 with clinical factors and further verified the ability of these miRNAs to classify samples. Finally, we functionally verified the relationship between DMEDSig-4 and DMED by pathway enrichment analysis of the miRNAs and their target genes. In brief, our study found four key miRNAs that may be key influencing factors of DMED. Meanwhile, DMEDSig-4 could help in the development of new therapies for DMED.
Affiliation(s)
- Haibo Xu
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- The First Hospital of Qiqihar, Qiqihar, China
- Baoyin Zhao
- The First Hospital of Qiqihar, Qiqihar, China
- Wei Zhong
- The First Hospital of Qiqihar, Qiqihar, China
- Peng Teng
- The First Hospital of Qiqihar, Qiqihar, China
- Hong Qiao
- The Second Affiliated Hospital of Harbin Medical University, Harbin, China
|
22
|
Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021; 19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification by existing methods remains challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. RESULTS Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA. CONCLUSIONS We have shown that the proposed predictor iTTCA-RF is superior to the other latest models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
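The balanced accuracy figures reported above follow directly from the stated sensitivity and specificity, since balanced accuracy is their arithmetic mean. The short check below uses only the percentages given in the abstract. The function name `balanced_accuracy` is introduced here for illustration.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Percentages as reported in the abstract.
cross_validation = balanced_accuracy(sensitivity=88.69, specificity=78.73)
independent_test = balanced_accuracy(sensitivity=83.61, specificity=62.67)

print(round(cross_validation, 2))  # 83.71
print(round(independent_test, 2))  # 73.14
```

Both values match the reported 83.71% and 73.14%. The roughly ten-point drop on the independent test comes mostly from specificity (78.73% to 62.67%).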
Affiliation(s)
- Shihu Jiao
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Huannan Guo
  - Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Lei Shi
  - Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
23
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. An effective method for predicting the secretory proteins of malaria parasites is essential for developing effective cures and treatments. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarize the machine learning-based identification algorithms and compare the construction strategies of different computational methods. We also discuss the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
  - Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
24
Wang T, Liu Y, Ruan J, Dong X, Wang Y, Peng J. A pipeline for RNA-seq based eQTL analysis with automated quality control procedures. BMC Bioinformatics 2021; 22:403. [PMID: 34433407 PMCID: PMC8386049 DOI: 10.1186/s12859-021-04307-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Advances in expression quantitative trait loci (eQTL) studies have provided valuable insights into the mechanisms of disease- and trait-associated genetic variants. However, it remains challenging for researchers with a limited computational background to evaluate and control the quality of multi-source heterogeneous eQTL raw data. There is an urgent need for a powerful and user-friendly tool that automatically processes raw datasets in various formats and then performs eQTL mapping. RESULTS In this work, we present a pipeline for eQTL analysis, termed eQTLQC, featuring automated data preprocessing for both genotype data and gene expression data. Our pipeline provides a set of quality control and normalization approaches, and utilizes automated techniques to reduce manual intervention. We demonstrate the utility and robustness of this pipeline by performing eQTL case studies using multiple independent real-world datasets with RNA-seq data and whole genome sequencing (WGS) based genotype data. CONCLUSIONS eQTLQC provides a reliable computational workflow for eQTL analysis. It provides standard quality control and normalization as well as eQTL mapping procedures for eQTL raw data in multiple formats. The source code, demo data, and instructions are freely available at https://github.com/stormlovetao/eQTLQC .
Affiliation(s)
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Yongzhuang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Junpeng Ruan
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
- Xianjun Dong
  - Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St., Boston, USA
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
25
Feng Y, Wang Z, Yang N, Liu S, Yan J, Song J, Yang S, Zhang Y. Identification of Biomarkers for Cervical Cancer Radiotherapy Resistance Based on RNA Sequencing Data. Front Cell Dev Biol 2021; 9:724172. [PMID: 34414195 PMCID: PMC8369412 DOI: 10.3389/fcell.2021.724172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/12/2021] [Accepted: 07/14/2021] [Indexed: 11/28/2022] Open
Abstract
Cervical cancer, a common gynecological malignancy, threatens the health and lives of women. Resistance to radiotherapy is the primary cause of treatment failure and is mainly related to differences in the inherent vulnerability of tumors after radiotherapy. Here, we investigated signature genes associated with poor response to radiotherapy by analyzing an independent cervical cancer dataset from the Gene Expression Omnibus, including pre-irradiation and mid-irradiation information. A total of 316 differentially expressed genes were identified. The correlations between these genes were investigated through Pearson correlation analysis. Subsequently, a random forest model was used to determine cancer-related genes, and all genes were ranked by random forest score. The top 30 candidate genes were selected to uncover their biological functions. Functional enrichment analysis revealed that the biological functions were chiefly enriched in tumor immune responses, such as cellular defense response, negative regulation of immune system process, T cell activation, neutrophil activation involved in immune response, regulation of antigen processing and presentation, and peptidyl-tyrosine autophosphorylation. Finally, the top 30 genes were screened and analyzed through literature verification. After validation, 10 genes (KLRK1, LCK, KIF20A, CD247, FASLG, CD163, ZAP70, CD8B, ZNF683, and F10) met our objective. Overall, the present research confirmed that integrated bioinformatics methods can contribute to the understanding of the molecular mechanisms and potential therapeutic targets underlying radiotherapy resistance in cervical cancer.
Affiliation(s)
- Yue Feng
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Zhao Wang
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Nan Yang
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Sijia Liu
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Jiazhuo Yan
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Jiayu Song
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Shanshan Yang
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
- Yunyan Zhang
  - Department of Gynecological Radiotherapy, Harbin Medical University Cancer Hospital, Harbin, China
26
Li Y, Pu F, Wang J, Zhou Z, Zhang C, He F, Ma Z, Zhang J. Machine Learning Methods in Prediction of Protein Palmitoylation Sites: A Brief Review. Curr Pharm Des 2021; 27:2189-2198. [PMID: 33183190 DOI: 10.2174/1381612826666201112142826] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Accepted: 07/27/2020] [Indexed: 11/22/2022]
Abstract
Protein palmitoylation is a fundamental and reversible post-translational lipid modification that is involved in a series of biological processes. Although a large number of experimental studies have explored the molecular mechanism behind the palmitoylation process, computational methods have attracted much attention for their good performance in predicting palmitoylation sites compared with expensive and time-consuming biochemical experiments. The prediction of protein palmitoylation sites helps to reveal the biological mechanism of palmitoylation. Therefore, research on applying machine learning methods to predict palmitoylation sites has become a hot topic in bioinformatics and has promoted development in related fields. In this review, we briefly introduce recent developments in predicting protein palmitoylation sites using machine learning-based methods and discuss their benefits and drawbacks. A perspective on machine learning-based methods for predicting palmitoylation sites is also provided. We hope the review can serve as a guide for related fields.
Affiliation(s)
- Yanwen Li
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Feng Pu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingru Wang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Zhiguo Zhou
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Chunhua Zhang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Zhiqiang Ma
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingbo Zhang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
27
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022] Open
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates cell cycle progression. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model to identify cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model identifies cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of two recent studies on the same data.
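Two of the sequence encodings named in this abstract, amino acid composition and composition of k-spaced amino acid pairs, can be sketched in a few lines. This is a minimal illustration that follows the commonly used definitions; the function names, the handling of non-standard residues, and the toy peptide are assumptions, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> dict:
    """Fraction of each standard residue in the sequence (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def k_spaced_pair_composition(seq: str, k: int) -> dict:
    """Frequency of residue pairs separated by k positions (400 features).

    A sequence of length L contains L - k - 1 such pairs; pairs that
    involve a non-standard residue are ignored (an assumption made here).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(seq) - k - 1
    if n_pairs <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(n_pairs):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}


# Toy peptide, chosen only to make the arithmetic easy to follow.
aac = amino_acid_composition("ACAC")
cksaap = k_spaced_pair_composition("ACAC", k=1)
print(aac["A"], aac["C"])          # 0.5 0.5
print(cksaap["AA"], cksaap["CC"])  # 0.5 0.5
```

Concatenating such vectors for several values of k gives a fixed-length representation regardless of protein length, which is what makes them usable as input to a tree-based classifier.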
Affiliation(s)
- Hasan Zulfiqar
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Qin-Lai Huang
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Jie Sun
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiao-Long Yu
  - School of Materials Science and Engineering, Hainan University, Haikou 570228, China
- Hao Lin
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
28
Li L, Jing Q, Yan S, Liu X, Sun Y, Zhu D, Wang D, Hao C, Xue D. Amadis: A Comprehensive Database for Association Between Microbiota and Disease. Front Physiol 2021; 12:697059. [PMID: 34335304 PMCID: PMC8317061 DOI: 10.3389/fphys.2021.697059] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/18/2021] [Accepted: 06/22/2021] [Indexed: 12/18/2022] Open
Abstract
The human gastrointestinal tract represents a symbiotic bioreactor that can mediate the interaction of the human host. The deployment and integration of multi-omics technologies have depicted a more complete image of the functions performed by microbial organisms. In addition, a large amount of data has been generated in a short time. However, researchers struggling to keep track of these mountains of information need a way to conveniently gain a comprehensive understanding of the relationship between microbiota and human diseases. To tackle this issue, we developed Amadis (http://gift2disease.net/GIFTED), a manually curated database that provides experimentally supported microbiota-disease associations and a dynamic network construction method. The current version of the Amadis database documents 20167 associations between 221 human diseases and 774 gut microbes across 17 species, curated from more than 1000 articles. By using the curated data, users can freely select and combine modules to obtain a specific microbe-based human disease network. Additionally, Amadis provides a user-friendly interface for browsing, searching and downloading. We hope it can serve as a useful and valuable resource for researchers exploring the associations between gastrointestinal microbiota and human diseases.
Affiliation(s)
- Long Li
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Qingxu Jing
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Sen Yan
  - Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Xuxu Liu
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yuanyuan Sun
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Defu Zhu
  - Family Medicine General Practice Clinic, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Dawei Wang
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenjun Hao
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Dongbo Xue
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
29
Oommen AM, Cunningham S, O'Súilleabháin PS, Hughes BM, Joshi L. An integrative network analysis framework for identifying molecular functions in complex disorders examining major depressive disorder as a test case. Sci Rep 2021; 11:9645. [PMID: 33958659 PMCID: PMC8102631 DOI: 10.1038/s41598-021-89040-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2020] [Accepted: 04/14/2021] [Indexed: 12/02/2022] Open
Abstract
In addition to the psychological depressive phenotype, major depressive disorder (MDD) patients are also associated with underlying immune dysregulation that correlates with metabolic syndrome prevalent in depressive patients. A robust integrative analysis of biological pathways underlying the dysregulated neural connectivity and systemic inflammatory response will provide implications in the development of effective strategies for the diagnosis, management and the alleviation of associated comorbidities. In the current study, focusing on MDD, we explored an integrative network analysis methodology to analyze transcriptomic data combined with the meta-analysis of biomarker data available throughout public databases and published scientific peer-reviewed articles. Detailed gene set enrichment analysis and complex protein–protein, gene regulatory and biochemical pathway analysis has been undertaken to identify the functional significance and potential biomarker utility of differentially regulated genes, proteins and metabolite markers. This integrative analysis method provides insights into the molecular mechanisms along with key glycosylation dysregulation underlying altered neutrophil-platelet activation and dysregulated neuronal survival maintenance and synaptic functioning. Highlighting the significant gap that exists in the current literature, the network analysis framework proposed reduces the impact of data gaps and permits the identification of key molecular signatures underlying complex disorders with multiple etiologies such as within MDD and presents multiple treatment options to address their molecular dysfunction.
Affiliation(s)
- Anup Mammen Oommen
  - Advanced Glycoscience Research Cluster (AGRC), National University of Ireland Galway, Galway, Ireland
  - Centre for Research in Medical Devices (CÚRAM), National University of Ireland Galway, Galway, Ireland
- Stephen Cunningham
  - Advanced Glycoscience Research Cluster (AGRC), National University of Ireland Galway, Galway, Ireland
  - Centre for Research in Medical Devices (CÚRAM), National University of Ireland Galway, Galway, Ireland
- Páraic S O'Súilleabháin
  - Department of Psychology, University of Limerick, Limerick, Ireland
  - Health Research Institute, University of Limerick, Limerick, Ireland
- Brian M Hughes
  - School of Psychology, National University of Ireland Galway, Galway, Ireland
- Lokesh Joshi
  - Advanced Glycoscience Research Cluster (AGRC), National University of Ireland Galway, Galway, Ireland
  - Centre for Research in Medical Devices (CÚRAM), National University of Ireland Galway, Galway, Ireland
30
Zhang C, Lei X, Liu L. Predicting Metabolite-Disease Associations Based on LightGBM Model. Front Genet 2021; 12:660275. [PMID: 33927752 PMCID: PMC8078836 DOI: 10.3389/fgene.2021.660275] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/29/2021] [Accepted: 03/05/2021] [Indexed: 11/21/2022] Open
Abstract
A large number of biological experiments have shown that metabolites are closely related to the occurrence and development of many complex human diseases; investigating their correlation mechanisms is thus an important topic that attracts many researchers. In this work, we propose a computational method named LGBMMDA, which is based on the Light Gradient Boosting Machine (LightGBM), to predict potential metabolite–disease associations. This method extracts features from statistical measures, graph theoretical measures, and matrix factorization results, and uses principal component analysis (PCA) to remove noise and redundancy. We evaluated our method against other commonly used methods, and LGBMMDA achieved better areas under the curve (AUCs). Additionally, three case studies further confirmed that LGBMMDA has clear superiority in predicting metabolite–disease pairs and represents a powerful bioinformatics tool.
Affiliation(s)
- Cheng Zhang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Lian Liu
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
31
Sun W, Han Y, Yang S, Zhuang H, Zhang J, Cheng L, Fu L. The Assessment of Interleukin-18 on the Risk of Coronary Heart Disease. Med Chem 2021; 16:626-634. [PMID: 31584380 DOI: 10.2174/1573406415666191004115128] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/09/2019] [Revised: 05/13/2019] [Accepted: 08/23/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND Observational studies support the inflammation hypothesis in coronary heart disease (CHD). Interleukin-18 (IL-18), a pleiotropic proinflammatory cytokine, has also been found to be associated with the risk of CHD. However, to our knowledge, Mendelian Randomization has not been used to explore the causal effect of IL-18 on CHD. OBJECTIVE To assess the causal effect of IL-18 on the risk of CHD. METHODS AND RESULTS Genetic variant instruments for IL-18 were obtained from the CHS and InCHIANTI cohorts, and consisted of the per-allele difference in mean IL-18 for 16 independent variants that reached genome-wide significance. The per-allele difference in log-odds of CHD for each of these variants was estimated from CARDIoGRAMplusC4D, a two-stage meta-analysis. Two-sample Mendelian Randomization (MR) was then performed. Various MR analyses were used, including weighted inverse-variance, MR-Egger regression, robust regression, and penalized regression. The OR of elevated IL-18 associated with CHD was only 0.005 (95% CI -0.105~0.095; P-value=0.927). Similar results were obtained with the use of MR-Egger regression, suggesting that directional pleiotropy was unlikely to be biasing these results (intercept -0.050, P-value=0.220). Moreover, results from the robust regression and penalized regression analyses also revealed essentially similar findings. CONCLUSION Our findings indicate that, by itself, IL-18 is unlikely to represent even a modest causal factor for CHD risk.
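The inverse-variance weighted step of a two-sample MR analysis such as the one described here can be illustrated as follows. This is a sketch of the standard fixed-effect estimator, assuming per-variant summary statistics are available; the function name and the input numbers are invented for illustration and are not the study's data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate.

    beta_exposure: per-allele effects of each variant on the exposure
    beta_outcome:  per-allele effects of each variant on the outcome (log-odds)
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error) on the log-odds scale.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics for two variants (illustration only).
estimate, std_err = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.02, 0.04],
    se_outcome=[0.01, 0.01],
)
low, high = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"log-odds per unit exposure: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The estimate is on the log-odds scale; exponentiating it and its interval bounds gives an odds ratio, which is always positive.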
Affiliation(s)
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- He Zhuang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingwen Zhang
  - Department of Physiology and Biology, University of Mississippi Medical Center, United States
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Lu Fu
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
32
Zhang ZM, Guan ZX, Wang F, Zhang D, Ding H. Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families. Med Chem 2021; 16:594-604. [PMID: 31584374 DOI: 10.2174/1573406415666191004125551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2019] [Revised: 06/18/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to the alignments of the conserved domains, NRs are classified into either seven or eight subfamilies: (1) NR1: thyroid hormone like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1 like (fushi tarazu-F1 like), (6) NR6: germ cell nuclear factor like (germ cell nuclear factor), and (7) NR0: knirps like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX like (DAX, SHP), or dividing NR0 into (7) NR7: knirps like and (8) NR8: DAX like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying the family of NRs are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies for the purpose of understanding their biological function. In this review, we summarize the application of machine learning methods in the prediction of NRs from different aspects.
We hope that this review will provide a reference for further research on the classification of NRs and their families.
Affiliation(s)
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fang Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
33
Xiao X, Zhang Z, Luo R, Peng R, Sun Y, Wang J, Chen X. Identification of potential oncogenes in triple-negative breast cancer based on bioinformatics analyses. Oncol Lett 2021; 21:363. [PMID: 33747220 PMCID: PMC7967975 DOI: 10.3892/ol.2021.12624] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/11/2020] [Accepted: 02/02/2021] [Indexed: 12/28/2022] Open
Abstract
Triple-negative breast cancer (TNBC) is a subtype with high rates of metastasis, poor prognosis and limited therapeutic options. The present study aimed to identify the potential pivotal genes for prognosis and treatment in TNBC. Two microarray expression datasets, GSE38959 and GSE65212, were downloaded from the Gene Expression Omnibus database, and RNA-sequencing data of breast cancer from The Cancer Genome Atlas database were analyzed to screen out differentially expressed genes (DEGs) between TNBC tissues and normal tissues. The intersection of DEGs was submitted to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. A protein-protein interaction (PPI) network was constructed and visualized using Cytoscape software. Furthermore, module, centrality and survival analyses were performed to identify the potential hub genes. Reverse transcription-quantitative (RT-q)PCR analysis was performed to detect the expression levels of key genes in TNBC samples. A total of 377 DEGs were identified. Functional analysis revealed that the DEGs were significantly involved in cell cycle process, nuclear division and the p53 signaling pathway. A PPI network was constructed with these DEGs, and 66 core genes with high centrality features in module 1 were selected. Relapse-free survival analysis confirmed that high expression levels of five genes [cyclin B1 (CCNB1), GINS complex subunit 2, non-SMC condensin I complex subunit G (NCAPG), minichromosome maintenance 4 (MCM4) and ribonucleotide reductase regulatory subunit M2 (RRM2)] were significantly associated with poor prognosis in TNBC. RT-qPCR analysis demonstrated that CCNB1, NCAPG, MCM4 and RRM2 were significantly upregulated in 25 TNBC tissues compared with adjacent normal breast tissues. Furthermore, gene set enrichment analysis revealed that CCNB1, NCAPG, MCM4 and RRM2 were closely associated with tumor proliferation. Taken together, these results suggest that CCNB1, NCAPG, MCM4 and RRM2 are associated with tumorigenesis and TNBC progression, and thus may act as promising prognostic biomarkers and therapeutic targets for TNBC.
Affiliation(s)
- Xiao Xiao
- Department of Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400010, P.R. China
| | - Zheng Zhang
- Molecular Medicine and Cancer Research Center, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Ruihan Luo
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Rui Peng
- Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Yan Sun
- Molecular Medicine and Cancer Research Center, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Jia Wang
- Molecular Medicine and Cancer Research Center, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Xin Chen
- Department of Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400010, P.R. China
| |
|
34
|
Chen J, Liu X, Shen L, Lin Y, Shen B. CMBD: a manually curated cancer metabolic biomarker knowledge database. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6163092. [PMID: 33693668 PMCID: PMC7947571 DOI: 10.1093/database/baaa094] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/24/2020] [Revised: 09/03/2020] [Accepted: 10/14/2020] [Indexed: 02/05/2023]
Abstract
The pathogenesis of cancer is influenced by interactions among genes, proteins, metabolites and other small molecules. Understanding cancer progression at the metabolic level is conducive to the visual decoding of changes in living organisms. To date, a large number of metabolic biomarkers in cancer have been measured and reported, which provide an alternative method for cancer precision diagnosis, treatment and prognosis. To systematically understand the heterogeneity of cancers, we developed the database CMBD to integrate the cancer metabolic biomarkers scattered across the literature in PubMed. At present, CMBD contains 438 manually curated relationships between 282 biomarkers and 76 cancer subtypes of 18 tissues reported in 248 publications. Users can access comprehensive information on cancer metabolic biomarkers, references, clinical samples and their relationships from our online database. As case studies, pathway analysis was performed on the metabolic biomarkers of breast and prostate cancers. 'Phenylalanine, tyrosine and tryptophan biosynthesis', 'phenylalanine metabolism' and 'primary bile acid biosynthesis' were identified as playing key roles in breast cancer. 'Glyoxylate and dicarboxylate metabolism', 'citrate cycle (TCA cycle)', and 'alanine, aspartate and glutamate metabolism' have important functions in prostate cancer. These findings provide us with an understanding of the metabolic pathways of cancer initiation and progression. Database URL: http://www.sysbio.org.cn/CMBD/.
Affiliation(s)
- Jing Chen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China.,The School of Science, Kangda College of Nanjing Medical University, Lianyungang, Jiangsu 222000, China
| | - Xingyun Liu
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Li Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| | - Yuxin Lin
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215006, China
| | - Bairong Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
| |
|
35
|
iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:6664362. [PMID: 33505515 PMCID: PMC7808816 DOI: 10.1155/2021/6664362] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/21/2020] [Revised: 12/13/2020] [Accepted: 12/28/2020] [Indexed: 02/07/2023]
Abstract
Bioluminescent proteins (BLPs) are a class of proteins that are widely distributed in many living organisms, with various mechanisms of light emission including bioluminescence and chemiluminescence from luminous organisms. Bioluminescence has been commonly used in various analytical research methods of cellular processes, such as gene expression analysis, drug discovery, cellular imaging, and toxicity determination. However, the identification of bioluminescent proteins is challenging as they share poor sequence similarity. In this paper, we briefly review the development of the computational identification of BLPs and then propose a novel prediction framework for identifying BLPs based on the eXtreme gradient boosting algorithm (XGBoost) using sequence-derived features. To train the models, we collected BLP data from bacteria, eukaryotes, and archaea. Then, to obtain more effective prediction models, we examined the performance of different feature extraction methods and their combinations, as well as of different classification algorithms. Finally, based on the optimal model, a novel predictor named iBLP was constructed to identify BLPs. The robustness of iBLP was demonstrated by experiments on training and independent datasets. Comparison with other published methods further demonstrated that the proposed method is powerful and could provide good performance for BLP identification. The webserver and software package for BLP identification are freely available at http://lin-group.cn/server/iBLP.
|
36
|
Peng J, Lu G, Shang X. A Survey of Network Representation Learning Methods for Link Prediction in Biological Network. Curr Pharm Des 2021; 26:3076-3084. [PMID: 31951161 DOI: 10.2174/1381612826666200116145057] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Accepted: 01/09/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Networks are powerful resources for describing complex systems. Link prediction is an important issue in network analysis and has important practical application value. Network representation learning has proven to be useful for network analysis, especially for link prediction tasks. OBJECTIVE To review the application of network representation learning on link prediction in a biological network, we summarize recent methods for link prediction in a biological network and discuss the application and significance of network representation learning in link prediction task. METHOD & RESULTS We first introduce the widely used link prediction algorithms, then briefly introduce the development of network representation learning methods, focusing on a few widely used methods, and their application in biological network link prediction. Existing studies demonstrate that using network representation learning to predict links in biological networks can achieve better performance. In the end, some possible future directions have been discussed.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Guilin Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| |
|
37
|
Gong Y, Chang C, Liu X, He Y, Wu Y, Wang S, Zhang C. Stimulator of Interferon Genes Signaling Pathway and its Role in Anti-tumor Immune Therapy. Curr Pharm Des 2021; 26:3085-3095. [PMID: 32520678 DOI: 10.2174/1381612826666200610183048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/01/2020] [Accepted: 05/04/2020] [Indexed: 12/19/2022]
Abstract
Stimulator of interferon genes is an important innate immune signaling molecule in the body and is involved in the innate immune signal transduction pathway induced by pathogen-associated molecular patterns or damage-associated molecular patterns. Stimulator of interferon genes promotes the production of type I interferon and thus plays an important role in the innate immune response to infection. In addition, according to a recent study, the stimulator of interferon genes pathway also contributes to anti-inflammatory and anti-tumor reactions. In this paper, current research on the stimulator of interferon genes signaling pathway and its relationship with tumor immunity is reviewed. A series of critical problems to be addressed in subsequent studies is also discussed.
Affiliation(s)
- Yuanjin Gong
- Department of Pathology, Harbin Medical University, Harbin, China
| | - Chang Chang
- Department of Pathology, Harbin Medical University, Harbin, China
| | - Xi Liu
- Center of Cardiovascular Disease, Inner Mongolia People's Hospital, Hohhot, China
| | - Yan He
- Department of Pathology, Harbin Medical University, Harbin, China
| | - Yiqi Wu
- Department of Pathology, Harbin Medical University, Harbin, China
| | - Song Wang
- Department of Pathology, Harbin Medical University, Harbin, China
| | - Chongyou Zhang
- Basic Medical College, Harbin Medical University, Harbin, China
| |
|
38
|
Hou R, Wu J, Xu L, Zou Q, Wu YJ. Computational Prediction of Protein Arginine Methylation Based on Composition-Transition-Distribution Features. ACS OMEGA 2020; 5:27470-27479. [PMID: 33134710 PMCID: PMC7594152 DOI: 10.1021/acsomega.0c03972] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/18/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
Arginine methylation is one of the most essential protein post-translational modifications. Identifying the site of arginine methylation is a critical problem in biology research. Unfortunately, biological experiments such as mass spectrometry are expensive and time-consuming. Hence, predicting arginine methylation by machine learning is a fast and efficient alternative. In this paper, we focus on the systematic characterization of arginine methylation with composition-transition-distribution (CTD) features. The presented framework consists of three stages. In the first stage, we extract CTD features from 1750 samples and use a decision tree to generate accurate predictions. The prediction accuracy can reach 96%. In the second stage, a support vector machine predicts the number of arginine methylation sites with an R-squared of 0.36. In the third stage, experiments carried out with the updated arginine methylation site data set show that using CTD features with a random forest classifier outperforms previous methods. The identification accuracy can reach 82.1% and 82.5% on the single methylarginine and double methylarginine data sets, respectively. The discovery presented in this paper can be helpful for future research on arginine methylation.
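The CTD encoding named in this abstract can be illustrated with a minimal sketch. The abstract does not say which physicochemical groupings the authors used, so the three-class hydrophobicity grouping in `GROUPS` below is an assumption, and `ctd_features` is a hypothetical helper name. CTD is commonly computed over several such properties and the resulting vectors concatenated; this sketch handles one property and returns 21 values (3 composition, 3 transition, 15 distribution).

```python
from itertools import combinations

# Assumed grouping (hydrophobicity: polar / neutral / hydrophobic).
# The abstract does not specify the properties actually used.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
AA_TO_CLASS = {aa: g for g, aas in GROUPS.items() for aa in aas}


def ctd_features(seq):
    """Return composition, transition and distribution features for one property."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    classes = [AA_TO_CLASS[aa] for aa in seq]  # KeyError on non-standard residues
    n = len(classes)

    # Composition: fraction of residues falling in each class.
    composition = [classes.count(g) / n for g in GROUPS]

    # Transition: fraction of adjacent pairs that switch between two given classes.
    transition = [
        sum({a, b} == {x, y} for x, y in zip(classes, classes[1:])) / (n - 1)
        for a, b in combinations(GROUPS, 2)
    ]

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # occurrence of each class along the sequence.
    distribution = []
    for g in GROUPS:
        positions = [i + 1 for i, c in enumerate(classes) if c == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if positions:
                k = max(1, int(frac * len(positions)))
                distribution.append(positions[k - 1] / n)
            else:
                distribution.append(0.0)

    return composition + transition + distribution
```

A vector produced this way could then be passed to any of the classifiers the abstract mentions (decision tree, support vector machine, random forest).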
Affiliation(s)
- Ruiyan Hou
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jin Wu
- School of Management, Shenzhen Polytechnic, Shenzhen 518055, China
| | - Lei Xu
- School of Electronic and Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yi-Jun Wu
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| |
|
39
|
Abstract
Background: Thermophilic proteins maintain good activity at high temperature, so studying them is important for understanding the thermal stability of proteins.
Objective: To address the low precision and low efficiency of thermophilic protein prediction, a prediction method based on feature fusion and machine learning is proposed in this paper.
Methods: For the selected thermophilic datasets, each protein sequence was first characterized by feature fusion, combining g-gap dipeptide, entropy density and autocorrelation coefficient features. Kernel Principal Component Analysis (KPCA) was then used to reduce the dimension of the sequence features in order to shorten training time and improve efficiency. Finally, a classification model was built using a classification algorithm.
Results: A variety of classification algorithms were trained and tested on the selected thermophilic dataset. By comparison, the accuracy of the Support Vector Machine (SVM) under the jackknife method was over 92%. The other evaluation indicators also showed that the SVM performed best.
Conclusion: With an effective feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
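Of the three fused descriptors, the g-gap dipeptide composition is the simplest to illustrate. The sketch below is an assumption about the usual definition (the frequency of each ordered residue pair separated by g intervening positions); the abstract does not give the value of g or the exact normalization, and `g_gap_dipeptide` is a hypothetical helper name.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide(seq, g):
    """Return the 400-dimensional g-gap dipeptide frequency vector of seq."""
    total = len(seq) - g - 1  # number of residue pairs separated by g positions
    if total <= 0:
        raise ValueError("sequence is too short for the requested gap")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        key = seq[i] + seq[i + g + 1]
        if key in counts:  # pairs containing non-standard residues are skipped
            counts[key] += 1
    return [counts[p] / total for p in PAIRS]
```

With g = 0 this reduces to the ordinary adjacent dipeptide composition. In a pipeline like the one described, the vector would be concatenated with the other descriptors before dimension reduction.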
Affiliation(s)
- Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Yi-Feng Liu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Hong-Fei Li
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Fan Lu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| |
|
40
|
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By efficiently integrating experimental findings on protein-protein interactions (PPIs) with computational prediction techniques, researchers have obtained much valuable PPI data, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| |
|
41
|
Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020; 27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy. OBJECTIVE In this article, our motivation is to develop a novel refinement method based on splitting-splicing vertically. METHODS Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment procedure of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs. RESULTS We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) score. The results show that given appropriate proportions to split the initial alignment, the average scores are increased comparably or slightly after using our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools. CONCLUSION The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.
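The split, realign and splice steps described in METHODS can be sketched as follows. This is an illustration of the idea only, not the authors' implementation: `split_splice_vertically` and `pad_right` are hypothetical names, the column boundaries are chosen by the caller, and `pad_right` is a trivial stand-in for a real MSA tool, which would be supplied through the `realign` argument.

```python
def split_splice_vertically(alignment, start, end, realign):
    """Realign columns [start, end) of a gapped alignment and splice them back.

    alignment: list of equal-length strings using '-' as the gap character.
    realign:   callable taking a list of ungapped strings and returning a list
               of equal-length gapped strings in the same order.
    """
    left = [row[:start] for row in alignment]
    # Remove gap characters from the middle block before realigning it alone.
    middle = [row[start:end].replace("-", "") for row in alignment]
    right = [row[end:] for row in alignment]
    realigned = realign(middle)
    return [l + m + r for l, m, r in zip(left, realigned, right)]


def pad_right(seqs):
    """Placeholder 'aligner' (an assumption for this sketch): pads each
    ungapped sequence with trailing gaps so all rows have equal length."""
    width = max(len(s) for s in seqs)
    return [s + "-" * (width - len(s)) for s in seqs]


initial = ["AC-GT-A", "ACG-TTA", "A--GT-A"]
refined = split_splice_vertically(initial, 2, 5, pad_right)
```

Because the left and right blocks are reused unchanged, each row keeps its residues in order and only the middle columns can change.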
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Yilei Fu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|
42
|
Lei X, Tie J, Fujita H. Relational completion based non-negative matrix factorization for predicting metabolite-disease associations. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106238] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
|
43
|
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a phenotype positively associated with type 2 diabetes mellitus (T2DM), but the causal relationship is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS A two-sample MR, using genetic instrumental variables (IVs), was applied to test the influence of IL on the risk of T2DM. In this study, MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of IVs. RESULTS The pooled odds ratio from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the intercept was low (-0.477, P = 0.252), and the relative fluctuation of the ORs in leave-one-out validation was small, ranging from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We validated that shorter IL causes no additional risk of T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of IVs.
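The leave-one-out figures quoted in RESULTS follow from simple arithmetic, and the pooling step can be sketched with a standard fixed-effect inverse-variance weighted estimator. The sketch assumes the per-SNP causal estimates are already on the log odds ratio scale with known standard errors; the abstract does not report the SNP-level values, so the function names and any inputs passed to `ivw_pooled_or` are illustrative.

```python
import math


def ivw_pooled_or(log_ors, std_errors):
    """Fixed-effect inverse-variance weighted pooling of per-SNP log odds ratios.

    Returns the pooled OR and its 95% confidence interval.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
    return math.exp(pooled), ci


def relative_change(leave_one_out_or, pooled_or):
    """Relative fluctuation of a leave-one-out OR around the pooled OR."""
    return (leave_one_out_or - pooled_or) / pooled_or


# Values taken from the abstract: pooled OR 1.03, extremes 0.966 and 1.081.
lowest = relative_change(0.966, 1.03)   # about -0.062
highest = relative_change(1.081, 1.03)  # about 0.05
```

In a leave-one-out analysis, `ivw_pooled_or` would be re-run with each SNP dropped in turn and the resulting ORs compared with the full pooled estimate through `relative_change`.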
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine- Pharmaceutics of China), Harbin Medical University, Harbin, China.,HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
| | - Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine- Pharmaceutics of China), Harbin Medical University, Harbin, China.,HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China.,Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada.,Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China.,Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
| |
|
44
|
Li Q, Zhou W, Wang D, Wang S, Li Q. Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model. Front Bioeng Biotechnol 2020; 8:892. [PMID: 32903381 PMCID: PMC7434836 DOI: 10.3389/fbioe.2020.00892] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/25/2020] [Accepted: 07/10/2020] [Indexed: 01/09/2023] Open
Abstract
Cancer is still a severe health problem globally. Cancer therapy traditionally involves the use of radiotherapy or anticancer drugs to kill cancer cells, but these methods are quite expensive and have side effects that cause great harm to patients. With the discovery of anticancer peptides (ACPs), significant progress has been achieved in the therapy of tumors. Therefore, it is invaluable to accurately identify anticancer peptides. Although biochemical experiments can accomplish this, they are expensive and time-consuming. To promote the application of anticancer peptides in cancer therapy, machine learning can be used to recognize anticancer peptides by extracting their feature vectors. Nevertheless, in practice, poor performance is usually found when training machine learning models on high-dimensional features. To address this problem, this paper puts forward a 19-dimensional feature model based on anticancer peptide sequences, which has lower dimensionality and better performance than some existing methods. In addition, this paper also separated out a model with a low number of dimensions and acceptable performance. The few features identified in this study may represent the important features of anticancer peptides.
Affiliation(s)
- Qingwen Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
| | - Wenyang Zhou
- Center for Bioinformatics, School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Sui Wang
- Key Laboratory of Soybean Biology in Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
| | - Qingyuan Li
- Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
| |
|
45
|
Liu Z, Zhang Y, Han X, Li C, Yang X, Gao J, Xie G, Du N. Identifying Cancer-Related lncRNAs Based on a Convolutional Neural Network. Front Cell Dev Biol 2020; 8:637. [PMID: 32850792 PMCID: PMC7432192 DOI: 10.3389/fcell.2020.00637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/04/2020] [Accepted: 06/24/2020] [Indexed: 12/15/2022] Open
Abstract
Millions of people are suffering from cancers, but accurate early diagnosis and effective treatment are still tough for all doctors. In recent years, long non-coding RNAs (lncRNAs) have been proven to play an important role in diseases, especially cancers. These lncRNAs execute their functions by regulating gene expression. Therefore, identifying lncRNAs which are related to cancers could help researchers gain a deeper understanding of cancer mechanisms and help them find treatment options. A large number of relationships between lncRNAs and cancers have been verified by biological experiments, which gives us a chance to use computational methods to identify cancer-related lncRNAs. In this paper, we applied a convolutional neural network (CNN) to identify cancer-related lncRNAs from the lncRNAs' target genes and their tissue expression specificity. Since lncRNAs regulate target gene expression and have been reported to have tissue expression specificity, their target genes and expression in different tissues were used as features of lncRNAs. Then, a deep belief network (DBN) was used for unsupervised encoding of the lncRNA features. Finally, the CNN was used to predict cancer-related lncRNAs based on known relationships between lncRNAs and cancers. For each type of cancer, we built a CNN model to predict its related lncRNAs. We identified more related lncRNAs for 41 kinds of cancers. Ten-fold cross-validation was used to assess the performance of our method. The results showed that our method is better than several previous methods, with an area under the curve (AUC) of 0.81 and an area under the precision–recall curve (AUPR) of 0.79. To verify the accuracy of our results, case studies were carried out.
Affiliation(s)
- Zihao Liu
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China.,Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Xudong Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Chenxi Li
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Xuhui Yang
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China
| | - Jie Gao
- Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Ganfeng Xie
- Department of Oncology, Southwest Hospital, Army Medical University, Chongqing, China
| | - Nan Du
- Department of Oncology, Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, China.,Department of Oncology, The Fourth Medical Center, Chinese PLA General Hospital, Beijing, China
| |
|
46
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, this review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide valid information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with a full background on pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
47
|
Geng G, Zhang Z, Cheng L. Identification of a Multi-Long Noncoding RNA Signature for the Diagnosis of Type 1 Diabetes Mellitus. Front Bioeng Biotechnol 2020; 8:553. [PMID: 32719778 PMCID: PMC7350420 DOI: 10.3389/fbioe.2020.00553] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/07/2020] [Accepted: 05/07/2020] [Indexed: 02/01/2023] Open
Abstract
Due to the increasing prevalence of type 1 diabetes mellitus (T1DM) and its complications, there is an urgent need to identify novel methods for predicting the occurrence and understanding the pathogenetic mechanisms of the disease. Accumulated data have demonstrated the potential of long noncoding RNAs (lncRNAs) as biomarkers for establishing the diagnosis and predicting the prognosis of numerous diseases. Yet little is known about the expression patterns and regulatory roles of lncRNAs in the pathogenesis of T1DM, or whether they can be used as diagnostic biomarkers for the disease. To explore these questions, in the present study we compared the expression patterns of lncRNAs between 20 T1DM patients and 42 healthy controls by retrospectively analyzing a published microarray data set. Our results indicate that, compared with healthy controls, diabetic patients had altered levels of lncRNAs. We then used a three-time cross-validation strategy and a support vector machine to propose a specific 26-lncRNA signature (termed 26LncSigT1DM). This signature effectively distinguished between healthy and diabetic individuals in a validation cohort (area under the curve = 0.825). After the 26LncSigT1DM was prospectively validated, we used Pearson correlation to identify 915 mRNAs whose expression levels were positively correlated with those of the 26 lncRNAs. According to their Gene Ontology annotations, these mRNAs participate in processes including cellular response to stimulus, cell communication, multicellular organismal process, and cell motility. Kyoto Encyclopedia of Genes and Genomes analysis demonstrated that the genes encoding the 915 mRNAs may be associated with the NOD-like receptor signaling pathway, the transforming growth factor β signaling pathway, and mineral absorption, suggesting that deregulation of these lncRNAs may mediate inflammatory abnormalities and immune dysfunctions, which jointly promote the pathogenesis of T1DM. Thus, our study identifies a novel diagnostic tool and may shed more light on the molecular mechanisms underlying the pathogenesis of T1DM.
Affiliation(s)
- Guannan Geng
  - Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Zicheng Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
48
|
Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. BioMed Research International 2020; 2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]
Abstract
Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
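The two feature types named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `aac` and `cksaap`, the default `k_max`, and the choice to ignore non-standard residues are assumptions made for illustration only. AAC is taken here as the frequency of each of the 20 standard residues, and CKSAAP as the frequency of each of the 400 residue pairs separated by k intervening positions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: frequency of each standard residue in seq."""
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies, one block of 400 per gap size.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        windows = len(seq) - k - 1  # number of pair positions for this gap
        counts = dict.fromkeys(pairs, 0)
        for i in range(windows):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / windows if windows > 0 else 0.0 for p in pairs)
    return features


# Toy sequence, purely illustrative.
vector = aac("ACAC") + cksaap("ACAC", k_max=1)
print(len(vector))  # 20 AAC values followed by 2 blocks of 400 pair frequencies
```

A vector built this way could be passed to any standard classifier; the feature selection step mentioned in the abstract would then keep only the most informative pair frequencies.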
|
49
|
Deng S, Sun Y, Zhao T, Hu Y, Zang T. A Review of Drug Side Effect Identification Methods. Curr Pharm Des 2020; 26:3096-3104. [PMID: 32532187 DOI: 10.2174/1381612826666200612163819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/05/2020] [Accepted: 05/18/2020] [Indexed: 11/22/2022]
Abstract
Drug side effects have become an important indicator for evaluating the safety of drugs. Two main factors underlie the frequent occurrence of drug safety problems. First, the clinical understanding of drug side effects is insufficient, leading to frequent adverse drug reactions. Second, because clinical trials are long and complex, side effects of approved drugs on the market cannot be reported in a timely manner. Therefore, many researchers have focused on developing methods to identify drug side effects. In this review, we summarize the methods of identifying drug side effects and the common databases in this field. We classify these methods into four categories: biological experimental, machine learning, text mining, and network methods. We point out the key points of each kind of method and explain its advantages and disadvantages. Finally, we propose future research directions.
Affiliation(s)
- Shuai Deng
  - College of Science, Beijing Forestry University, Beijing, China
- Yige Sun
  - Microbiology Department, Harbin Medical University, Harbin, 150081, China
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
50
|
Meng C, Guo F, Zou Q. CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes. Comput Biol Chem 2020; 87:107304. [PMID: 32580129 DOI: 10.1016/j.compbiolchem.2020.107304] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/27/2019] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 12/21/2022]
Abstract
Cell wall lytic enzymes, an important biotechnical tool in drug development, agriculture and the food industry, have attracted increasing research attention. Accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks in this field. In this study, to avoid the inefficiency of in vitro experiments, a support vector machine-based cell wall lytic enzyme identification model was constructed using bioinformatics. The machine learning process includes feature extraction, feature selection, model training and optimization. According to the jackknife cross-validation test, the model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and has a powerful ability to identify cell wall lytic enzymes. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user-friendly web server called CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp.
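For readers unfamiliar with the metrics reported in this abstract, the following Python sketch shows how sensitivity, specificity and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, not results from the study.
sn, sp, mcc = binary_metrics(tp=8, fn=2, tn=9, fp=1)
print(sn, sp, round(mcc, 3))
```

Under a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the others, and the counts passed to such a function are accumulated over those held-out predictions.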
Affiliation(s)
- Chaolu Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|