1. Chai L, Gao J, Li Z, Sun H, Liu J, Wang Y, Zhang L. Predicting CTCF cell type active binding sites in human genome. Sci Rep 2024; 14:31744. [PMID: 39738353; PMCID: PMC11686126; DOI: 10.1038/s41598-024-82238-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Accepted: 12/03/2024] [Indexed: 01/02/2025] Open
Abstract
The CCCTC-binding factor (CTCF) is pivotal in orchestrating diverse biological functions across the human genome, yet the mechanisms driving its cell type-active DNA binding affinity remain underexplored. Here, we collected ChIP-seq data from 67 cell lines in ENCODE, constructed a unique dataset of cell type-active CTCF binding sites (CBS), and trained convolutional neural networks (CNNs) to dissect the patterns of CTCF binding activity. Our analysis reveals that the transcription factors RAD21/SMC3 and chromatin accessibility are more predictive than sequence motifs and histone modifications. Integrating these features achieved AUPRC values consistently above 0.868, highlighting their utility in deciphering CTCF transcription factor binding dynamics. This study provides a deeper understanding of the regulatory functions of CTCF via a machine learning framework.
Affiliation(s)
- Lu Chai
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Jie Gao
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Zihan Li
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Hao Sun
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Junjie Liu
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
- Yong Wang
  - CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, People's Republic of China
- Lirong Zhang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, People's Republic of China
2. Liu L, Han L, Han K, Zhang Z, Zhang H, Zhang L. Identification of co-localised transcription factors based on paired motifs analysis. IET Syst Biol 2024; 18:238-249. [PMID: 39588827; DOI: 10.1049/syb2.12104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2024] [Revised: 10/02/2024] [Accepted: 10/24/2024] [Indexed: 11/27/2024] Open
Abstract
The interaction of transcription factors (TFs) with DNA precisely regulates gene transcription. In mammalian cells, thousands of TFs often interact with DNA cis-regulatory elements in a combinatorial manner rather than acting alone. Identifying cooperativity between TFs can help to explore the mechanism of transcriptional regulation. However, little is known about the cooperative patterns of TFs in the genome. To identify which TFs prefer co-localisation, the authors conducted a paired motif analysis in the accessible regions of the human genome based on a Poisson background model. In particular, the authors distinguish cooperatively binding TFs from competitively binding TFs according to the distance between TF motifs. In the K562 cell line, the authors find that TFs from the same family, such as the FOS_JUN family, always compete for the same binding sites, whereas KLF family TFs show significant cooperative binding in adjacent regions. Furthermore, a comparative analysis across 16 human cell lines indicates that most TF combination patterns are conserved, although some cell-line-specific patterns remain. Finally, in human prostate cancer cells (PC-3) and normal human prostate cells (RWPE-2), the authors investigate the TF combination patterns specific to diseased and normal cells. The results show that the cooperatively binding TF pairs shared by PC-3 and RWPE-2 account for over 90%. The authors also identify 26 specific TF combination pairs in PC-3 cancer cells.
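The paired-motif test described in this abstract can be illustrated with a Poisson tail probability. The following is a minimal sketch, not the authors' implementation: the function names, the independence-based expected count, and the example numbers are all assumptions made for illustration, since the abstract does not give the details of the background model.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def expected_pairs(n_regions: int, n_with_a: int, n_with_b: int) -> float:
    """Expected number of regions containing both motifs if they occur independently.

    Hypothetical background: the paper's actual expectation may be defined differently.
    """
    return n_with_a * n_with_b / n_regions


# Illustrative numbers only: 10,000 accessible regions, motif A in 500, motif B in 400.
lam = expected_pairs(10_000, 500, 400)
# Probability of observing 35 or more co-occurrences under the background model.
p_value = poisson_tail(35, lam)
```

A small `p_value` would suggest that the two motifs co-occur more often than the assumed background predicts; separating cooperative from competitive pairs by motif distance, as the abstract describes, would require an additional step not shown here.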
Affiliation(s)
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Lu Han
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Kaiyuan Han
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng Zhang
  - Computer Science and Information Systems, Murray State University, Murray, USA
- Haojiang Zhang
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Lirong Zhang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
3. Ren J, Guo Z, Qi Y, Zhang Z, Liu L. Prediction of YY1 loop anchor based on multi-omics features. Methods 2024; 232:96-106. [PMID: 39521361; DOI: 10.1016/j.ymeth.2024.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 10/22/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024] Open
Abstract
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, the LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Due to the large imbalance in the number of positive and negative samples, we use AUPRC to reflect the quality of the classifier. The results show that the LightGBM model exhibits strong predictive performance (AUPRC≥0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The results show that the model performs well for YY1-loop anchor prediction on both the training and independent test sets. Additionally, we ranked the importance of the features and found that the formation of YY1-loop anchors is primarily influenced by the co-binding of transcription factors CTCF, SMC3, and RAD21, as well as histone modifications and sequence context.
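The abstract's reason for preferring AUPRC on imbalanced data can be made concrete with a small example. This is a minimal sketch that uses average precision, a common step-wise estimate of the area under the precision-recall curve; the function name and the toy labels are illustrative assumptions, and the paper may compute AUPRC differently.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive's rank.

    labels are 0/1, scores are classifier outputs (higher means more positive).
    Tied scores are left in sort order; no special tie handling is attempted.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy imbalanced set: 2 positives among 8 samples (prevalence 0.25).
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
ap = average_precision(toy_labels, toy_scores)
```

A random ranking would score close to the positive prevalence (0.25 in this toy set), so AUPRC values are best read against that baseline rather than against 0.5 as with AUROC.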
Affiliation(s)
- Jun Ren
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Zhiling Guo
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yixuan Qi
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng Zhang
  - Computer Science and Information Systems, Murray State University, Murray, USA
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
4. Yao L, Xie P, Guan J, Chung CR, Huang Y, Pang Y, Wu H, Chiang YC, Lee TY. CapsEnhancer: An Effective Computational Framework for Identifying Enhancers Based on Chaos Game Representation and Capsule Network. J Chem Inf Model 2024; 64:5725-5736. [PMID: 38946113; PMCID: PMC11267569; DOI: 10.1021/acs.jcim.4c00546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 06/21/2024] [Accepted: 06/21/2024] [Indexed: 07/02/2024]
Abstract
Enhancers are a class of noncoding DNA, serving as crucial regulatory elements in governing gene expression by binding to transcription factors. The identification of enhancers holds paramount importance in the field of biology. However, traditional experimental methods for enhancer identification demand substantial human and material resources. Consequently, there is growing interest in employing computational methods for enhancer prediction. In this study, we propose a two-stage framework based on deep learning, termed CapsEnhancer, for the identification of enhancers and their strengths. CapsEnhancer utilizes chaos game representation to encode DNA sequences into unique images and employs a capsule network to extract local and global features from sequence "images". Experimental results demonstrate that CapsEnhancer achieves state-of-the-art performance in both stages. In the first and second stages, the accuracy surpasses the previous best methods by 8% and 3.5%, reaching 94.5% and 95%, respectively. Notably, this study represents the pioneering application of computer vision methods to enhancer identification tasks. Our work not only contributes novel insights to enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.
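The chaos game representation mentioned in this abstract can be sketched in a few lines. This is an illustrative sketch only: the corner assignment, the image size, and the handling of ambiguous bases are assumptions, and CapsEnhancer's actual encoding may differ. The names `CORNERS`, `cgr_points`, and `cgr_image` are hypothetical.

```python
# One common corner convention for the unit square (an assumption, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map a DNA sequence to chaos game points.

    Start at the centre of the unit square; for each base, move halfway
    towards that base's corner and record the new position.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases such as N (an assumption)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_image(seq, size=8):
    """Rasterise the chaos game points into a size x size grid of counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


image = cgr_image("ACGTTGCAACGT", size=4)
```

Each cell of the resulting grid counts the points that landed in it, which is what allows a sequence to be passed to an image model such as the capsule network described above.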
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Peilin Xie
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuxuan Pang
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Huacong Wu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
5. Shen J, Sun H, Zhou S, Wang L, Dong C, Ren K, Du Q, Cao J, Wang Y, Sun J. Development of a screening system of gene sets for estimating the time of early skeletal muscle injury based on second-generation sequencing technology. Int J Legal Med 2024; 138:1629-1644. [PMID: 38532207; DOI: 10.1007/s00414-024-03210-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 03/13/2024] [Indexed: 03/28/2024]
Abstract
The present study aims to address the challenge of wound age estimation in forensic science by identifying reliable genetic markers using low-cost and high-precision second-generation sequencing technology. A total of 54 Sprague-Dawley rats were randomly assigned to a control group or injury groups, with the injury groups further divided by time point (4 h, 8 h, 12 h, 16 h, 20 h, 24 h, 28 h, and 32 h after injury, n = 6) to establish rat skeletal muscle contusion models. Gene expression data were obtained using second-generation sequencing technology, and differential gene expression analysis, weighted gene co-expression network analysis (WGCNA) and time-dependent expression trend analysis were performed. A total of six sets of biomarkers were obtained: differentially expressed genes at adjacent time points (127 genes), co-expressed genes most associated with wound age (213 genes), hub genes exhibiting time-dependent expression (264 genes), and sets of transcription factors (TFs) corresponding to the above sets of genes (74, 87, and 99 genes, respectively). Then, random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP) models were constructed for wound age estimation from the above gene sets. The estimates based on transcription factors were all superior to those based on the corresponding hub genes, with the WGCNA transcription factor set performing best: an average accuracy of 96% across the three models in internal testing, and a highest external validation accuracy of 91.7%. This study demonstrates the advantages of an indicator screening system based on second-generation sequencing technology and transcription factor levels for wound age estimation.
Affiliation(s)
- Junyi Shen
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
  - Institute of Forensic Science Public Security Department of Shanxi, Taiyuan, China
- Hao Sun
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Shidong Zhou
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Liangliang Wang
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Chaoxiu Dong
  - Institute of Forensic Science Public Security Department of Shanxi, Taiyuan, China
- Kang Ren
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Qiuxiang Du
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Jie Cao
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Yingyuan Wang
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
- Junhong Sun
  - Department of Forensic Medicine, Shanxi Medical University, Jinzhong, China
6. Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816; PMCID: PMC11263376; DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Genome4me Inc., Seoul, Republic of Korea
7. Zhu W, Yuan SS, Li J, Huang CB, Lin H, Liao B. A First Computational Frame for Recognizing Heparin-Binding Protein. Diagnostics (Basel) 2023; 13:2465. [PMID: 37510209; PMCID: PMC10377868; DOI: 10.3390/diagnostics13142465] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 06/10/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023] Open
Abstract
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first machine learning-based framework for accurately identifying HBP. Using four sequence descriptors, HBP and non-HBP samples were represented as discrete numerical features. These features were input into support vector machine (SVM) and random forest (RF) algorithms, and the prediction performance of the two methods was compared on training data and independent test data; the SVM-based classifier showed the greatest potential to identify HBP. The model produced an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it should aid infectious disease research and stimulate further work in related fields.
Affiliation(s)
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Shi-Shi Yuan
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jian Li
  - School of Basic Medical Sciences, Chengdu University, Chengdu 610106, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, ABa Teachers University, Chengdu 623002, China
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
8. Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401; DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Affiliation(s)
- Le Thi Phan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Changmin Oh
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Tao He
  - Beidahuang Industry Group General Hospital, Harbin, China
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
9. Zhang YF, Wang YH, Gu ZF, Pan XR, Li J, Ding H, Zhang Y, Deng KJ. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front Med (Lausanne) 2023; 10:1052923. [PMID: 36778738; PMCID: PMC9909039; DOI: 10.3389/fmed.2023.1052923] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/24/2022] [Accepted: 01/05/2023] [Indexed: 01/27/2023] Open
Abstract
Introduction: Bitter peptides are short peptides with potential medical applications. The huge potential behind their bitter taste remains to be tapped. To better explore the value of bitter peptides in practice, we need a more effective classification method for identifying them.
Methods: In this study, we developed a random forest (RF)-based model, called Bitter-RF, using sequence information of bitter peptides. Bitter-RF covers more comprehensive and extensive information by integrating 10 features extracted from the bitter peptides, and achieves better results than the latest generation model on an independent validation set.
Results: The proposed model improves the classification accuracy for bitter peptides (AUROC = 0.98 on the independent test set) and extends the practical application of the RF method in protein classification tasks, as RF had not previously been used to build a prediction model for bitter peptides.
Discussion: We hope that Bitter-RF will make bitter peptide research more convenient for scholars.
Affiliation(s)
- Yu-Fei Zhang
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Hao Wang
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhi-Feng Gu
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xian-Run Pan
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Jian Li
  - School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Hui Ding
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Ke-Jun Deng
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
10. Wan H, Liu Q, Ju Y. Utilize a few features to classify presynaptic and postsynaptic neurotoxins. Comput Biol Med 2023; 152:106380. [PMID: 36473343; DOI: 10.1016/j.compbiomed.2022.106380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 10/21/2022] [Accepted: 11/28/2022] [Indexed: 12/02/2022]
Abstract
Neurotoxins are a class of proteins that have a significant damaging effect on nerve tissue. Neurotoxins are classified into presynaptic neurotoxins and postsynaptic neurotoxins, and accurate identification of neurotoxins plays a key role in drug development. In this study, 90 presynaptic neurotoxins and 165 postsynaptic neurotoxins were classified. The features of the presynaptic and postsynaptic neurotoxin sequences were extracted using the AutoProp feature extraction method, and feature selection was performed using the maximum relevance maximum distance (MRMD) program. Finally, only two features were retained to achieve 84.7% classification accuracy. Moreover, it was found that the two retained features were present in the conserved sites and motifs of presynaptic neurotoxins and could represent the critical structures of presynaptic neurotoxins. This method demonstrates that using a few key features to classify proteins can effectively identify critical protein structures.
Affiliation(s)
- Hao Wan
  - Institute of Advanced Cross-field Science, College of Life Science, Qingdao University, Qingdao, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
11. Hassan A, Alkhalifah T, Alturise F, Khan YD. RCCC_Pred: A Novel Method for Sequence-Based Identification of Renal Clear Cell Carcinoma Genes through DNA Mutations and a Blend of Features. Diagnostics (Basel) 2022; 12:3036. [PMID: 36553042; PMCID: PMC9776995; DOI: 10.3390/diagnostics12123036] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/27/2022] [Revised: 11/24/2022] [Accepted: 11/30/2022] [Indexed: 12/07/2022] Open
Abstract
To save lives from cancer, it is crucial to diagnose it at an early stage. One route to early diagnosis lies in the identification of cancer driver genes and their mutations. Such diagnostics can substantially reduce the mortality rate of this deadly disease. However, identifying cancer driver gene mutations through experimental methods can be expensive, slow, and laborious. Computational strategies that can predict cancer growth early, effectively, and accurately are therefore highly needed to support early diagnosis and reduce mortality. Herein, we aim to predict clear cell renal carcinoma (RCCC) at the level of the genes, using genomic sequences. The dataset was taken from the IntOGen Cancer Mutations Browser, and the standard DNA sequences of all genes were taken from the NCBI database. Using cancer-associated mutation information from IntOGen, the benchmark dataset was generated by introducing the mutations into the original sequences. After extensive feature extraction, the dataset was used to train an ANN + Hist Gradient Boosting model to classify RCCC genes, other cancer-associated genes, and non-cancerous/unknown (non-tumor driver) genes. In an independent dataset test, the observed accuracy was 83%, whereas 10-fold cross-validation and jackknife validation yielded accuracies of 98% and 100%, respectively. The proposed predictor, RCCC_Pred, is able to identify RCCC genes with high accuracy and efficiency and can help researchers predict and diagnose cancer at its early stages.
Affiliation(s)
- Arfa Hassan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass 58892, Qassim, Saudi Arabia
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass 58892, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
12. Tang H, Sun L, Huang J, Yang Z, Li C, Zhou X. The mechanism and biomarker function of Cavin-2 in lung ischemia-reperfusion injury. Comput Biol Med 2022; 151:106234. [PMID: 36335812; DOI: 10.1016/j.compbiomed.2022.106234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/31/2022] [Revised: 10/01/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Lung ischemia-reperfusion injury (LIRI) is one of the most predominant complications of ischemic lung disease. Cavin-2 has emerged as a regulator of a variety of cellular processes, including endocytosis, lipid homeostasis, signal transduction and tumorigenesis, but its function in LIRI is unknown. The purpose of this study was to determine the predictive potential of Cavin-2 in protecting against lung ischemia-reperfusion injury and the corresponding mechanisms.
METHODS: We identified a strong relationship between Cavin-2 and multiple immune-related genes using a deep learning method. To reveal the mechanism of Cavin-2 in LIRI, an LIRI SD rat model was constructed to detect the expression of Cavin-2 in rat lung tissue after LIRI, and the expression of Cavin-2 in lung cell lines was also detected. The expression of IL-6, IL-10 and MDA in cells after Cavin-2 over-expression or knockdown was examined under hypoxic conditions. The expression levels of p-AKT, p-STAT3 and p-ERK1/2 were measured in Cavin-2 over-expressing cells under hypoxic-ischemic conditions, and the corresponding blockers of AKT, STAT3 and ERK1/2 were then given to verify whether these pathways play a protective role in LIRI.
RESULTS: After hypoxia, the expression of Cavin-2 in rat lung tissues was significantly increased. Cellular activity and IL-10 in Cavin-2 over-expressing cells were significantly higher than in the control group, while IL-6 and MDA were significantly lower; these results were reversed in Cavin-2 knockdown cells. Meanwhile, the phosphorylation levels of AKT, STAT3, and ERK1/2 were significantly increased in Cavin-2 over-expressing cells after hypoxia. When specific blockers of AKT, STAT3, and ERK1/2 were given, the protective effect against LIRI was lost.
CONCLUSIONS: Cavin-2 shows biomarker potential in protecting the lung from ischemia-reperfusion injury through the survivor activating factor enhancement (SAFE) and reperfusion injury salvage kinase (RISK) pathways.
Affiliation(s)
- Hexiao Tang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Linao Sun
- Tianjin Medical University, Tianjin, China
- Jingyu Huang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Zetian Yang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Changsheng Li
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
- Xuefeng Zhou
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China
|
13
|
DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features. Appl Bionics Biomech 2022; 2022:5483115. [PMID: 35465187 PMCID: PMC9020926 DOI: 10.1155/2022/5483115] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/15/2021] [Revised: 12/25/2021] [Accepted: 02/05/2022] [Indexed: 12/29/2022] Open
Abstract
In the domain of genome annotation, the identification of DNA-binding proteins is one of the crucial challenges. DNA is considered a blueprint for the cell. It contains all the information necessary for building and maintaining the traits of an organism. It is DNA that makes a living thing a living thing. Protein interaction with DNA plays an essential role in regulating DNA functions such as DNA repair, transcription, and regulation. Identifying these proteins is a crucial task for understanding the regulation of genes. Several methods have been developed to identify the binding sites of DNA and protein based on structures and sequences, but they are costly and time-consuming. Therefore, we propose a methodology named “DNAPred_Prot”, which uses various position- and frequency-dependent features from protein sequences for efficient and effective prediction of DNA-binding proteins. Under 10-fold cross-validation and jackknife testing, the method achieved accuracies of 94.95% and 95.11%, respectively. The results of SVM and ANN were also compared with those of a random forest classifier. The robustness of the proposed model was evaluated on the independent dataset PDB186, on which it achieved an accuracy of 91.47%. These results suggest that the proposed methodology performs better than other existing methods for the identification of DNA-binding proteins.
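As a rough illustration of what frequency- and position-dependent features of a protein sequence can look like, the sketch below builds a 40-dimensional vector: the fraction of each of the 20 standard amino acids, followed by the mean relative position of each. The abstract does not specify the features DNAPred_Prot actually uses, so the function names and both feature definitions here are assumptions chosen only for illustration.

```python
# Illustrative sketch only: these two feature families are assumptions,
# not the feature set described in the DNAPred_Prot paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq):
    """Frequency-based features: fraction of the sequence made up by each residue."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def position_features(seq):
    """Position-based features: mean 1-based position of each residue,
    divided by the sequence length (0.0 if the residue is absent)."""
    seq = seq.upper()
    n = len(seq)
    feats = []
    for a in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == a]
        feats.append(sum(positions) / (len(positions) * n) if positions else 0.0)
    return feats


def encode(seq):
    """Concatenate both feature families into one fixed-length vector."""
    return composition_features(seq) + position_features(seq)
```

A vector such as `encode("MKRAC")` could then be passed to any standard classifier (for example a random forest, SVM, or ANN, as compared in the abstract).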
|
14
|
Identification of Nine mRNA Signatures for Sepsis Using Random Forest. Computational and Mathematical Methods in Medicine 2022; 2022:5650024. [PMID: 35345523 PMCID: PMC8957445 DOI: 10.1155/2022/5650024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/09/2022] [Accepted: 02/28/2022] [Indexed: 11/17/2022]
Abstract
Sepsis has a high fatality rate, and early diagnosis could increase its cure rate. There are currently no reliable molecular biomarkers to distinguish between infected and uninfected patients, which limits the treatment of sepsis. To this end, we analyzed gene expression datasets from the GEO database to identify an mRNA signature for sepsis. First, two gene expression datasets (GSE154918 and GSE131761) were downloaded to identify differentially expressed genes (DEGs) using the Limma package. In total, 384 common DEGs were found in three contrast groups. We found that as the condition worsens, more genes become dysregulated. A random forest model was then trained with the expression matrix of all genes as features and disease state as the label, after which 279 genes remained. We further analyzed the functions of these 279 important DEGs; their potential biological roles mainly involved neutrophil degranulation, neutrophil activation involved in immune response, neutrophil-mediated immunity, RAGE receptor binding, long-chain fatty acid binding, specific granule, tertiary granule, and secretory granule lumen. Finally, the top nine mRNAs (MCEMP1, PSTPIP2, CD177, GCA, NDUFAF1, CLIC1, UFD1, SEPT9, and UBE2A) associated with sepsis were taken as a signature for distinguishing between sepsis and healthy controls. Under 5-fold cross-validation and leave-one-out cross-validation, the nine-mRNA signature showed a very high AUC.
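The evaluation scheme named in this abstract (k-fold and leave-one-out cross-validation scored by AUC) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the folds are contiguous and unshuffled, and AUC is computed as the fraction of positive/negative pairs ranked correctly, with ties counted as half.

```python
# Illustrative sketch only: helper names and the unshuffled, contiguous
# fold layout are assumptions, not details taken from the paper.
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.
    Setting k == n_samples gives leave-one-out cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def auc(labels, scores):
    """Area under the ROC curve as the probability that a positive sample
    scores higher than a negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the samples would usually be shuffled (and often stratified by class) before splitting, and the held-out predictions from all folds would be pooled or averaged before reporting AUC.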
|
15
|
Wang X, Li Q, Liu Y, Du Z, Jin R. Drug repositioning of COVID-19 based on mixed graph network and ion channel. Mathematical Biosciences and Engineering : MBE 2022; 19:3269-3284. [PMID: 35341251 DOI: 10.3934/mbe.2022151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Research on the relationship between drugs and targets is key to precision medicine, and ion channels are an important class of drug targets. To address the urgent need for coronavirus disease 2019 (COVID-19) treatment and drug development, this paper designed a mixed graph network model to predict the affinity between ion channel targets of COVID-19 and drugs. Starting from the simplified molecular input line entry specification (SMILES) code of each drug, atomic features were first extracted to construct the point sets, and edge sets were constructed from the atomic bonds. An undirected graph with atomic features was then generated with the RDKit tool, and a graph attention layer was used to extract the drug feature information. Five ion channel target proteins were screened from the whole SARS-CoV-2 genome sequences in the NCBI database, and the protein features were extracted by a convolutional neural network (CNN). The extracted drug and target features were combined using an attention mechanism and a graph convolutional network (GCN). After two fully connected layers, the model output the drug-target affinity. The KIBA dataset was used to train the model and determine the model parameters. Compared with the DeepDTA, WideDTA, graph attention network (GAT), GCN and graph isomorphism network (GIN) models, the mean squared error (MSE) of the proposed model decreased by 0.055, 0.04, 0.001, 0.046 and 0.013, and the concordance index (CI) increased by 0.028, 0.016, 0.003, 0.03 and 0.01, respectively, indicating that it predicts drug-target affinity more accurately.
Based on the predicted drug-target affinities for the SARS-CoV-2 ion channel targets, seven small-molecule drugs acting on five ion channel targets were obtained, namely SCH-47112, Dehydroaltenusin, alternariol 5-o-sulfate, LPA1 antagonist 1, alternariol, butin, and AT-9283. These drugs provide a reference for drug repositioning and precise treatment of COVID-19.
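The two metrics used in this abstract to compare affinity models can be sketched as follows. This is a minimal illustration, assuming the CI reported is the pairwise concordance index commonly used in drug-target affinity benchmarks: the fraction of pairs with different true affinities whose predictions are ordered the same way, with tied predictions counted as half. The function names are hypothetical and the code is not taken from the paper.

```python
# Illustrative sketch only: assumes the standard pairwise definition of the
# concordance index; function names are hypothetical.
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs (different true values) whose predicted
    order matches the true order; tied predictions count as 0.5."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:  # each unordered pair is visited once
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no pairs with different true affinities")
    return concordant / comparable
```

A lower MSE and a higher CI both indicate better predictions, which is why the abstract reports a decrease in the former and an increase in the latter. The double loop is quadratic in the number of samples, so large benchmarks would normally use a vectorized or sorted implementation.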
Affiliation(s)
- Xianfang Wang
- Henan Institute of Technology, Xinxiang 453003, China
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Qimeng Li
- College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Yifeng Liu
- Henan Institute of Technology, Xinxiang 453003, China
- Zhiyong Du
- Henan Institute of Technology, Xinxiang 453003, China
- Ruixia Jin
- SanQuan Medical College, Xinxiang 453003, China
|