1. Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025]
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential for transforming traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers are performing diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Enabling large-scale RNA sequence analysis by empowering wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both the RNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold:
- It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks.
- It sets the stage for the development of benchmark datasets for these 47 tasks by summarizing the essentials of 64 different biological databases.
- It presents applications of word embeddings and language models across the 47 tasks.
- It streamlines the development of new predictors by surveying the performance of predictive pipelines based on 58 word embeddings and 70 language models, as well as the top-performing predictors based on traditional sequence encoding, across the 47 tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany

2. Lilhore UK, Simiaya S, Alhussein M, Faujdar N, Dalal S, Aurangzeb K. Optimizing protein sequence classification: integrating deep learning models with Bayesian optimization for enhanced biological analysis. BMC Med Inform Decis Mak 2024; 24:236. [PMID: 39192227 DOI: 10.1186/s12911-024-02631-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Accepted: 08/07/2024] [Indexed: 08/29/2024]
Abstract
Efforts to enhance the accuracy of protein sequence classification are of utmost importance in driving forward biological analyses and facilitating significant medical advancements. This study presents a cutting-edge model called ProtICNN-BiLSTM, which combines attention-based Improved Convolutional Neural Networks (ICNN) and Bidirectional Long Short-Term Memory (BiLSTM) units seamlessly. Our main goal is to improve the accuracy of protein sequence classification by carefully optimizing performance through Bayesian Optimization. ProtICNN-BiLSTM combines the power of CNN and BiLSTM architectures to effectively capture local and global protein sequence dependencies. In the proposed model, the ICNN component uses convolutional operations to identify local patterns, while the BiLSTM component captures long-range associations by analyzing sequence data forwards and backwards. In advanced biological studies, Bayesian Optimization tunes model hyperparameters for efficiency and robustness. The model was extensively validated with PDB-14,189 and other protein data. We found that ProtICNN-BiLSTM outperforms traditional categorization models. Bayesian Optimization's fine-tuning and seamless integration of local and global sequence information make it effective. The precision of ProtICNN-BiLSTM improves comparative protein sequence categorization. The study improves computational bioinformatics for complex biological analysis. Good results from the ProtICNN-BiLSTM model improve protein sequence categorization. This powerful tool could improve medical and biological research. The breakthrough protein sequence classification model is ProtICNN-BiLSTM. Bayesian Optimization, ICNN, and BiLSTM analyze biological data accurately.
Affiliation(s)
- Umesh Kumar Lilhore
  - School of Computing Science and Engineering, Galgotias University, Greater Noida, UP, India
- Sarita Simiaya
  - School of Computing Science and Engineering, Galgotias University, Greater Noida, UP, India
- Musaed Alhussein
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, Riyadh, 11543, Saudi Arabia
- Neetu Faujdar
  - Department of Computer Engineering and Applications, GLA University, 281406, UP, Mathura, India
- Khursheed Aurangzeb
  - Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, Riyadh, 11543, Saudi Arabia

3. Vaculík O, Chalupová E, Grešová K, Majtner T, Alexiou P. Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes. Biology 2023; 12:1276. [PMID: 37886986 PMCID: PMC10604046 DOI: 10.3390/biology12101276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, making the prediction of their binding sites of high importance. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Additionally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
Affiliation(s)
- Ondřej Vaculík
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Eliška Chalupová
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Katarína Grešová
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, 625 00 Brno, Czech Republic
- Tomáš Majtner
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Department of Molecular Sociology, Max Planck Institute of Biophysics, 60439 Frankfurt am Main, Germany
- Panagiotis Alexiou
  - Central European Institute of Technology (CEITEC), Masaryk University, 625 00 Brno, Czech Republic
  - Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, MSD 2080 Msida, Malta
  - Centre for Molecular Medicine & Biobanking, University of Malta, MSD 2080 Msida, Malta

4. Laverty KU, Jolma A, Pour SE, Zheng H, Ray D, Morris Q, Hughes TR. PRIESSTESS: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins. Nucleic Acids Res 2022; 50:e111. [PMID: 36018788 DOI: 10.1093/nar/gkac694] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2021] [Revised: 07/22/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Modelling both primary sequence and secondary structure preferences for RNA binding proteins (RBPs) remains an ongoing challenge. Current models use varied RNA structure representations and can be difficult to interpret and evaluate. To address these issues, we present a universal RNA motif-finding/scanning strategy, termed PRIESSTESS (Predictive RBP-RNA InterpretablE Sequence-Structure moTif regrESSion), that can be applied to diverse RNA binding datasets. PRIESSTESS identifies dozens of enriched RNA sequence and/or structure motifs that are subsequently reduced to a set of core motifs by logistic regression with LASSO regularization. Importantly, these core motifs are easily visualized and interpreted, and provide a measure of RBP secondary structure specificity. We used PRIESSTESS to interrogate new HTR-SELEX data for 23 RBPs with diverse RNA binding modes and captured known primary sequence and secondary structure preferences for each. Moreover, when applying PRIESSTESS to 144 RBPs across 202 RNA binding datasets, 75% showed an RNA secondary structure preference but only 10% had a preference besides unpaired bases, suggesting that most RBPs simply recognize the accessibility of primary sequences.
Affiliation(s)
- Kaitlin U Laverty
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Arttu Jolma
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Sara E Pour
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Hong Zheng
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Debashish Ray
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Quaid Morris
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, USA
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Donnelly Centre, University of Toronto, Toronto, Canada

5. Cardoso TF, Bruscadin JJ, Afonso J, Petrini J, Andrade BGN, de Oliveira PSN, Malheiros JM, Rocha MIP, Zerlotini A, Ferraz JBS, Mourão GB, Coutinho LL, Regitano LCA. EEF1A1 transcription cofactor gene polymorphism is associated with muscle gene expression and residual feed intake in Nelore cattle. Mamm Genome 2022; 33:619-628. [PMID: 35816191 DOI: 10.1007/s00335-022-09959-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Accepted: 06/22/2022] [Indexed: 12/01/2022]
Abstract
Cis-acting effects of noncoding variants on gene expression and regulatory molecules constitute a significant factor for phenotypic variation in complex traits. To provide new insights into the impacts of single-nucleotide polymorphisms (SNPs) on genes coding for transcription factors (TFs) and transcription cofactors (TcoFs), we carried out a multi-omic analysis to identify cis-regulatory effects of SNPs on the expression of these genes in muscle and to describe their association with feed efficiency-related traits in Nelore cattle. As a result, we identified one SNP, the rs137256008C > T, predicted to impact EEF1A1 gene expression (β = 3.02; P-value = 3.51E-03) and the residual feed intake trait (β = -3.47; P-value = 0.02). This SNP was predicted to modify transcription factor sites and overlaps with several QTL for feed efficiency traits. In addition, co-expression network analyses showed that animals carrying the T allele of the rs137256008 SNP may be triggering changes in the gene network. Therefore, our analyses reinforce and contribute to a better understanding of the biological mechanisms underlying gene expression control of feed efficiency traits in bovines. The cis-regulatory SNP can be used as a biomarker for feed efficiency in Nelore cattle.
Affiliation(s)
- T F Cardoso
  - Embrapa Southeast Livestock, São Carlos, SP, Brazil
- J J Bruscadin
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- J Afonso
  - Embrapa Southeast Livestock, São Carlos, SP, Brazil
- J Petrini
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
- B G N Andrade
  - Computer Science Department, Munster Technological University, MTU/ADAPT, Cork, Ireland
- P S N de Oliveira
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- J M Malheiros
  - Federal University of Latin American Integration, Foz do Iguaçu, Paraná, Brazil
- M I P Rocha
  - Program on Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, SP, Brazil
- A Zerlotini
  - Embrapa Agricultural Informatics, Campinas, SP, Brazil
- J B S Ferraz
  - Department of Veterinary Medicine, University of São Paulo/FZEA, Pirassununga, Brazil
- G B Mourão
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil
- L L Coutinho
  - Department of Animal Science, "Luiz de Queiroz" College of Agriculture, University of São Paulo/ESALQ, Piracicaba, SP, Brazil

6. Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022; 20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
RNA-binding proteins (RBPs) have crucial roles in various cellular processes such as alternative splicing and gene regulation. Therefore, the analysis and identification of RBPs is an essential issue. However, although many computational methods have been developed for predicting RBPs, few studies simultaneously consider local and global information from the perspective of the RNA sequence. Facing this challenge, we present a novel method called DeepBtoD, which predicts RBPs directly from RNA sequences. First, a [Formula: see text]-BtoD encoding is designed, which takes into account the composition of [Formula: see text]-nucleotides and their relative positions and forms a local module. Second, we designed a multi-scale convolutional module embedded with a self-attentive mechanism, the ms-focusCNN, which is used to learn more effective, diverse, and discriminative high-level features. Finally, global information is considered to supplement local modules with ensemble learning to predict whether the target RNA binds to RBPs. Preliminary results on 24 independent test datasets show that our proposed method can classify RBPs with an area under the curve of 0.933. Remarkably, DeepBtoD shows competitive results against seven state-of-the-art methods, suggesting that RBPs can be recognized with high accuracy by integrating local [Formula: see text]-BtoD and global information from RNA sequences alone. Hence, our integrative method may be useful to improve the power of RBP prediction, which might be particularly useful for modeling protein-nucleic acid interactions in systems biology studies. Our DeepBtoD server can be accessed at http://175.27.228.227/DeepBtoD/.
Affiliation(s)
- XiuQuan Du
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China
  - School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
- XiuJuan Zhao
  - School of Computer Science and Technology, Anhui University, Hefei 230601, Anhui, P. R. China
- YanPing Zhang
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, Anhui, P. R. China

7. Yamada K, Hamada M. Prediction of RNA-protein interactions using a nucleotide language model. Bioinformatics Advances 2022; 2:vbac023. [PMID: 36699410 PMCID: PMC9710633 DOI: 10.1093/bioadv/vbac023] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/26/2021] [Revised: 02/28/2022] [Accepted: 04/05/2022] [Indexed: 01/28/2023]
Abstract
Motivation: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences. Bidirectional encoder representations from transformer (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on the BERT architecture can potentially overcome such limitations.
Results: Here, we propose BERT-RBP as a model to predict RNA-RBP interactions by adapting the BERT architecture pretrained on a human reference genome. Our model outperformed state-of-the-art prediction models using the eCLIP-seq data of 154 RBPs. The detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and RNA secondary structure based only on sequence information. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems.
Availability and implementation: Python source codes are freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: [RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/), Ensembl Biomart (http://asia.ensembl.org/biomart/martview/)].
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Keisuke Yamada
  - Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Okubo, Shinjuku, Tokyo 169-8555, Japan

8. Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022; 23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022]
Abstract
Background: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.
Results: Here we present ENNGene, an Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.
Conclusions: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
Affiliation(s)
- Eliška Chalupová
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Ondřej Vaculík
  - Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czechia
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Jakub Poláček
  - Faculty of Informatics, Masaryk University, Brno, Czechia
- Filip Jozefov
  - Faculty of Informatics, Masaryk University, Brno, Czechia
- Tomáš Majtner
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Panagiotis Alexiou
  - Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia

9. Guo X, Zhou W, Yu Y, Cai Y, Zhang Y, Du A, Lu Q, Ding Y, Li C. Multiple Laplacian Regularized RBF Neural Network for Assessing Dry Weight of Patients With End-Stage Renal Disease. Front Physiol 2021; 12:790086. [PMID: 34966294 PMCID: PMC8711098 DOI: 10.3389/fphys.2021.790086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2021] [Accepted: 11/17/2021] [Indexed: 11/28/2022]
Abstract
Dry weight (DW) is an important dialysis index for patients with end-stage renal disease. It can guide clinical hemodialysis. Brain natriuretic peptide, chest computed tomography images, ultrasound, and bioelectrical impedance analysis are key indicators (multisource information) for assessing DW. With these approaches, a trial-and-error method (the traditional measurement method) is employed to assess DW, and assessment by clinicians is time-consuming. In this study, we developed a method based on artificial intelligence technology to estimate patient DW. Based on the conventional radial basis function neural network (RBFN), we propose a multiple Laplacian-regularized RBFN (MLapRBFN) model to predict the DW of patients. MLapRBFN integrates multiple Laplacian regularization terms and employs an efficient iterative algorithm to solve the model. Compared with other models and the body composition monitor, our method achieves the lowest root mean square error (1.3226). In the Bland-Altman analysis, MLapRBFN has the fewest samples outside the agreement interval (17 samples). The proportion of samples outside the agreement interval is 3.57%, which is lower than 5%. Therefore, our method can be tentatively applied for clinical evaluation of DW in hemodialysis patients.
Affiliation(s)
- Xiaoyi Guo
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Wei Zhou
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yan Yu
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yinghua Cai
  - Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yuan Zhang
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Aiyan Du
  - Hemodialysis Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Qun Lu
  - Department of Nursing, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Chao Li
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
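
The abstract of entry 9 reports a root mean square error and a Bland-Altman agreement analysis. As a rough illustration of how such figures are typically computed, here is a minimal Python sketch. It assumes the conventional limits of agreement (mean difference ± 1.96 standard deviations); the function names and the dry-weight values are made up for illustration and are not taken from the study.

```python
import math
import statistics


def rmse(predicted, reference):
    """Root mean square error between paired predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def bland_altman(predicted, reference, z=1.96):
    """Return (bias, lower limit, upper limit, fraction of samples outside the limits).

    Assumption: limits of agreement are bias +/- z * sample standard deviation
    of the paired differences, the usual Bland-Altman convention.
    """
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return bias, lower, upper, outside / len(diffs)


# Hypothetical dry-weight values in kg, for illustration only.
reference = [60.0, 72.5, 55.0, 80.0, 66.0]
predicted = [61.0, 71.5, 56.0, 79.0, 66.0]

print("RMSE:", rmse(predicted, reference))
print("Bland-Altman:", bland_altman(predicted, reference))
```

The fraction returned last corresponds to the kind of "outside the agreement interval" proportion quoted in the abstract, which is commonly compared against a 5% threshold.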

10. Wang H, Xi Q, Liang P, Zheng L, Hong Y, Zuo Y. IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy. Amino Acids 2021; 53:239-251. [PMID: 33486591 DOI: 10.1007/s00726-021-02941-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/17/2020] [Accepted: 01/11/2021] [Indexed: 12/18/2022]
Abstract
Enzymes have been proven to play considerable roles in disease diagnosis and biological functions. Feature extraction that truly reflects the intrinsic properties of a protein is the most critical step in the automatic identification of enzymes. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which can identify whether a protein is a human enzyme and distinguish the function of the human enzyme. To improve the feature representation ability, protein sequences were encoded by a new feature vector called 'reduced amino acid cluster'. We calculated 673 amino acid reduction alphabets to determine the optimal feature representation scheme. The tenfold cross-validation test showed that IHEC_RAAC identified human enzymes with an accuracy of 74.66% and further discriminated the human enzyme classes with an accuracy of 54.78%, which were 2.06% and 8.68% higher than the state-of-the-art predictors, respectively. Additionally, the results from the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes to further provide guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac.
Affiliation(s)
- Hao Wang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Qilemuge Xi
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Pengfei Liang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yan Hong
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
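
The abstract of entry 10 describes encoding protein sequences with a reduced amino acid alphabet. The general idea can be sketched in Python as follows. The six-cluster grouping below is a hypothetical scheme chosen for illustration and is not claimed to be one of the 673 alphabets evaluated in the study; the composition vector is likewise only one plausible way to turn a reduced sequence into features.

```python
from collections import Counter

# Hypothetical reduction scheme: the 20 standard amino acids grouped into
# six clusters by broad physicochemical similarity (illustrative only).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_mapping(clusters):
    """Map every residue to the first letter of the cluster it belongs to."""
    return {aa: cluster[0] for cluster in clusters for aa in cluster}


MAPPING = build_mapping(CLUSTERS)


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(mapping[aa] for aa in seq.upper())


def composition(reduced, clusters):
    """Fraction of the reduced sequence that falls into each cluster."""
    if not reduced:
        raise ValueError("sequence must not be empty")
    counts = Counter(reduced)
    return [counts[cluster[0]] / len(reduced) for cluster in clusters]


reduced = reduce_sequence("MKVLDE", MAPPING)
print(reduced)
print(composition(reduced, CLUSTERS))
```

Shrinking the alphabet from 20 letters to a handful of clusters shortens the feature vector, which is the usual motivation for comparing many candidate reduction schemes before choosing one.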