1
Venkatesan A, Basak J, Bahadur RP. pmiRScan: a LightGBM based method for prediction of animal pre-miRNAs. Funct Integr Genomics 2025; 25:9. [PMID: 39786653 DOI: 10.1007/s10142-025-01527-y] [Received: 10/28/2024] [Revised: 12/03/2024] [Accepted: 01/01/2025] [Indexed: 01/12/2025]
Abstract
MicroRNAs (miRNAs) are short endogenous non-coding RNAs that play a significant role in post-transcriptional gene regulation. Identifying new animal precursor miRNAs (pre-miRNAs) and miRNAs is crucial to understanding the role of miRNAs in various biological processes, including the development of diseases. The present study focuses on the development of a Light Gradient Boost (LGB) based method for the classification of animal pre-miRNAs using various sequence and secondary structural features. In various pre-miRNA families, distinct k-mer repeat signatures with a length of three nucleotides have been identified. Out of nine different classifiers trained and tested in the present study, LGB has better overall performance, with an AUROC of 0.959. In comparison with existing methods, our method 'pmiRScan' has better overall performance, with an accuracy of 0.93, sensitivity of 0.86, specificity of 0.95 and F-score of 0.82. Moreover, pmiRScan effectively classifies pre-miRNAs from four distinct taxonomic groups: mammals, nematodes, molluscs and arthropods. We have used our classifier to predict genome-wide pre-miRNAs in human and find a total of 313 pre-miRNA candidates. A total of 180 potential mature miRNAs belonging to 60 distinct miRNA families are extracted from the predicted pre-miRNAs, of which 128 are novel and not reported in miRBase. These discoveries may enhance our current understanding of miRNAs and their targets in human. pmiRScan is freely available at http://www.csb.iitkgp.ac.in/applications/pmiRScan/index.php .
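The k-mer signatures of length three mentioned in the abstract can be illustrated with a minimal sketch of overlapping 3-mer counting. This is not the pmiRScan feature pipeline, which the abstract does not describe in detail; the function names `kmer_counts` and `kmer_frequencies` and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a sequence (upper-cased, T treated as U)."""
    s = seq.upper().replace("T", "U")
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Normalize k-mer counts to relative frequencies; empty if seq is shorter than k."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy sequence for illustration only (an assumption, not data from the study).
toy = "AUGAUGCCA"
features = kmer_frequencies(toy, k=3)
```

Frequencies of this kind could serve as sequence features for a gradient-boosted classifier, alongside whatever secondary structural features a real pipeline would compute.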
Affiliation(s)
- Amrit Venkatesan
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
- Jolly Basak
  - Genomics of Plant Stress Biology Lab, Department of Biotechnology, Visva-Bharati, Santiniketan, West Bengal, 731235, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Lab, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India.
  - Bioinformatics Centre, Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India.

2
Ke S, Huang Y, Wang D, Jiang Q, Luo Z, Li B, Yan D, Zhou J. BreCML: identifying breast cancer cell state in scRNA-seq via machine learning. Front Med (Lausanne) 2024; 11:1482726. [PMID: 39574916 PMCID: PMC11579858 DOI: 10.3389/fmed.2024.1482726] [Received: 08/18/2024] [Accepted: 10/15/2024] [Indexed: 11/24/2024] Open
Abstract
Breast cancer is a prevalent malignancy and one of the leading causes of cancer-related mortality among women worldwide. This disease typically manifests through the abnormal proliferation and dissemination of malignant cells within breast tissue. Current diagnostic and therapeutic strategies face significant challenges in accurately identifying and localizing specific subtypes of breast cancer. In this study, we developed a novel machine learning-based predictor, BreCML, designed to accurately classify subpopulations of breast cancer cells and their associated marker genes. BreCML exhibits outstanding predictive performance, achieving an accuracy of 98.92% on the training dataset. Utilizing the XGBoost algorithm, BreCML demonstrates superior accuracy (98.67%), precision (99.15%), recall (99.49%), and F1-score (99.79%) on the test dataset. Through the application of machine learning and feature selection techniques, BreCML successfully identified new key genes. This predictor not only serves as a powerful tool for assessing breast cancer cellular status but also offers a rapid and efficient means to uncover potential biomarkers, providing critical insights for precision medicine and therapeutic strategies.
Affiliation(s)
- Shanbao Ke
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, China
- Dong Wang
  - Pudong Institute for Health Development, Shanghai, China
- Qiang Jiang
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Zhangyang Luo
  - Pudong Institute for Health Development, Shanghai, China
- Baiyu Li
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China
- Danfang Yan
  - Department of Radiation Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Jianwei Zhou
  - Department of Oncology, Henan Provincial People’s Hospital, Zhengzhou University People’s Hospital, Zhengzhou, China

3
Yu S, Liu L, Wang H, Yan S, Zheng S, Ning J, Luo R, Fu X, Deng X. AtML: An Arabidopsis thaliana root cell identity recognition tool for medicinal ingredient accumulation. Methods 2024; 231:61-69. [PMID: 39293728 DOI: 10.1016/j.ymeth.2024.09.010] [Received: 07/07/2024] [Revised: 08/05/2024] [Accepted: 09/12/2024] [Indexed: 09/20/2024] Open
Abstract
Arabidopsis thaliana synthesizes various medicinal compounds, and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, challenges remain in precisely annotating cell identity due to the lack of marker genes for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations, comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing using an external dataset revealed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
Affiliation(s)
- Shicong Yu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lijia Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shuqin Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jing Ning
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Ruxian Luo
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xiangzheng Fu
  - Research Institute of Hunan University in Chongqing, Chongqing 401120, China.
- Xiaoshu Deng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China; Chongqing Academy of Chinese Materia Medica, Chongqing 400065, China.

4
Wang Y, Huang Y, Luo X, Lai X, Yu L, Zhao Z, Zhang A, Li H, Huang G, Li Y, Wang J, Wu Q. Deciphering the role of miRNA-134 in the pathophysiology of depression: A comprehensive review. Heliyon 2024; 10:e39026. [PMID: 39435111 PMCID: PMC11492588 DOI: 10.1016/j.heliyon.2024.e39026] [Received: 06/12/2024] [Revised: 08/27/2024] [Accepted: 10/04/2024] [Indexed: 10/23/2024] Open
Abstract
This study summarizes the significance of microRNA-134 (miRNA-134) in the pathophysiology, diagnosis, and treatment of depression, a disease still under investigation due to its complexity. miRNA-134 is an endogenous short non-coding RNA that can bind to the 3' untranslated region (3'UTR) of target mRNAs, inhibiting gene translation and showing great potential in the regulation of mood, synaptic plasticity, and neuronal function. This study included 15 articles retrieved from four English-language databases (PubMed, Embase, The Cochrane Library, and Web of Science) and three Chinese literature databases (CNKI, Wanfang, and the Chinese Science and Technology Periodical Database (VIP)). We evaluated each of the 15 articles using the Critical Appraisal Skills Program (CASP) tool. The standard integrates analyses of genomic, transcriptomic, neuroimaging, and behavioral data related to miRNA-134 and depression. A multidimensional framework based on standardized criteria was used for quality assessment. The main findings indicate that miRNA-134 significantly affects synaptic plasticity and neurotransmitter regulation, in particular the synthesis and release of serotonin and dopamine. miRNA-134 shows high sensitivity and specificity as a biomarker for the diagnosis of depression and has therapeutic potential for the targeted treatment of depression. miRNA-134 plays a crucial role in the pathogenesis of depression, providing valuable insights for early diagnosis and the development of targeted therapeutic strategies. This work highlights the potential of miRNA-134 as a focal point for advancing personalized medicine approaches for depression.
Affiliation(s)
- Yunkai Wang
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Yali Huang
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Xuexing Luo
  - Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Xin Lai
  - Department of Traditional Chinese Medicine, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangdong Province, Guangzhou, 510655, China
- Lili Yu
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Ziming Zhao
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Aijia Zhang
  - Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Hong Li
  - Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Guanghui Huang
  - Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Yu Li
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
- Jue Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, Guangdong Province, China
- Qibiao Wu
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau SAR, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou, Guangdong Province, China

5
Liu L, Huang Y, Zheng Y, Liao Y, Ma S, Wang Q. ScnML models single-cell transcriptome to predict spinal cord neuronal cell status. Front Genet 2024; 15:1413484. [PMID: 38894722 PMCID: PMC11183327 DOI: 10.3389/fgene.2024.1413484] [Received: 04/07/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024] Open
Abstract
Injuries to the spinal cord nervous system often result in permanent loss of sensory, motor, and autonomic functions. Accurately identifying the cellular state of spinal cord nerves is extremely important and could facilitate the development of new therapeutic and rehabilitative strategies. Existing experimental techniques for identifying the development of spinal cord nerves are both labor-intensive and costly. In this study, we developed a machine learning predictor, ScnML, for predicting subpopulations of spinal cord nerve cells as well as identifying marker genes. The prediction performance of ScnML was evaluated on the training dataset with an accuracy of 94.33%. Based on XGBoost, ScnML achieved an accuracy of 94.08%, a precision of 94.24%, a recall of 94.26%, and an F1-measure of 94.24% on the test dataset. Importantly, ScnML identified new significant genes through model interpretation and biological landscape analysis. ScnML can be a powerful tool for predicting the status of spinal cord neuronal cells, revealing potential specific biomarkers quickly and efficiently, and providing crucial insights for precision medicine and rehabilitation recovery.
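The four scores reported above are standard confusion-matrix summaries. The following minimal sketch shows how they are defined for a binary problem; the counts are made-up illustrations, and the abstract does not say how per-class values were averaged for the multi-class case, so none of this reproduces the ScnML evaluation.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the function.
scores = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is a quick sanity check when reading reported scores.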
Affiliation(s)
- Lijia Liu
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Yuxuan Huang
  - Department of Neuroscience in the Behavioral Sciences, Duke University and Duke Kunshan University, Suzhou, Jiangsu, China
- Yuan Zheng
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Yihan Liao
  - Taizhou Hospital of Zhejiang Province, Wenzhou Medical University, Luqiao, China
- Siyuan Ma
  - School of Recreation and Community Sport, Capital University of Physical Education and Sports, Beijing, China
- Qian Wang
  - Department of Neurology, The First Hospital of Tsinghua University, Beijing, China

6
Singh J, Khanna NN, Rout RK, Singh N, Laird JR, Singh IM, Kalra MK, Mantella LE, Johri AM, Isenovic ER, Fouda MM, Saba L, Fatemi M, Suri JS. GeneAI 3.0: powerful, novel, generalized hybrid and ensemble deep learning frameworks for miRNA species classification of stationary patterns from nucleotides. Sci Rep 2024; 14:7154. [PMID: 38531923 PMCID: PMC11344070 DOI: 10.1038/s41598-024-56786-9] [Received: 07/11/2023] [Accepted: 03/11/2024] [Indexed: 03/28/2024] Open
Abstract
Due to the intricate relationship between the small non-coding ribonucleic acid (miRNA) sequences, the classification of miRNA species, namely Human, Gorilla, Rat, and Mouse, is challenging. Previous methods are neither robust nor accurate. In this study, we present AtheroPoint's GeneAI 3.0, a powerful, novel, and generalized method for extracting features from the fixed patterns of purines and pyrimidines in each miRNA sequence in ensemble paradigms in machine learning (EML) and convolutional neural network (CNN)-based deep learning (EDL) frameworks. GeneAI 3.0 utilized five conventional (Entropy, Dissimilarity, Energy, Homogeneity, and Contrast) and three contemporary (Shannon entropy, Hurst exponent, Fractal dimension) features to generate a composite feature set from given miRNA sequences, which were then passed into our ML and DL classification framework. A set of 11 new classifiers was designed, consisting of 5 EML and 6 EDL for binary/multiclass classification. It was benchmarked against 9 solo ML (SML), 6 solo DL (SDL), and 12 hybrid DL (HDL) models, so that a total of 11 + 27 = 38 models were designed. Four hypotheses were formulated and validated using explainable AI (XAI) as well as reliability/statistical tests. The order of the mean performance using accuracy (ACC)/area-under-the-curve (AUC) of the 24 DL classifiers was: EDL > HDL > SDL. The mean performance of EDL models with CNN layers was superior to that without CNN layers by 0.73%/0.92%. Mean performance of EML models was superior to SML models with improvements of ACC/AUC by 6.24%/6.46%. EDL models performed significantly better than EML models, with a mean increase in ACC/AUC of 7.09%/6.96%. The GeneAI 3.0 tool produced expected XAI feature plots, and the statistical tests showed significant p-values. Ensemble models with composite features are highly effective and generalized models for classifying miRNA sequences.
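One of the listed features, Shannon entropy, can be sketched directly from its definition, here applied both to the raw nucleotide sequence and to a purine/pyrimidine recoding. How GeneAI 3.0 actually computes its features is not stated in the abstract, so the helper names `to_purine_pyrimidine` and `shannon_entropy` and the toy sequence are assumptions for illustration only.

```python
import math
from collections import Counter

# Purines (A, G) map to "R", pyrimidines (C, U, T) map to "Y".
_PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "U": "Y", "T": "Y"}


def to_purine_pyrimidine(seq: str) -> str:
    """Recode a nucleotide sequence as R/Y symbols; unknown symbols become 'N'."""
    return "".join(_PURINE_PYRIMIDINE.get(base, "N") for base in seq.upper())


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits of the symbol distribution of seq (0.0 if empty)."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    return abs(-sum((c / n) * math.log2(c / n) for c in counts.values()))


# Toy sequence for illustration only.
toy = "ACGUACGU"
h_nucleotide = shannon_entropy(toy)
h_ry = shannon_entropy(to_purine_pyrimidine(toy))
```

A four-letter alphabet caps the entropy at 2 bits and the two-letter R/Y recoding at 1 bit, so the two values are not directly comparable without normalization.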
Affiliation(s)
- Jaskaran Singh
  - Department of Computer Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
- Ranjeet K Rout
  - Department of Computer Science and Engineering, NIT Srinagar, Hazratbal, Srinagar, India
- Narpinder Singh
  - Department of Food Science, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA, USA
- Inder M Singh
  - Advanced Cardiac and Vascular Institute, Sacramento, CA, USA
- Mannudeep K Kalra
  - Department of Radiology, Massachusetts General Hospital, Boston, MA, 02115, USA
- Laura E Mantella
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Amer M Johri
  - Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
- Esma R Isenovic
  - Laboratory for Molecular Genetics and Radiobiology, University of Belgrade, Belgrade, Serbia
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, 83209, USA
- Luca Saba
  - Department of Neurology, University of Cagliari, Cagliari, Italy
- Mostafa Fatemi
  - Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, 55905, USA
- Jasjit S Suri
  - Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, 95661, USA.

7
Fu X, Chen Y, Tian S. DlncRNALoc: A discrete wavelet transform-based model for predicting lncRNA subcellular localization. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:20648-20667. [PMID: 38124569 DOI: 10.3934/mbe.2023913] [Indexed: 12/23/2023]
Abstract
The prediction of long non-coding RNA (lncRNA) subcellular localization is essential to the understanding of its function and involvement in cellular regulation. Traditional biological experimental methods are costly and time-consuming, making computational methods the preferred approach for predicting lncRNA subcellular localization (LSL). However, existing computational methods have limitations due to the structural characteristics of lncRNAs and the uneven distribution of data across subcellular compartments. We propose a discrete wavelet transform (DWT)-based model for predicting LSL, called DlncRNALoc. We construct a physicochemical property matrix of 2-tuple bases based on lncRNA sequences, and we introduce a DWT lncRNA feature extraction method. We use the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling and the local Fisher discriminant analysis (LFDA) algorithm to optimize feature information. The optimized feature vectors are fed into a support vector machine (SVM) to construct a predictive model. DlncRNALoc has been evaluated by five-fold cross-validation on three benchmark datasets. Extensive experiments have demonstrated the superiority and effectiveness of the DlncRNALoc model in predicting LSL.
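To make the DWT feature-extraction step concrete, here is a minimal sketch of a single-level Haar transform applied to a numeric signal derived from a sequence. The abstract does not name the wavelet family, the number of decomposition levels, or the physicochemical values used, so the Haar choice, the function name `haar_dwt`, and the toy signal are all assumptions.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists. An odd-length input
    is padded by repeating its last value so that samples pair up.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    scale = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / scale for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / scale for i in range(0, len(x), 2)]
    return approx, detail


# Hypothetical property values for consecutive dinucleotides (illustrative only).
toy_signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_dwt(toy_signal)
```

The approximation coefficients summarize the low-frequency trend and the detail coefficients capture local changes; statistics of both (for example mean and standard deviation) are a common way to turn variable-length sequences into fixed-length feature vectors.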
Affiliation(s)
- Xiangzheng Fu
  - Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, China
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
  - Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Yifan Chen
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
  - Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Sha Tian
  - Department of Internal Medicine, College of Integrated Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan, China

8
Wang H, Lin YN, Yan S, Hong JP, Tan JR, Chen YQ, Cao YS, Fang W. NRTPredictor: identifying rice root cell state in single-cell RNA-seq via ensemble learning. PLANT METHODS 2023; 19:119. [PMID: 37925413 PMCID: PMC10625708 DOI: 10.1186/s13007-023-01092-0] [Received: 04/14/2023] [Accepted: 10/15/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) measurements of gene expression show great promise for studying the cellular heterogeneity of rice roots. Precisely annotating cell identity remains a major unresolved problem in plant scRNA-seq analysis due to the inherent high dimensionality and sparsity of the data. RESULTS To address this challenge, we present NRTPredictor, an ensemble-learning system, to predict rice root cell stage and mine biomarkers through complete model interpretability. The performance of NRTPredictor was evaluated using a test dataset, with 98.01% accuracy and 95.45% recall. With the power of interpretability provided by NRTPredictor, our model recognizes 110 marker genes partially involved in phenylpropanoid biosynthesis. Expression patterns of rice root could be mapped by the above-mentioned candidate genes, showing the superiority of NRTPredictor. Integrated analysis of scRNA-seq and bulk RNA-seq data revealed aberrant expression of Epidermis cell subpopulations in flooding, Pi, and salt stresses. CONCLUSION Taken together, our results demonstrate that NRTPredictor is a useful tool for automated prediction of rice root cell stage and provides a valuable resource for deciphering the rice root cellular heterogeneity and the molecular mechanisms of flooding, Pi, and salt stresses. Based on the proposed model, a free webserver has been established, which is available at https://www.cgris.net/nrtp .
Affiliation(s)
- Hao Wang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yu-Nan Lin
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Shen Yan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jing-Peng Hong
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Jia-Rui Tan
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Yan-Qing Chen
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Yong-Sheng Cao
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Wei Fang
  - The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.

9
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023] Open
Abstract
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals that novel expressed ncRNAs and their classification are important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the current best-performing tool, ncRDense, we had a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Affiliation(s)
- Heiko Dunkel
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
  - Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
  - Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
  - Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany

10
Garg P, Jamal F, Srivastava P. Deciphering the role of precursor miR-12136 and miR-8485 in the progression of intellectual disability (ID). IBRO Neurosci Rep 2022; 13:393-401. [PMID: 36345471 PMCID: PMC9636553 DOI: 10.1016/j.ibneur.2022.10.005] [Received: 07/20/2022] [Accepted: 10/15/2022] [Indexed: 11/06/2022] Open
Abstract
The short, non-coding RNAs known as miRNAs modulate the expression of human protein-coding genes. About 90% of genes in humans are controlled by the expression of miRNAs. The dysfunction of these miRNA target genes leads to many human diseases, including neurodevelopmental disorders. Intellectual disability (ID) is a neurodevelopmental disorder that is characterized by limitations in adaptive behavior and intellectual functioning, which includes logical reasoning, learning ability, practical intelligence, and verbal skills. Identification of miRNAs involved in ID and their associated target genes can help in the identification of diagnostic biomarkers related to ID at a very early age. The present study is an attempt to identify miRNAs and their associated target genes that play an important role in the development of intellectual disability through the meta-analysis of available transcriptome data. A total of 6 transcriptomic studies were retrieved from NCBI and were subjected to quality check and trimming before alignment. The normalization and identification of differentially expressed miRNAs were carried out using the edgeR package in RStudio. Further, the gene targets of downregulated miRNAs were identified using miRDB. Systems biology approaches were also applied to identify the hub target genes and the diseases associated with the main miRNAs.
Affiliation(s)
- Prekshi Garg
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, 226028, India
- Farrukh Jamal
  - Department of Biochemistry, Dr. Rammanohar Lohia Avadh University, Ayodhya 224001, U.P., India
- Prachi Srivastava
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, 226028, India

11
Savadi S, Muralidhara BM, Godwin J, Adiga JD, Mohana GS, Eradasappa E, Shamsudheen M, Karun A. De novo assembly and characterization of the draft genome of the cashew (Anacardium occidentale L.). Sci Rep 2022; 12:18187. [PMID: 36307541 PMCID: PMC9616956 DOI: 10.1038/s41598-022-22600-7] [Received: 11/05/2021] [Accepted: 10/17/2022] [Indexed: 12/31/2022] Open
Abstract
Cashew is the second most important tree nut crop in the global market. It is a diploid, heterozygous species closely related to mango and pistachio. Its improvement by conventional breeding is slow because of the long juvenile phase. Despite its economic importance, very little genomic or transcriptomic information is available for cashew. In this study, Oxford Nanopore and Illumina reads were used for de novo assembly of the cashew genome. The hybrid assembly yielded a 356.6 Mb genome, corresponding to 85% of the estimated genome size (419 Mb). BUSCO analysis showed 91.8% genome completeness. Transcriptome mapping showed that 92.75% of transcripts aligned to the assembled genome. Gene prediction identified 31,263 genes coding for a total of 35,000 gene isoforms. About 46% (165 Mb) of the cashew genome consists of repetitive sequences. Phylogenetic analysis of cashew with nine species showed that it is closely related to Mangifera indica. Analysis of the cashew genome revealed 3104 putative R-genes. The first draft genome assembly, transcriptome, and R-gene information generated in this study provide a foundation for understanding the molecular basis of economic traits and for genomics-assisted breeding in cashew.
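The percentages quoted in this abstract can be checked with simple arithmetic. The sketch below uses only the three sizes given in the abstract; the assumption that the repeat fraction is taken relative to the assembled sequence (rather than the estimated genome size) is ours, made because it is the reading that matches the reported figure of about 46%.

```python
# Sizes quoted in the abstract, in megabases (Mb).
ASSEMBLY_MB = 356.6
ESTIMATED_GENOME_MB = 419.0
REPEAT_MB = 165.0


def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of the estimated genome captured by the hybrid assembly.
coverage = percent(ASSEMBLY_MB, ESTIMATED_GENOME_MB)

# Share of the assembly annotated as repeats (assumed denominator: assembly size).
repeat_share = percent(REPEAT_MB, ASSEMBLY_MB)

print(f"assembly covers {coverage:.1f}% of the estimated genome")
print(f"repeats make up {repeat_share:.1f}% of the assembly")
```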
Affiliation(s)
- Siddanna Savadi
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- B. M. Muralidhara
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- Jeffrey Godwin
  - Bionivid Technology Private Limited, 209, 4th Cross Rd, B Channasandra, Kasturi Nagar, Bengaluru, Karnataka 560 043, India
- J. D. Adiga
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- G. S. Mohana
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- E. Eradasappa
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- M. Shamsudheen
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
- Anitha Karun
  - ICAR-Directorate of Cashew Research (DCR), Puttur, D.K., Karnataka 574 202, India
|
12
|
Hasan MM, Murtaz SB, Islam MU, Sadeq MJ, Uddin J. Robust and efficient COVID-19 detection techniques: A machine learning approach. PLoS One 2022; 17:e0274538. [PMID: 36107971 PMCID: PMC9477266 DOI: 10.1371/journal.pone.0274538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 08/30/2022] [Indexed: 12/02/2022] Open
Abstract
The devastating Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic almost halted the global economy and is responsible for 6 million deaths and over 524 million infections. The SARS-CoV-2 virus was initially suspected, with significant reservations, to have come from bats and to be closely related to viruses found in bats. However, as experimental evidence accumulated, it was found to share similarities with several gene clusters and virus proteins identified in animal-to-human transmission. Despite this substantial evidence, there has been limited exploration of putative microRNAs (miRNAs) in the SARS-CoV-2 genome and their role in the virus life cycle. In this context, this paper presents a method for detecting SARS-CoV-2 precursor miRNAs (pre-miRNAs) that enables quick detection of specific ribonucleic acids (RNAs). The approach employs an artificial neural network and proposes a model with an estimated accuracy of 98.24%. The sampling technique involves random selection from highly unbalanced datasets to reduce class imbalance, followed by application of the matriculation artificial neural network, which is evaluated with an accuracy curve, a loss curve, and a confusion matrix. Classical machine learning approaches are then compared with the model and its performance. The proposed approach would be beneficial for identifying target regions of RNA and for better characterizing the SARS-CoV-2 genome sequence in order to design oligonucleotide-based drugs against the genetic structure of the virus.
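The abstract does not spell out its sampling procedure, so the following is only a generic sketch of one common way to reduce class imbalance by random selection: random undersampling of the larger class. The function name `undersample`, the toy sequence identifiers, and the 3-versus-12 class split are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def undersample(samples, labels, seed=0):
    """Randomly keep, for every class, as many items as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # rng.sample draws `target` distinct items without replacement.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 3 positive examples versus 12 negative examples.
samples = [f"seq{i}" for i in range(15)]
labels = [1] * 3 + [0] * 12

balanced = undersample(samples, labels)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling discards data from the majority class, so it trades some information for balance; oversampling or class weighting are alternatives the paper may or may not have used.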
Affiliation(s)
- Md. Mahadi Hasan
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Saba Binte Murtaz
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Muhammad Usama Islam
  - School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Jafar Sadeq
  - Department of Computer Science and Engineering, Asian University of Bangladesh, Ashulia, Dhaka, Bangladesh
- Jasim Uddin
  - Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
|
13
|
Bonidia RP, Santos APA, de Almeida BLS, Stadler PF, da Rocha UN, Sanches DS, de Carvalho ACPLF. BioAutoML: automated feature engineering and metalearning to predict noncoding RNAs in bacteria. Brief Bioinform 2022; 23:6618238. [PMID: 35753697 PMCID: PMC9294424 DOI: 10.1093/bib/bbac218] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/02/2022] [Revised: 05/06/2022] [Accepted: 05/09/2022] [Indexed: 01/19/2023] Open
Abstract
Recent technological advances have led to an exponential expansion of biological sequence data and extraction of meaningful information through Machine Learning (ML) algorithms. This knowledge has improved the understanding of mechanisms related to several fatal diseases, e.g. cancer and coronavirus disease 2019, helping to develop innovative solutions, such as CRISPR-based gene editing, coronavirus vaccines and precision medicine. These advances benefit our society and economy, directly impacting people’s lives in various areas, such as health care, drug discovery, forensic analysis and food processing. Nevertheless, ML-based approaches to biological data require representative, quantitative and informative features. Many ML algorithms can handle only numerical data, and therefore sequences need to be translated into a numerical feature vector. This process, known as feature extraction, is a fundamental step for developing high-quality ML-based models in bioinformatics, as it enables the feature engineering stage, with the design and selection of suitable features. Feature engineering, ML algorithm selection and hyperparameter tuning are often manual and time-consuming processes, requiring extensive domain knowledge. To deal with this problem, we present a new package: BioAutoML. BioAutoML automatically runs an end-to-end ML pipeline, extracting numerical and informative features from biological sequence databases, using the MathFeature package, and automating the feature selection, ML algorithm(s) recommendation and hyperparameter tuning of the selected algorithm(s), using Automated ML (AutoML). BioAutoML has two components, divided into four modules: (1) automated feature engineering (feature extraction and selection modules) and (2) Metalearning (algorithm recommendation and hyper-parameter tuning modules).
We experimentally evaluate BioAutoML in two different scenarios: (i) prediction of the three main classes of noncoding RNAs (ncRNAs) and (ii) prediction of the eight categories of ncRNAs in bacteria, including housekeeping and regulatory types. To assess BioAutoML predictive performance, it is experimentally compared with two other AutoML tools (RECIPE and TPOT). According to the experimental results, BioAutoML can accelerate new studies, reducing the cost of feature engineering processing and either keeping or improving predictive performance. BioAutoML is freely available at https://github.com/Bonidia/BioAutoML.
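The feature extraction step described in this abstract, turning a sequence into a numerical vector, can be illustrated with one of the simplest descriptors: normalized k-mer frequencies. This is a minimal sketch and not the MathFeature or BioAutoML API; the function name `kmer_frequencies`, the RNA alphabet, and the toy sequence are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed, lexicographic k-mer order."""
    seq = sequence.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Hypothetical toy sequence; k=2 gives a 4**2 = 16-dimensional feature vector.
vector = kmer_frequencies("AUGGCUAUGG", k=2)
print(len(vector), round(sum(vector), 6))
```

Because the k-mer order is fixed, every sequence maps to a vector of the same length, which is what most ML algorithms require as input.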
Affiliation(s)
- Robson P Bonidia
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
- Anderson P Avila Santos
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
- Breno L S de Almeida
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
- Peter F Stadler
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
- Ulisses N da Rocha
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, Leipzig, Saxony, Germany
- Danilo S Sanches
  - Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio 86300-000, Brazil
- André C P L F de Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
|
14
|
CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction. Interdiscip Sci 2022; 14:439-451. [PMID: 35106702 DOI: 10.1007/s12539-021-00500-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/25/2021] [Revised: 12/04/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022]
Abstract
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Owing to limitations in techniques and knowledge, large-scale identification of ac4C remains challenging. RNA sequences resemble natural-language sentences that carry semantics. Inspired by this, we proposed a hybrid model for ac4C prediction. The model used a long short-term memory network and a convolutional neural network to extract the semantic features hidden in the sequences. The semantic features and two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) were combined to represent ac4C and non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation on the training set of 1160 ac4C and 10,855 non-ac4C sequences yielded an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test on 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The semantic features alone performed better than k-nucleotide frequencies and pseudo tri-tuple nucleotide composition, suggesting that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server that is freely available to the scientific community: http://47.113.117.61/ac4c/ . The presented model and tool are useful for identifying ac4C on a large scale.
|
15
|
Chen Y, Li Z, Li Z. Prediction of Plant Resistance Proteins Based on Pairwise Energy Content and Stacking Framework. Front Plant Sci 2022; 13:912599. [PMID: 35712582 PMCID: PMC9194944 DOI: 10.3389/fpls.2022.912599] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/05/2022] [Accepted: 05/10/2022] [Indexed: 06/15/2023]
Abstract
Plant resistance proteins (R proteins) recognize effector proteins secreted by pathogenic microorganisms and trigger an immune response against infection. Accurate identification of plant R proteins is an important research topic in plant pathology, and plant R protein prediction has produced many research results. Recently, several machine learning-based methods have emerged to identify plant R proteins. However, most of them rely only on protein sequence features and ignore inter-amino acid features, which limits further improvement of prediction performance. In this manuscript, we propose a method called StackRPred to predict plant R proteins. Specifically, StackRPred first obtains feature information from the pairwise energy content of residues; this feature information is then fed into a stacking framework for training to construct a prediction model for plant R proteins. The results of both five-fold cross-validation and independent testing show that our proposed method outperforms other state-of-the-art methods, indicating that StackRPred is an effective tool for predicting plant R proteins. It is expected to contribute to the study of plant R proteins.
Affiliation(s)
- Yifan Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Zhiyong Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
16
|
Construction and characterization of a de novo draft genome of garden cress (Lepidium sativum L.). Funct Integr Genomics 2022; 22:879-889. [PMID: 35596045 DOI: 10.1007/s10142-022-00866-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2022] [Accepted: 05/11/2022] [Indexed: 11/04/2022]
Abstract
Garden cress (Lepidium sativum L.) is a Brassicaceae crop recognized as a healthy vegetable and a medicinal plant. Lepidium is one of the largest genera in Brassicaceae, yet the genus has not been a focus of extensive genomic research. In the present work, the garden cress genome was sequenced using long-read high-fidelity sequencing technology. A de novo draft genome assembly spanning 336.5 Mb was produced, corresponding to 88.6% of the estimated genome size and representing 90% of the evolutionarily expected orthologous gene content. Protein-coding gene content was structurally predicted and functionally annotated, resulting in the identification of 25,668 putative genes. A total of 599 candidate disease resistance genes were identified by predicting resistance gene domains in gene structures, and 37 genes were detected as orthologs of heavy metal associated protein coding genes. In addition, 4289 genes were assigned as "transcription factor coding." Six different machine learning algorithms were trained and tested for their performance in classifying miRNA coding genomic sequences. Logistic regression proved to be the best-performing algorithm and was therefore used to identify pre-miRNA coding loci in the assembly. Repetitive DNA analysis involved the characterization of transposable element and microsatellite contents. The L. sativum chloroplast genome was also assembled and functionally annotated. The data produced in the present work are expected to constitute a foundation for genomic research in garden cress and to contribute to genomics-assisted crop improvement and genome evolution studies in the Brassicaceae family.
|
17
|
MicroRNAs (miRNAs) in Cardiovascular Complications of Rheumatoid Arthritis (RA): What Is New? Int J Mol Sci 2022; 23:ijms23095254. [PMID: 35563643 PMCID: PMC9101033 DOI: 10.3390/ijms23095254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2022] [Revised: 05/04/2022] [Accepted: 05/06/2022] [Indexed: 02/08/2023] Open
Abstract
Rheumatoid arthritis (RA) is among the most prevalent and impactful chronic rheumatologic autoimmune diseases (AIDs) worldwide. Although both immunological activation and inflammatory pathways are recognized, the exact cause of RA remains unclear. It seems, however, that RA is initiated by a combination of genetic susceptibility and environmental triggers, which results in a self-perpetuating process. The subsequent systemic inflammation associated with RA is linked with a variety of extra-articular comorbidities, including cardiovascular disease (CVD), resulting in increased mortality and morbidity. To date, extensive evidence has demonstrated the key role of non-coding RNAs such as microRNAs (miRNAs) in RA and in RA-related CVD complications. In this descriptive review, we aim to highlight the specific role of miRNAs in autoimmune processes, focusing on their regulatory roles in the pathogenesis of RA and its cardiovascular consequences, their role as novel biomarkers, and their possible role as therapeutic targets.
|
18
|
Cai L, Gao M, Ren X, Fu X, Xu J, Wang P, Chen Y. MILNP: Plant lncRNA-miRNA Interaction Prediction Based on Improved Linear Neighborhood Similarity and Label Propagation. Front Plant Sci 2022; 13:861886. [PMID: 35401586 PMCID: PMC8990282 DOI: 10.3389/fpls.2022.861886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/25/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Knowledge of the interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) is the basis for understanding various biological activities and designing new drugs. Computational methods for predicting lncRNA-miRNA interactions have been lacking for plants, and existing methods suffer from various limitations that affect their prediction accuracy and applicability. Research on plant lncRNA-miRNA interactions is still in its infancy. In this paper, we propose an accurate predictor, MILNP, for predicting plant lncRNA-miRNA interactions based on an improved linear neighborhood similarity measure and a linear neighborhood propagation algorithm. Specifically, we propose a novel similarity measure based on linear neighborhood similarity from multiple similarity profiles of lncRNAs and miRNAs and derive more precise neighborhood ranges to overcome the limits of existing methods. We then simultaneously update the lncRNA-miRNA interactions predicted from both similarity matrices based on label propagation. We comprehensively evaluate MILNP on the latest plant lncRNA-miRNA interaction benchmark datasets. The results demonstrate the superior performance of MILNP over the most up-to-date methods. Moreover, MILNP can be applied to isolated plant lncRNAs (or miRNAs). Case studies suggest that MILNP can identify novel plant lncRNA-miRNA interactions, which are confirmed by classical tools. The implementation is available at https://github.com/HerSwain/gra/tree/MILNP.
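The label propagation step mentioned in this abstract can be illustrated with the generic iterative update F <- alpha * W * F + (1 - alpha) * Y, where W is a row-normalized similarity matrix and Y holds the known interactions. This sketch does not reproduce MILNP's improved linear neighborhood similarity or its exact update rule; the function name `propagate_labels`, the value of alpha, and the toy matrices are all hypothetical.

```python
def propagate_labels(W, Y, alpha=0.5, iterations=100):
    """Iterate F <- alpha * W @ F + (1 - alpha) * Y using plain nested lists."""
    n, m = len(Y), len(Y[0])
    F = [list(row) for row in Y]
    for _ in range(iterations):
        F = [
            [
                alpha * sum(W[i][k] * F[k][j] for k in range(n))
                + (1 - alpha) * Y[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return F


# Hypothetical toy example with 3 lncRNAs and 2 miRNAs.
# W: row-normalized lncRNA-lncRNA similarity (each row sums to 1).
W = [
    [0.0, 0.8, 0.2],
    [0.8, 0.0, 0.2],
    [0.5, 0.5, 0.0],
]
# Y: known interactions (1 = known, 0 = unknown); lncRNA 1 has none.
Y = [
    [1, 0],
    [0, 0],
    [0, 1],
]

scores = propagate_labels(W, Y)
for row in scores:
    print([round(value, 3) for value in row])
```

In this toy case the unlabeled lncRNA 1 ends up with a higher score for miRNA 0 than for miRNA 1, because it is most similar to lncRNA 0, which is known to interact with miRNA 0.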
Affiliation(s)
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Peng Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
19
|
HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput Biol Med 2022; 145:105395. [PMID: 35334314 DOI: 10.1016/j.compbiomed.2022.105395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/07/2022] [Revised: 03/08/2022] [Accepted: 03/08/2022] [Indexed: 12/24/2022]
Abstract
The identification of DNA-binding proteins (DBPs) has long been an active topic in sequence classification. Because experimental identification is very resource-intensive, constructing a computational prediction model is worthwhile. This study developed and evaluated a hybrid kernel alignment maximization-based multiple kernel model (HKAM-MKM) for predicting DBPs. First, we collected two datasets and performed feature extraction on the sequences to obtain six feature groups, and then constructed the corresponding kernels. To use the base kernels effectively and avoid ignoring the difference between a sample and its neighbours, we proposed local kernel alignment, which calculates the kernel between each sample and its neighbours, with that sample as the centre. We combined the global and local kernel alignments into a hybrid kernel alignment model and balanced the two through parameters. By maximising the hybrid kernel alignment value, we obtained the weight of each kernel and then linearly combined the kernels using these weights. Finally, the fused kernel was input into a support vector machine for training and prediction. On the independent test sets PDB186 and PDB2272, we obtained the highest Matthews correlation coefficient (MCC) (0.768 and 0.5962, respectively) and the highest accuracy (87.1% and 78.43%, respectively), which were superior to those of the other predictors. Therefore, HKAM-MKM is an efficient prediction tool for DBPs.
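Kernel alignment, the quantity this abstract maximizes, is commonly defined as the normalized Frobenius inner product of two kernel matrices, A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F), with the ideal kernel taken as the outer product of the label vector. The sketch below computes only this standard global alignment; the paper's local and hybrid variants are not reproduced, and the toy kernels and labels are hypothetical.

```python
import math


def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def kernel_alignment(K1, K2):
    """Global alignment: <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    return frobenius_inner(K1, K2) / math.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2)
    )


def ideal_kernel(labels):
    """Outer product y y^T of a label vector with entries in {-1, +1}."""
    return [[yi * yj for yj in labels] for yi in labels]


# Hypothetical toy data: two positives and one negative sample.
labels = [1, 1, -1]
target = ideal_kernel(labels)

# A kernel that mirrors the class structure, and an uninformative identity kernel.
K_good = [[1.0, 0.9, -0.8], [0.9, 1.0, -0.7], [-0.8, -0.7, 1.0]]
K_flat = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(round(kernel_alignment(K_good, target), 3))
print(round(kernel_alignment(K_flat, target), 3))
```

A multiple kernel method would score each base kernel this way and give more weight to kernels that align better with the labels before combining them linearly.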
|
20
|
Das M, Hasan M, Akter S, Roy S, Sharma B, Chowdhury MSR, Ahsan MI, Akhand RN, Uddin MB, Ahmed SSU. In Silico Investigation of Conserved miRNAs and Their Targets From the Expressed Sequence Tags in Neospora Caninum Genome. Bioinform Biol Insights 2021; 15:11779322211046729. [PMID: 34898982 PMCID: PMC8655437 DOI: 10.1177/11779322211046729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2021] [Accepted: 08/20/2021] [Indexed: 12/02/2022] Open
Abstract
Neospora caninum is a protozoan parasite and the etiologic agent of neosporosis, a common cause of abortion in cattle worldwide. Herd-level prevalence of neosporosis can be as high as 90%. However, no approved treatments or vaccines are available for neosporosis. MicroRNA (miRNA)-based prophylaxis and therapeutics could be options for neosporosis in cattle and other animals. The current study aimed to investigate the genome of Neospora caninum to identify and characterize conserved miRNAs through an Expressed Sequence Tag (EST)-dependent homology search. A total of 1,041 mature miRNAs of reference organisms were searched against 336 non-redundant ESTs available for the Neospora caninum genome. The study predicted one putative miRNA, “nca-miR-9388-5p”, of 19 nucleotides, with an MFEI value of -1.51 kcal/mol and an (A + U) content of 72.94% for its corresponding pre-miRNA. A comprehensive search for specific gene targets identified 16 potential genes associated with different protozoal physiological functions. Notably, the gene “Protein phosphatase” was found to be responsible for the virulence of Neospora caninum. The other genes were associated with gene expression, vesicular transport, cell signaling, cell proliferation, DNA repair, and different developmental stages of the protozoon. These findings will therefore provide pivotal information for future researchers of bovine neosporosis. They will also serve as baseline information for further bioinformatics studies to identify other protozoal miRNAs.
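The two pre-miRNA descriptors reported in this abstract can be sketched as follows. The (A + U) content is a simple base count. For the minimal folding free energy index (MFEI), the sketch assumes the definition commonly used in miRNA prediction studies, MFEI = [(MFE / length) x 100] / (G + C)%, which the abstract itself does not state. The toy hairpin sequence and the MFE value of -6.0 kcal/mol are hypothetical; in practice the MFE would come from an RNA folding program.

```python
def base_content(sequence):
    """Return (A+U percentage, G+C percentage) of an RNA or DNA sequence."""
    seq = sequence.upper().replace("T", "U")
    length = len(seq)
    au = 100.0 * sum(seq.count(base) for base in "AU") / length
    gc = 100.0 * sum(seq.count(base) for base in "GC") / length
    return au, gc


def mfei(mfe_kcal_per_mol, sequence):
    """MFEI under the assumed definition: adjusted MFE divided by (G+C)%."""
    _, gc = base_content(sequence)
    # Adjusted MFE (AMFE): folding energy per 100 nucleotides.
    amfe = mfe_kcal_per_mol / len(sequence) * 100.0
    return amfe / gc  # undefined for sequences without any G or C


# Hypothetical 20-nt toy hairpin and a hypothetical MFE value.
hairpin = "AUAUGCAUUAAUGCAUAUAU"
au, gc = base_content(hairpin)
index = mfei(-6.0, hairpin)

print(f"A+U = {au:.2f}%, G+C = {gc:.2f}%, MFEI = {index:.2f}")
```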
Affiliation(s)
- Moumita Das
  - Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, Bangladesh
- Mahmudul Hasan
  - Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet, Bangladesh
- Sharmin Akter
  - Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, Bangladesh
- Sawrab Roy
  - Department of Microbiology and Immunology, Sylhet Agricultural University, Sylhet, Bangladesh
- Binayok Sharma
  - Department of Medicine, Sylhet Agricultural University, Sylhet, Bangladesh
- Md Irtija Ahsan
  - Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, Bangladesh
- Rubaiat Nazneen Akhand
  - Department of Biochemistry and Chemistry, Sylhet Agricultural University, Sylhet, Bangladesh
- Md Bashir Uddin
  - Department of Medicine, Sylhet Agricultural University, Sylhet, Bangladesh
- Syed Sayeem Uddin Ahmed
  - Department of Epidemiology and Public Health, Sylhet Agricultural University, Sylhet, Bangladesh
|
21
|
Li DY, Lin FF, Li GP, Zeng FC. Exosomal microRNA-15a from ACHN cells aggravates clear cell renal cell carcinoma via the BTG2/PI3K/AKT axis. Kaohsiung J Med Sci 2021; 37:973-982. [PMID: 34337864 DOI: 10.1002/kjm2.12428] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/16/2021] [Revised: 06/01/2021] [Accepted: 06/09/2021] [Indexed: 12/18/2022] Open
Abstract
Accumulating studies have indicated that exosomal microRNAs (miRNAs/miRs) can mediate clear cell renal cell carcinoma (ccRCC) at an early stage, but the mechanisms remain to be specified. Here, we investigated the mechanism of exosomal miR-15a in ccRCC. After successful isolation of exosomes from RCC cells, we found that miR-15a was upregulated in ccRCC cells. Moreover, upregulation of miR-15a by pre-miR-15a promoted the proliferation, migration, invasion, and epithelial-mesenchymal transition of ccRCC cells. A luciferase assay revealed that B-cell translocation gene 2 (BTG2) was a target gene of miR-15a and was negatively correlated with miR-15a expression. BTG2, which reduced the proliferation of ccRCC cells, was poorly expressed in ccRCC. In addition, overexpression of BTG2 could reverse the promotive effects of miR-15a on ccRCC. Furthermore, BTG2 reduced PI3K/AKT pathway activity. Our results collectively indicated that exosomal miR-15a from RCC cells increased cell viability by downregulating BTG2 and promoting the activity of the PI3K/AKT signaling pathway. We demonstrated a novel mechanism by which exosomal miR-15a exerted pro-proliferative effects on ccRCC, highlighting the potential of exosomal miR-15a as a target for ccRCC prognosis.
Affiliation(s)
- Dao-Yuan Li
  - Department of Urology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Fei-Fei Lin
  - Department of Otorhinolaryngology - Head and Neck Surgery, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Guo-Ping Li
  - Department of Urology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Fan-Chang Zeng
  - Department of Urology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
|
22
|
Esposito S, Aversano R, Tripodi P, Carputo D. Whole-Genome Doubling Affects Pre-miRNA Expression in Plants. Plants 2021; 10:plants10051004. [PMID: 34069771 PMCID: PMC8157229 DOI: 10.3390/plants10051004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Revised: 04/09/2021] [Accepted: 05/10/2021] [Indexed: 11/16/2022]
Abstract
Whole-genome doubling (polyploidy) is common in angiosperms. Several studies have indicated that it is often associated with molecular, physiological, and phenotypic changes. Mounting evidence has pointed out that micro-RNAs (miRNAs) may have an important role in whole-genome doubling. However, an integrative approach that compares miRNA expression in polyploids is still lacking. Here, a re-analysis of already published RNAseq datasets was performed to identify microRNAs’ precursors (pre-miRNAs) in diploids (2x) and tetraploids (4x) of five species (Arabidopsis thaliana L., Morus alba L., Brassica rapa L., Isatis indigotica Fort., and Solanum commersonii Dun). We found 3568 pre-miRNAs, three of which (pre-miR414, pre-miR5538, and pre-miR5141) were abundant in all 2x, and were absent/low in their 4x counterparts. They are predicted to target more than one mRNA transcript, many belonging to transcription factors (TFs), DNA repair mechanisms, and related to stress. Sixteen pre-miRNAs were found in common in all 2x and 4x. Among them, pre-miRNA482, pre-miRNA2916, and pre-miRNA167 changed their expression after polyploidization, being induced or repressed in 4x plants. Based on our results, a common ploidy-dependent response was triggered in all species under investigation, which involves DNA repair, ATP-synthesis, terpenoid biosynthesis, and several stress-responsive transcripts. In addition, an ad hoc pre-miRNA expression analysis carried out solely on 2x vs. 4x samples of S. commersonii indicated that ploidy-dependent pre-miRNAs seem to actively regulate the nucleotide metabolism, probably to cope with the increased requirement for DNA building blocks caused by the augmented DNA content. Overall, the results outline the critical role of microRNA-mediated responses following autopolyploidization in plants.
Affiliation(s)
- Salvatore Esposito
  - CREA Research Centre for Cereal and Industrial Crops, 71122 Foggia, Italy
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy
- Riccardo Aversano
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy
- Pasquale Tripodi
  - CREA Research Centre for Vegetable and Ornamental Crops, 84098 Pontecagnano, Italy
- Domenico Carputo
  - Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici, Italy
  - Correspondence: Tel.: +39-08-1252-9225
|
23
|
Chen Y, Fu X, Li Z, Peng L, Zhuo L. Prediction of lncRNA-Protein Interactions via the Multiple Information Integration. Front Bioeng Biotechnol 2021; 9:647113. [PMID: 33718346 PMCID: PMC7947871 DOI: 10.3389/fbioe.2021.647113] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/29/2020] [Accepted: 01/19/2021] [Indexed: 01/09/2023] Open
Abstract
Long non-coding RNA (lncRNA)-protein interactions play an important role in post-transcriptional gene regulation, such as RNA splicing, translation, and signaling, and in the development of complex diseases. Research on predicting lncRNA-protein interactions is important for uncovering the mechanisms of lncRNA function and action. Traditional experimental methods for detecting lncRNA-protein interactions are expensive and time-consuming, so computational methods provide many effective strategies for this problem. However, most recent computational methods use only lncRNA-lncRNA or protein-protein similarity information and cannot fully capture all the features needed to identify interactions. In this paper, we propose a novel computational model for lncRNA-protein interaction prediction based on machine learning methods. First, a feature method is proposed for representing the network topological properties of lncRNA and protein interactions. Next, the basic composition feature information and evolutionary information of proteins, the lncRNA sequence feature information, and the lncRNA expression profile information are extracted. Finally, the above feature information is fused, and the feature vector is optimized with the recursive feature elimination algorithm. The optimized feature vectors are input to a support vector machine (SVM) model. Experimental results show that the proposed method is effective and accurate in lncRNA-protein interaction prediction.
Affiliation(s)
- Yifan Chen
  - College of Information Science and Engineering, Hunan University, Changsha, China
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Xiangzheng Fu
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Zejun Li
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Li Peng
  - College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Linlin Zhuo
  - Department of Mathematics and Information Engineering, Wenzhou University Oujiang College, Wenzhou, China
|
24
|
De novo assembly and characterization of the first draft genome of quince (Cydonia oblonga Mill.). Sci Rep 2021; 11:3818. [PMID: 33589687 PMCID: PMC7884838 DOI: 10.1038/s41598-021-83113-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/07/2020] [Accepted: 01/28/2021] [Indexed: 01/30/2023] Open
Abstract
Quince (Cydonia oblonga Mill.) is the sole member of the genus Cydonia in the Rosaceae family and is closely related to the major pome fruits, apple (Malus domestica Borkh.) and pear (Pyrus communis L.). In the present work, whole genome shotgun paired-end sequencing was employed to assemble the first draft genome of quince. The resulting genome assembly spans 488.4 Mb of sequence, corresponding to 71.2% of the estimated genome size (686 Mb). Gene predictions via ab initio and homology-based sequence annotation strategies resulted in the identification of 25,428 and 30,684 unique putative protein coding genes, respectively. Putative homologs of 97.4% of Arabidopsis and 95.6% of rice transcription factors were identified in the ab initio predicted genic sequences. Different machine learning algorithms were tested for classifying pre-miRNA (precursor microRNA) coding sequences, identifying Support Vector Machine (SVM) as the best performing classifier. SVM classification predicted 600 putative pre-miRNA coding loci. Repetitive DNA content of the assembly was also characterized. The first draft assembly of the quince genome produced in this work provides a foundation for functional genomic research in quince, toward dissecting the genetic basis of important traits and performing genomics-assisted breeding.
|
25
|
Yarahmadi A, Shahrokhi SZ, Mostafavi-Pour Z, Azarpira N. MicroRNAs in diabetic nephropathy: From molecular mechanisms to new therapeutic targets of treatment. Biochem Pharmacol 2020; 189:114301. [PMID: 33203517 DOI: 10.1016/j.bcp.2020.114301] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/30/2020] [Revised: 10/20/2020] [Accepted: 10/21/2020] [Indexed: 12/16/2022]
Abstract
Despite considerable investigation into diabetic nephropathy (DN) pathogenesis and possible treatments, current therapies still do not adequately prevent progression to end-stage renal disease (ESRD) in most patients. Therefore, investigating the exact molecular mechanisms and important mediators underlying DN may help design better therapeutic approaches. MicroRNAs (miRNAs) are a class of small non-coding RNAs that play a crucial role in the post-transcriptional regulation of the expression of many genes within cells. They present an excellent opportunity for new therapeutic approaches because their profile is often altered in many diseases, including DN. This review discusses the most important signaling pathways involved in DN and the changes in the miRNA profile in each signaling pathway. We also suggest possible approaches for miRNA-derived interventions to design better treatments for DN.
Affiliation(s)
- Amir Yarahmadi
  - Department of Clinical Biochemistry, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Transplant Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Seyedeh Zahra Shahrokhi
  - Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zohreh Mostafavi-Pour
  - Department of Biochemistry, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Negar Azarpira
  - Transplant Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
|
26
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs, so accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes current advances in computational pre-miRNA recognition, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides background on pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of research progress in this field.
Affiliation(s)
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Hao Li
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
27
|
Liu B, Luo Z, He J. sgRNA-PSM: Predict sgRNAs On-Target Activity Based on Position-Specific Mismatch. Mol Ther Nucleic Acids 2020; 20:323-330. [PMID: 32199128 PMCID: PMC7083770 DOI: 10.1016/j.omtn.2020.01.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/08/2019] [Revised: 12/21/2019] [Accepted: 01/23/2020] [Indexed: 12/26/2022]
Abstract
As a key technique for the CRISPR-Cas9 system, identification of single-guide RNA (sgRNA) on-target activity is critical for both theoretical research (investigation of RNA functions) and real-world applications (genome editing and synthetic biology). Because of its importance, several computational predictors have been proposed to predict sgRNA on-target activity. All of these methods have clearly contributed to the development of this important field, but they suffer from certain limitations. We propose two new methods, "sgRNA-PSM" and "sgRNA-ExPSM", for sgRNA on-target activity prediction. They capture long-range sequence information and evolutionary information, and use a new way of reducing the dimension of the feature vector to avoid the risk of overfitting. Rigorous leave-one-gene-out cross-validation on a benchmark dataset with 11 human genes and 6 mouse genes, as well as evaluation on an independent dataset, indicated that the two new methods outperformed other competing methods. To make the proposed sgRNA-PSM predictor easier to use, we have established a corresponding web server, which is available at http://bliulab.net/sgRNA-PSM/.
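The leave-one-gene-out cross-validation named in this abstract can be illustrated with a minimal sketch: each fold holds out every sgRNA that targets one gene and trains on the rest, so no gene contributes to both training and testing. The function name `leave_one_gene_out` and the gene labels below are hypothetical, chosen for illustration only; they are not taken from the sgRNA-PSM software.

```python
from collections import defaultdict


def leave_one_gene_out(genes):
    """Yield (held_out_gene, train_indices, test_indices), one fold per gene.

    `genes` lists, for every sample, the gene that sample's sgRNA targets.
    """
    # Group sample indices by the gene they belong to.
    by_gene = defaultdict(list)
    for idx, gene in enumerate(genes):
        by_gene[gene].append(idx)

    # One fold per distinct gene, visited in sorted order for reproducibility.
    for held_out in sorted(by_gene):
        test_idx = by_gene[held_out]
        train_idx = sorted(
            i for gene, idxs in by_gene.items() if gene != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: five sgRNAs targeting three genes.
sgrna_genes = ["GENE_A", "GENE_A", "GENE_B", "GENE_C", "GENE_B"]
folds = list(leave_one_gene_out(sgrna_genes))
for gene, train_idx, test_idx in folds:
    print(gene, train_idx, test_idx)
```

With 17 genes, as in the benchmark described above, this scheme would produce 17 folds. Grouping by gene, rather than splitting samples at random, keeps sgRNAs from the same gene out of both sides of a fold.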
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
- Zhihua Luo
  - Affiliated Shenzhen Maternity & Child Healthcare Hospital, Southern Medical University, Shenzhen, Guangdong, China
- Juan He
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
|