1
Avila Santos AP, de Almeida BLS, Bonidia RP, Stadler PF, Stefanic P, Mandic-Mulec I, Rocha U, Sanches DS, de Carvalho ACPLF. BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification. RNA Biol 2024; 21:1-12. [PMID: 38528797 DOI: 10.1080/15476286.2024.2329451] [Accepted: 01/23/2024] [Indexed: 03/27/2024]
Abstract
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis. This is a fundamental aspect of genomics that helps explain ncRNA functions and regulatory mechanisms in various biological processes. Traditional machine learning approaches have been used to distinguish ncRNAs, but they often require extensive feature engineering. More recently, deep learning algorithms have advanced ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework that integrates convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. The framework uses a combination of k-mer one-hot encoding, k-mer dictionary encoding, and feature extraction techniques for input representation. Embedding the extracted features into the deep network allows it to exploit both the spatial and the sequential characteristics of ncRNA sequences. We evaluated BioDeepFuse on benchmark datasets and on real-world RNA samples from bacterial organisms. The results showed high accuracy in ncRNA classification, underscoring the robustness of the tool on complex ncRNA sequence data. Combining CNN or BiLSTM with external features points to promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into the roles of ncRNAs in cellular processes and disease. BioDeepFuse was originally applied to bacterial organisms, but the methods integrated into the framework may make it applicable to a broader range of domains.
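The abstract names k-mer one-hot and k-mer dictionary encodings but does not define them. The sketch below makes two assumptions:

- "One-hot" means one indicator vector per overlapping k-mer.
- "Dictionary" means mapping each k-mer to an integer index, for example as input to an embedding layer.

All function names are hypothetical and are not taken from BioDeepFuse's code.

```python
from itertools import product

def to_rna(seq: str) -> str:
    """Upper-case the sequence and convert DNA 'T' to RNA 'U'."""
    return seq.upper().replace("T", "U")

def kmers(seq: str, k: int) -> list[str]:
    """Overlapping k-mers, sliding one position at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_vocabulary(k: int, alphabet: str = "ACGU") -> dict[str, int]:
    """Assign each of the 4**k possible k-mers a fixed integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}

def kmer_dictionary_encoding(seq: str, k: int) -> list[int]:
    """Sequence -> list of k-mer indices (hypothetical 'dictionary' encoding).
    k-mers containing other symbols (such as N) map to -1 in this sketch."""
    vocab = kmer_vocabulary(k)
    return [vocab.get(km, -1) for km in kmers(to_rna(seq), k)]

def kmer_one_hot(seq: str, k: int) -> list[list[int]]:
    """Sequence -> (L - k + 1) x 4**k matrix with a single 1 per known k-mer."""
    vocab = kmer_vocabulary(k)
    matrix = []
    for km in kmers(to_rna(seq), k):
        row = [0] * len(vocab)
        if km in vocab:
            row[vocab[km]] = 1
        matrix.append(row)
    return matrix

print(kmer_dictionary_encoding("ACGU", 2))  # [1, 6, 11]
print(len(kmer_one_hot("ACGU", 2)), "rows of width", 4 ** 2)
```

The index form is compact and suits an embedding layer. The one-hot form grows as 4**k per position, so it is usually practical only for small k.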
Affiliation(s)
- Anderson P Avila Santos
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
- Breno L S de Almeida
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Robson P Bonidia
- Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil
- Peter F Stadler
- Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Saxony, Germany
- Polonca Stefanic
- Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Ines Mandic-Mulec
- Department of Food Science and Technology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Ulisses Rocha
- Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ GmbH, Leipzig, Saxony, Germany
- Danilo S Sanches
- Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Brazil
2
Teufel F, Gíslason MH, Almagro Armenteros JJ, Johansen A, Winther O, Nielsen H. GraphPart: homology partitioning for biological sequence analysis. NAR Genom Bioinform 2023; 5:lqad088. [PMID: 37850036 PMCID: PMC10578201 DOI: 10.1093/nargab/lqad088] [Received: 04/25/2023] [Revised: 08/25/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023]
Abstract
When splitting biological sequence data for the development and testing of predictive models, too-closely related pairs of sequences must not end up in different partitions. If this is ignored, the performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, in which sequences are removed until no too-closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data so that closely related sequences always end up in the same partition, while keeping as many sequences as possible in the dataset. Evaluation of GraphPart on protein, DNA and RNA datasets shows that it retains a larger number of sequences per dataset while providing homology separation on a par with reduction approaches.
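GraphPart enforces one core constraint: no pair of sequences above a similarity threshold is split across partitions. A much simpler scheme illustrates that constraint:

1. Connect sequences whose pairwise identity meets the threshold.
2. Find the connected components with union-find.
3. Assign whole components to partitions, largest first.

This is a hypothetical sketch, not GraphPart's algorithm. It assumes pairwise identities are already computed, for example by an aligner. Unlike GraphPart, it never removes sequences, so a single very large component can leave the partitions unbalanced.

```python
def homology_partition(n_seqs, pairs, threshold, n_partitions):
    """Assign sequences 0..n_seqs-1 to partitions so that no pair with
    identity >= threshold is split. pairs: iterable of (i, j, identity)."""
    parent = list(range(n_seqs))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every too-similar pair.
    for i, j, identity in pairs:
        if identity >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Group sequences by their component root.
    components = {}
    for s in range(n_seqs):
        components.setdefault(find(s), []).append(s)

    # Greedy balancing: biggest component goes to the currently smallest partition.
    partitions = [[] for _ in range(n_partitions)]
    for comp in sorted(components.values(), key=len, reverse=True):
        target = min(range(n_partitions), key=lambda p: len(partitions[p]))
        partitions[target].extend(comp)
    return partitions

pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.95), (4, 5, 0.1)]
print(homology_partition(6, pairs, threshold=0.3, n_partitions=2))
```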
Affiliation(s)
- Felix Teufel
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Digital Science & Innovation, Novo Nordisk A/S, 2760 Måløv, Denmark
- Magnús Halldór Gíslason
- Department of Genomic Medicine, Copenhagen University Hospital/Rigshospitalet, 2100 Copenhagen, Denmark
- José Juan Almagro Armenteros
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Ole Winther
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Department of Genomic Medicine, Copenhagen University Hospital/Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
3
Teragawa S, Wang L. ConF: A Deep Learning Model Based on BiLSTM, CNN, and Cross Multi-Head Attention Mechanism for Noncoding RNA Family Prediction. Biomolecules 2023; 13:1643. [PMID: 38002325 PMCID: PMC10669714 DOI: 10.3390/biom13111643] [Received: 08/06/2023] [Revised: 10/21/2023] [Accepted: 10/24/2023] [Indexed: 11/26/2023]
Abstract
This paper presents ConF, a novel deep learning model designed for accurate and efficient prediction of noncoding RNA (ncRNA) families. NcRNAs are essential functional RNA molecules involved in various cellular processes, including replication, transcription, and gene expression. Identifying ncRNA families is crucial for comprehensive RNA research, as ncRNAs within the same family often exhibit similar functions. Traditional experimental methods for identifying ncRNA families are time-consuming and labor-intensive. Computational approaches that rely on annotated secondary structure data struggle with complex structures such as pseudoknots and have restricted applicability, resulting in suboptimal prediction performance. To overcome these challenges, ConF integrates widely used techniques such as residual networks with dilated convolutions and cross multi-head attention mechanisms. By combining dual-layer convolutional networks with a BiLSTM, ConF captures intricate features embedded in RNA sequences, which significantly improves prediction accuracy compared with existing methods. Experimental evaluations on a single publicly available dataset with ten-fold cross-validation demonstrate the superiority of ConF in accuracy, sensitivity, and other performance metrics. Overall, ConF is a promising solution for accurate and efficient ncRNA family prediction that addresses the limitations of traditional experimental and computational methods.
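The abstract does not say which features feed the cross multi-head attention. For illustration, assume that the per-position outputs of one branch (say the CNN) act as queries over the outputs of another branch (say the BiLSTM). The mechanism then reduces to scaled dot-product attention computed separately for each head.

The sketch omits the learned query, key, value and output projections to stay short; equivalently, it sets them to identity. A trained model would learn these projections.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query row gets a weighted mix of value rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def cross_multi_head_attention(x_query, x_context, n_heads):
    """x_query (Lq x D) attends over x_context (Lc x D); D must divide by n_heads.
    Learned projections are omitted here, so each head sees a raw slice of D."""
    dim = len(x_query[0])
    if dim % n_heads:
        raise ValueError("feature dimension must be divisible by n_heads")
    head_dim = dim // n_heads
    heads = []
    for h in range(n_heads):
        cols = slice(h * head_dim, (h + 1) * head_dim)
        q = [row[cols] for row in x_query]
        kv = [row[cols] for row in x_context]
        heads.append(attend(q, kv, kv))
    # Concatenate the per-head outputs back into Lq x D.
    return [[x for h in range(n_heads) for x in heads[h][i]] for i in range(len(x_query))]

cnn_out = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]            # hypothetical branch A
lstm_out = [[0.5, 0.1, 0.3, 0.2], [0.0, 0.9, 0.4, 0.1], [0.2, 0.2, 0.8, 0.6]]  # branch B
print(cross_multi_head_attention(cnn_out, lstm_out, n_heads=2))
```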
Affiliation(s)
- Shoryu Teragawa
- School of Software, Dalian University of Technology, Dalian 116024, China
4
Chen K, Zhu X, Wang J, Zhao Z, Hao L, Guo X, Liu Y. MFPred: prediction of ncRNA families based on multi-feature fusion. Brief Bioinform 2023; 24:bbad303. [PMID: 37615358 DOI: 10.1093/bib/bbad303] [Received: 05/16/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/25/2023]
Abstract
Non-coding RNA (ncRNA) plays a critical role in biology. ncRNAs from the same family usually have similar functions; as a result, it is essential to predict ncRNA families before identifying their functions. There are two primary approaches to predicting ncRNA families: traditional biological methods and computational methods. Traditional biological methods require considerable manpower and resources. Therefore, this paper proposes a new computational ncRNA family prediction method called MFPred. MFPred identifies ncRNA families by extracting sequence features of ncRNAs and has three primary modules:
(1) an ncRNA sequence encoding and feature extraction module, which encodes ncRNA sequences and extracts four different features from them;
(2) a dynamic Bi_GRU and feature fusion module, which extracts contextual information features of the ncRNA sequence;
(3) a ResNet_SE module, which extracts local information features of the ncRNA sequence.
In this study, MFPred was compared with previously proposed ncRNA family prediction methods on two frequently used public ncRNA datasets, NCY and nRC. The results showed that MFPred outperformed the other prediction methods on both datasets.
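The abstract names a ResNet_SE module but gives no details. Squeeze-and-excitation (SE) is a standard channel-reweighting block. It works in three steps:

1. Squeeze: average-pool each channel over sequence positions.
2. Excite: pass the pooled vector through a small bottleneck to get per-channel gates.
3. Rescale: multiply each channel by its gate.

The sketch below applies this to a 1D feature map (channels x positions). Its simplifications are:

- The weight matrices are hypothetical placeholders for learned parameters.
- Biases and the convolution that would normally precede SE are omitted.
- The residual wrapper shows the usual skip connection.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def squeeze_excitation(fmap, w_reduce, w_expand):
    """fmap: C x L. w_reduce: (C/r) x C. w_expand: C x (C/r). Biases omitted."""
    # Squeeze: global average pooling over positions, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in fmap]
    # Excitation: bottleneck + ReLU, then expand + sigmoid -> gate in (0, 1) per channel.
    hidden = [relu(h) for h in matvec(w_reduce, z)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: reweight every channel by its gate.
    return [[g * x for x in ch] for g, ch in zip(gates, fmap)]

def residual_se(fmap, w_reduce, w_expand):
    """Skip connection around the SE block (the usual convolution is omitted here)."""
    scaled = squeeze_excitation(fmap, w_reduce, w_expand)
    return [[a + b for a, b in zip(x_ch, s_ch)] for x_ch, s_ch in zip(fmap, scaled)]

fmap = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 positions
print(residual_se(fmap, w_reduce=[[1.0, 1.0]], w_expand=[[10.0], [-10.0]]))
```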
Affiliation(s)
- Kai Chen
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Xiaodong Zhu
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Jiahao Wang
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Ziqi Zhao
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Lei Hao
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Xinsheng Guo
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yuanning Liu
- College of Software, Jilin University, Changchun, 130012, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
5
Sutanto K, Turcotte M. Assessing Global-Local Secondary Structure Fingerprints to Classify RNA Sequences With Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2736-2747. [PMID: 34633933 DOI: 10.1109/tcbb.2021.3118358] [Indexed: 06/13/2023]
Abstract
RNA elements that are transcribed but not translated into proteins are called non-coding RNAs (ncRNAs). They play wide-ranging roles in biological processes and disorders. As with proteins, their structure is often intimately linked to their function. Many documented examples show structure conserved across taxa despite sequence divergence, so structure is often used to infer function. Specifically, the secondary structure is predicted, and ncRNAs with similar structures are assumed to have the same or similar functions. However, a strand of RNA can fold into multiple possible structures, and some strands even fold differently in vivo and in vitro. Furthermore, ncRNAs often function as RNA-protein complexes, which can affect structure. For these reasons, we hypothesized that using one structure per sequence may discard information and possibly result in poorer classification accuracy. We therefore propose secondary structure fingerprints in two categories: a higher-level representation derived from RNA-As-Graphs (RAG), and free energy fingerprints based on a curated repertoire of small structural motifs. The fingerprints account for the difference between global and local structural matches. We also evaluated our deep learning architecture with k-mers. By combining our global-local fingerprints with 6-mers, we achieved an accuracy, precision, and recall of 91.04%, 91.10%, and 91.00%, respectively.
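A common way to represent a sequence with 6-mers is a normalized frequency vector over all 4^6 = 4,096 possible 6-mers. The sketch below assumes that representation. The abstract does not state exactly how the k-mer features were computed or combined with the fingerprints, so the function is illustrative only.

```python
from itertools import product

def kmer_frequency_vector(seq: str, k: int = 6, alphabet: str = "ACGU") -> list[float]:
    """Relative frequency of every possible k-mer (4**k entries, fixed order).
    k-mers containing symbols outside the alphabet are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper().replace("T", "U")
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    # Normalize so sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts

vec = kmer_frequency_vector("ACGUACGUACGGCUA")
print(len(vec), sum(vec))
```

A fixed-length vector like this can be concatenated with other fixed-length features, such as fingerprints, before a dense layer.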
6
Dupont MJ, Major F. D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences. J Mol Biol 2023; 435:168181. [PMID: 37468182 DOI: 10.1016/j.jmb.2023.168181] [Received: 12/04/2022] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 07/21/2023]
Abstract
Identifying the common structural elements of a family of functionally related RNA sequences is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) gives, for each base, the probability that it covaries with another base. This provides evolutionary support for the formation of double-helical regions and possibly pseudoknots. Because RNA is dynamic, alternative folds can coexist, and a CM may therefore miss some motifs. To overcome this limitation, we present D-ORB, a system of algorithms that identifies motifs overrepresented in the secondary conformational landscapes of a family compared with those of unrelated sequences. The algorithms are bundled into an easy-to-use website that allows users to submit a family and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier, and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling such families. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
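In its simplest form, deciding whether a motif is "overrepresented" in a family relative to unrelated sequences is an enrichment test. The sketch below runs a one-sided hypergeometric test on motif presence counts; this is equivalent to a one-sided Fisher's exact test. It is a generic illustration and an assumption. The abstract does not specify the statistic D-ORB actually uses, which may work on sampled conformations rather than simple presence counts.

```python
from math import comb

def enrichment_p_value(hits_family: int, n_family: int,
                       hits_background: int, n_background: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits_family).
    X counts motif-containing sequences that would land in the family if the
    motif were spread at random over family + background sequences."""
    total = n_family + n_background
    total_hits = hits_family + hits_background
    denom = comb(total, n_family)
    upper = min(total_hits, n_family)
    tail = sum(comb(total_hits, x) * comb(total - total_hits, n_family - x)
               for x in range(hits_family, upper + 1))
    return tail / denom

# Hypothetical counts: motif seen in 8 of 10 family members, 2 of 10 unrelated sequences.
print(enrichment_p_value(8, 10, 2, 10))
```

When many candidate motifs are tested, the p-values would need a multiple-testing correction, such as Benjamini-Hochberg.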
Affiliation(s)
- Mathieu J Dupont
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
- François Major
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada. https://twitter.com/francois_major
7
Orro A, Trombetti GA. High-Accuracy ncRNA Function Prediction via Deep Learning Using Global and Local Sequence Information. Biomedicines 2023; 11:1631. [PMID: 37371726 DOI: 10.3390/biomedicines11061631] [Received: 05/18/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/29/2023]
Abstract
Predicting the biological function of non-coding ribonucleic acid (ncRNA) is an important step towards understanding the regulatory mechanisms underlying many diseases. Because non-coding RNAs are abundant in human cells and functionally diverse, functional prediction tools are needed. Recent advances in non-coding RNA biology, together with complete genome sequences for a large number of species, open a window of opportunity for studying non-coding RNA biology. However, most computational methods for predicting non-coding RNA function fall into one of two groups. Methods based on sequence information alone tend to have limited accuracy. Methods that need secondary structure prediction are computationally expensive. We propose a novel computational method to predict the biological function of non-coding RNA genes. It is based on a collection of deep network architectures, uses only ncRNA sequence information, and does not require expensive secondary structure information. The approach achieves accuracy comparable or superior to methods that use both sequence and structural features, at a much lower computational cost.
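The abstract describes a collection of deep network architectures but not how their outputs are combined. One common option, assumed here purely for illustration, is soft voting. Each model's class-probability vector is averaged, optionally with weights, and the class with the highest mean probability is predicted. The function and its inputs are hypothetical.

```python
def soft_vote(model_probs, weights=None):
    """model_probs: one class-probability list per model, all in the same class order.
    Returns (predicted class index, weighted-average probabilities)."""
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # equal say for every model
    total_w = sum(weights)
    n_classes = len(model_probs[0])
    avg = [sum(w * p[c] for w, p in zip(weights, model_probs)) / total_w
           for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg

# Three hypothetical networks scoring one sequence over two classes.
label, probs = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(label, probs)
```

Soft voting keeps each model's confidence, unlike majority voting on hard labels, so one very confident model can outweigh two uncertain ones.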
Affiliation(s)
- Alessandro Orro
- Institute for Biomedical Technologies, National Research Council (ITB-CNR), 20054 Segrate, Italy
- Gabriele A Trombetti
- Institute for Biomedical Technologies, National Research Council (ITB-CNR), 20054 Segrate, Italy
8
Dunkel H, Wehrmann H, Jensen LR, Kuss AW, Simm S. MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding. Int J Mol Sci 2023; 24:8884. [PMID: 37240230 PMCID: PMC10218863 DOI: 10.3390/ijms24108884] [Received: 03/29/2023] [Revised: 05/11/2023] [Accepted: 05/13/2023] [Indexed: 05/28/2023]
Abstract
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and classifying them is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve ncRNA classification, we investigated machine learning models, including different neural network architectures, that use primary sequences, secondary structures, or a late integration of both. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. Compared with the current best-performing tool, ncRDense, we observed a small increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is more accurate than current ncRNA prediction tools. It also predicts long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
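The abstract mentions graph-encoded structural features but does not specify the encoding. A common starting point, assumed here, starts from a predicted pseudoknot-free secondary structure in dot-bracket notation, such as the output of RNAfold. That structure is turned into a graph in which:

- nodes are nucleotides;
- edges are backbone links plus base pairs.

The function names are hypothetical.

```python
def dot_bracket_to_edges(structure: str):
    """Return (backbone_edges, pair_edges) for a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # close the most recent '('
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    backbone = [(i, i + 1) for i in range(len(structure) - 1)]
    return backbone, sorted(pairs)

def adjacency_matrix(structure: str) -> list[list[int]]:
    """Symmetric 0/1 adjacency matrix combining backbone and base-pair edges."""
    n = len(structure)
    adj = [[0] * n for _ in range(n)]
    backbone, pairs = dot_bracket_to_edges(structure)
    for i, j in backbone + pairs:
        adj[i][j] = adj[j][i] = 1
    return adj

print(dot_bracket_to_edges("((..))"))
```

A graph neural network, or graph-derived summary features, could then consume this adjacency matrix alongside per-nucleotide features.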
Affiliation(s)
- Heiko Dunkel
- Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
- Henning Wehrmann
- Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, 60438 Frankfurt am Main, Germany
- Lars R. Jensen
- Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Andreas W. Kuss
- Human Molecular Genetics Group, Department of Functional Genomics, Interfaculty Institute of Genetics and Functional Genomics, University Medicine Greifswald, 17475 Greifswald, Germany
- Stefan Simm
- Institute of Bioinformatics, University Medicine Greifswald, Walther-Rathenau Str. 48, 17489 Greifswald, Germany
9
Abbas MN, Kausar S, Gul I, Li J, Yu H, Dong M, Cui H. The Potential Biological Roles of Circular RNAs in the Immune Systems of Insects to Pathogen Invasion. Genes (Basel) 2023; 14:genes14040895. [PMID: 37107653 PMCID: PMC10137924 DOI: 10.3390/genes14040895] [Received: 03/06/2023] [Revised: 04/08/2023] [Accepted: 04/10/2023] [Indexed: 04/29/2023]
Abstract
Circular RNAs (circRNAs) are a newly discovered class of endogenously expressed non-coding RNAs (ncRNAs). They are highly stable, covalently closed molecules that frequently exhibit tissue-specific expression in eukaryotes. A small number of circRNAs are abundant and remarkably conserved throughout evolution. Numerous circRNAs play important biological roles in one of three ways: by acting as microRNA (miRNA) or protein inhibitors ('sponges'), by regulating the function of proteins, or by being translated themselves. Because they differ from mRNAs in structure and biogenesis, circRNAs have distinct cellular functions. Recent advances highlight the importance of characterizing circRNAs and their targets in a variety of insect species to fully understand how they contribute to the immune responses of these insects. Here, we review recent advances in understanding the biogenesis of circRNAs, the regulation of their abundance, and their biological roles, such as serving as templates for translation and regulating signaling pathways. We also discuss the emerging roles of circRNAs in regulating immune responses to various microbial pathogens. Furthermore, we describe the roles that circRNAs encoded by microbial pathogens play in their hosts.
Affiliation(s)
- Muhammad Nadeem Abbas
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Saima Kausar
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Isma Gul
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Jisheng Li
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Huijuan Yu
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Mengyao Dong
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Hongjuan Cui
- State Key Laboratory of Resource Insects, Southwest University, Chongqing 400716, China
- Cancer Center, Medical Research Institute, Southwest University, Chongqing 400716, China
- Jinfeng Laboratory, Chongqing 401329, China
10
Lima DDS, Amichi LJA, Fernandez MA, Constantino AA, Seixas FAV. NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:557-565. [PMID: 34826297 DOI: 10.1109/tcbb.2021.3131136] [Indexed: 05/04/2023]
Abstract
Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among these classes, Y RNAs have been gaining attention as essential factors for the initiation of DNA replication in vertebrates and as potential tumor biomarkers. Homologs have also been described in nematodes and insects, and related sequences have been found in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model that classifies sncRNAs (including Y RNA) directly from nucleotide sequences. We built a dataset of 45,447 sncRNA sequences from a wide range of organisms, obtained from Rfam 14.3. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE to learned sequence representations could be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
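The abstract does not give NCYPred's exact attention formulation. A common pattern for attention on top of a (Bi)LSTM, assumed here, is additive attention pooling. Every time step's hidden state is scored, the scores are softmaxed into weights, and the weighted sum becomes a fixed-length sequence representation. The parameters `w` and `v` are hypothetical stand-ins for learned weights.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, w, v):
    """Additive attention pooling.
    hidden_states: T x d (e.g. concatenated forward/backward LSTM outputs)
    w: a x d projection, v: length-a scoring vector (stand-ins for learned weights).
    Returns (context vector of length d, attention weights of length T)."""
    scores = []
    for h in hidden_states:
        projected = [math.tanh(sum(wi * hi for wi, hi in zip(row, h))) for row in w]
        scores.append(sum(vi * pi for vi, pi in zip(v, projected)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum over time steps gives one vector per sequence.
    context = [sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(dim)]
    return context, weights

states = [[0.2, -0.1], [0.9, 0.4], [0.1, 0.0]]  # hypothetical BiLSTM outputs, T=3, d=2
print(attention_pool(states, w=[[1.0, 0.5]], v=[2.0]))
```

The returned weights show which positions the model attended to. The context vector would typically feed a softmax classifier over the sncRNA classes.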
11
Montaha S, Azam S, Rafid AKMRH, Hasan MZ, Karim A, Hasib KM, Patel SK, Jonkman M, Mannan ZI. MNet-10: A robust shallow convolutional neural network model performing ablation study on medical images assessing the effectiveness of applying optimal data augmentation technique. Front Med (Lausanne) 2022; 9:924979. [PMID: 36052321 PMCID: PMC9424498 DOI: 10.3389/fmed.2022.924979] [Received: 04/25/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022]
Abstract
Interpreting medical images with a computer-aided diagnosis (CAD) system is arduous for several reasons:
- the complex structure of cancerous lesions in different imaging modalities;
- high similarity between classes and dissimilar characteristics within classes;
- the scarcity of medical data;
- the presence of artifacts and noise.
In this study, these challenges are addressed by developing a shallow convolutional neural network (CNN) model combined with a suitable augmentation technique. The model's configuration is optimized through an ablation study that alters the layer structure and hyper-parameters. Eight medical datasets with different modalities are investigated, and the proposed model, named MNet-10, yields optimal performance across all datasets with low computational complexity. The impact of photometric and geometric augmentation techniques on different datasets is also evaluated. We selected the mammogram dataset for the ablation study because it is one of the most challenging imaging modalities. Before generating the model, the dataset is augmented using the two approaches. A base CNN model is first constructed and applied to both the augmented and non-augmented mammogram datasets; the highest accuracy is obtained with the photometrically augmented dataset. Therefore, the architecture and hyper-parameters of the model are determined through an ablation study on the base model using the photometrically augmented mammogram dataset. Afterward, the robustness of the network and the impact of different augmentation techniques are assessed by training the model on the remaining seven datasets.
We obtain a test accuracy of 97.34% on the mammogram, 98.43% on the skin cancer, 99.54% on the brain tumor magnetic resonance imaging (MRI), 97.29% on the COVID chest X-ray, 96.31% on the tympanic membrane, 99.82% on the chest computed tomography (CT) scan, and 98.75% on the breast cancer ultrasound datasets by photometric augmentation and 96.76% on the breast cancer microscopic biopsy dataset by geometric augmentation. Moreover, some elastic deformation augmentation methods are explored with the proposed model using all the datasets to evaluate their effectiveness. Finally, VGG16, InceptionV3, and ResNet50 were trained on the best-performing augmented datasets, and their performance consistency was compared with that of the MNet-10 model. The findings may aid future researchers in medical data analysis involving ablation studies and augmentation techniques.
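The difference between the two augmentation families can be made concrete:

- Photometric transforms change pixel intensities and leave geometry fixed.
- Geometric transforms move pixels and leave their values unchanged.

The sketch below works on a grayscale image stored as a list of rows, with values assumed to lie in [0, 1]. The specific transforms and parameter values are illustrative assumptions, not the ones used for MNet-10.

```python
import random

def photometric(img, brightness=0.1, contrast=1.2):
    """Intensity-only change: scale contrast around the image mean, shift brightness,
    and clip back to [0, 1]. Pixel positions are unchanged."""
    n_pixels = len(img) * len(img[0])
    mean = sum(sum(row) for row in img) / n_pixels
    return [[min(1.0, max(0.0, (p - mean) * contrast + mean + brightness)) for p in row]
            for row in img]

def horizontal_flip(img):
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def geometric(img, rng):
    """Position-only change: random horizontal flip plus 0-3 quarter turns.
    Pixel values are unchanged, only rearranged."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    return img

rng = random.Random(42)  # fixed seed so the example is reproducible
image = [[0.1, 0.2], [0.3, 0.4]]
print(photometric(image))
print(geometric(image, rng))
```

In a training pipeline, such transforms are usually applied on the fly to the training split only, so that evaluation data stay untouched.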
Affiliation(s)
- Sidratul Montaha
- Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Sami Azam
- College of Engineering, IT & Environment, Charles Darwin University, Darwin, NT, Australia
- Md. Zahid Hasan
- Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Asif Karim
- College of Engineering, IT & Environment, Charles Darwin University, Darwin, NT, Australia
- Khan Md. Hasib
- Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh
- Shobhit K. Patel
- Department of Computer Engineering, Marwadi University, Rajkot, India
- Mirjam Jonkman
- College of Engineering, IT & Environment, Charles Darwin University, Darwin, NT, Australia
- Zubaer Ibna Mannan
- Department of Smart Computing, Kyungdong University – Global Campus, Sokcho-si, South Korea
12
Gonçalves KB, Appel RJC, Bôas LAV, Cardoso PF, Bôas GTV. Genomic insights into the diversity of non-coding RNAs in Bacillus cereus sensu lato. Curr Genet 2022; 68:449-466. [PMID: 35552506 DOI: 10.1007/s00294-022-01240-4] [Received: 12/19/2021] [Revised: 03/20/2022] [Accepted: 03/30/2022] [Indexed: 11/28/2022]
Abstract
Bacillus cereus sensu lato is a group of bacteria of medical and agricultural importance, found in different ecological niches and with controversial taxonomic relationships. Studying the composition of non-coding RNAs (ncRNAs) in several bacterial groups has been an important tool for identifying genetic information and understanding how genetic regulation supports environmental adaptation. However, no comparative genomics study of ncRNAs has yet been performed in this group. This study therefore aimed to identify and characterize the set of ncRNAs from 132 strains of Bacillus cereus, Bacillus thuringiensis and Bacillus anthracis. The goal was an overview of the diversity and distribution of these genetic elements in these species. When species or phylogenetic clusters were compared, the number of ncRNAs differed in the chromosomes of the three species but not in the plasmids. The prevailing functional/structural category was Cis-reg and the most frequent class was Riboswitch; in plasmids, however, the most frequent class was Group II intron. In addition, nine ncRNAs were selected for validation by RT-PCR in the strain B. thuringiensis 407, which confirmed their expression. The wide distribution and diversity of ncRNAs in the B. cereus group, and especially in B. thuringiensis, may help these species adapt to various environmental changes. Further studies should address the expression of these genetic elements under different conditions.
Affiliation(s)
- Kátia B Gonçalves
- Depto Biologia Geral, Universidade Estadual de Londrina, Londrina, Brazil
13
A TRIzol-based method for high recovery of plasma sncRNAs approximately 30 to 60 nucleotides. Sci Rep 2022; 12:6778. [PMID: 35474236 PMCID: PMC9042852 DOI: 10.1038/s41598-022-10800-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Accepted: 04/13/2022] [Indexed: 11/08/2022]
Abstract
Protein functional effector sncRNAs (pfeRNAs) are approximately 30–60 nucleotides (nt) long, and no method for extracting them from plasma has yet been reported. Silver staining in a high-resolution polyacrylamide gel suggested that most plasma sncRNAs extracted by some widely used commercial kits were 100 nt or longer. Additionally, the standard TRIzol protocol is designed for long RNA rather than sncRNA recovery. Here, we report a TRIzol-based frozen precipitation method (TFP method) that reproducibly yields high-quantity, high-quality plasma sncRNAs of approximately 30–60 nt. Compared with a commercial kit, the TFP method recovered more plasma sncRNAs. We used four different pfeRNAs of 34 nt, 45 nt, 53 nt, and 58 nt to represent typical sncRNA sizes from 30 to 60 nt and compared their levels in the sncRNAs recovered by the TFP method and by the commercial kit. The TFP method gave cycle threshold (CT) values 2.01–9.17 cycles lower in 38 plasma samples from 38 patients, including Caucasian, Asian, African American, Latin, Mexican, and mixed-race individuals. In addition, pfeRNAs extracted by two organic-based extraction methods and four commercial kits were undetermined in 22 of 38 samples. Thus, the quick and unbiased TFP method enriches plasma sncRNAs ranging from 30 to 60 nt.
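The CT shifts reported above can be read as rough fold differences in the amount of target entering the qPCR reaction. The following is a minimal sketch that assumes ideal amplification, meaning each cycle doubles the product (efficiency = 1.0). This is a generic qPCR rule of thumb, not a calculation taken from the TFP study, and the function name `fold_difference` is hypothetical.

```python
def fold_difference(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template implied by a CT shift.

    Assumes each PCR cycle multiplies the product by (1 + efficiency);
    efficiency = 1.0 corresponds to perfect doubling (an idealisation).
    """
    return (1.0 + efficiency) ** delta_ct


# The CT range reported for the TFP method versus the commercial kit.
for delta in (2.01, 9.17):
    print(f"CT lower by {delta:.2f} cycles -> roughly {fold_difference(delta):.0f}-fold more template")
```

Under the doubling assumption, the 2.01–9.17-cycle range corresponds to roughly 4-fold to 576-fold more target in the reaction. Real amplification efficiencies below 100% would shrink these figures.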
14
Xu D, Yuan W, Fan C, Liu B, Lu MZ, Zhang J. Opportunities and Challenges of Predictive Approaches for the Non-coding RNA in Plants. Front Plant Sci 2022; 13:890663. [PMID: 35498708 PMCID: PMC9048598 DOI: 10.3389/fpls.2022.890663] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/06/2022] [Accepted: 03/28/2022] [Indexed: 06/01/2023]
Affiliation(s)
- Dong Xu
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Wenya Yuan
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Chunjie Fan
- State Key Laboratory of Tree Genetics and Breeding, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Guangzhou, China
- Bobin Liu
- Jiangsu Key Laboratory for Bioresources of Saline Soils, Jiangsu Synthetic Innovation Center for Coastal Bio-agriculture, School of Wetlands, Yancheng Teachers University, Yancheng, China
- Meng-Zhu Lu
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
- Jin Zhang
- State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, China
15
Nithin C, Mukherjee S, Basak J, Bahadur RP. NCodR: A multi-class support vector machine classification to distinguish non-coding RNAs in Viridiplantae. Quant Plant Biol 2022; 3:e23. [PMID: 37077974 PMCID: PMC10095871 DOI: 10.1017/qpb.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2021] [Revised: 08/22/2022] [Accepted: 08/24/2022] [Indexed: 05/02/2023]
Abstract
Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence- and secondary structure-based RNA folding measures. We observe distinct regions in the distribution of AU content, along with regions that overlap between ncRNA classes. Additionally, we find similar average minimum folding energy indices across ncRNA classes except for pre-miRNAs and lncRNAs. Various RNA folding measures show similar trends among the different ncRNA classes, again except for pre-miRNAs and lncRNAs. We observe different k-mer repeat signatures of length three among the ncRNA classes; in pre-miRNAs and lncRNAs, however, a diffuse pattern of k-mers is observed. Using these attributes, we train eight different classifiers to discriminate ncRNA classes in plants. Support vector machines with a radial basis function kernel show the highest accuracy (average F1 of ~96%) in discriminating ncRNAs, and this classifier is implemented as a web server, NCodR.
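Two of the sequence descriptors named above, AU content and length-3 k-mer (trinucleotide) frequencies, are simple to compute. The following is a minimal sketch that assumes overlapping windows and normalisation to relative frequencies. NCodR's exact feature definitions and its RBF-kernel SVM training are not reproduced here, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def au_content(seq: str) -> float:
    """Fraction of A and U nucleotides in an RNA sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    return sum(base in "AU" for base in seq) / len(seq) if seq else 0.0


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequency of every k-mer over ACGU, using overlapping windows.

    Windows containing symbols other than A, C, G, U (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGU"))
    total = sum(counts.values())
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGU", repeat=k)
    }
```

For example, `kmer_frequencies("AUGGCUAUAGC")` returns a 64-entry vector summing to 1. Vectors like this, alongside AU content, are the kind of fixed-length input a classifier such as an SVM can consume.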
Affiliation(s)
- Chandran Nithin
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India
- Laboratory of Computational Biology, Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, 02-089 Warsaw, Poland
- Sunandan Mukherjee
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, PL-02-109 Warsaw, Poland
- Jolly Basak
- Department of Biotechnology, Visva-Bharati, Santiniketan, 731235, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, India
- Author for correspondence: R. P. Bahadur, E-mail:
16
Wang L, Zhong X, Wang S, Liu Y. ncDLRES: a novel method for non-coding RNAs family prediction based on dynamic LSTM and ResNet. BMC Bioinformatics 2021; 22:447. [PMID: 34544356 PMCID: PMC8451086 DOI: 10.1186/s12859-021-04365-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/09/2021] [Accepted: 09/01/2021] [Indexed: 12/20/2022]
Abstract
Background: Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the ncRNA family aids research into ncRNA function. Existing computational methods fall mainly into two categories: the first predicts the ncRNA family by learning features of the sequence or secondary structure, and the second predicts it by aligning homologous sequences. Within the first category, some methods learn from predicted secondary-structure features, and inaccuracies in the predicted structures can lower their accuracy. By contrast, ncRFP learns features directly from ncRNA sequences. Although ncRFP simplifies the prediction process and improves performance, its performance leaves room for improvement because its input features are incomplete. In the second category, homologous sequence alignment currently achieves the highest performance. However, because it requires consensus secondary-structure annotation of ncRNA sequences and cannot model pseudoknots, its use is limited. Results: In this paper, a novel sequence-feature-based method, ncDLRES, is proposed to predict the family of ncRNAs using a Dynamic LSTM (Long Short-Term Memory) network and a ResNet (Residual Neural Network). Conclusions: ncDLRES extracts features of ncRNA sequences with the Dynamic LSTM and then classifies them with the ResNet. Compared with homologous sequence alignment, ncDLRES requires less data and has a broader application scope. Compared with methods of the first category, ncDLRES performs considerably better.
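Sequence-only methods such as ncRFP and ncDLRES learn from the raw nucleotide string, which first has to be turned into numbers. The following is a minimal sketch that assumes a plain one-hot encoding over ACGU with zero padding. The actual input encodings of ncRFP and ncDLRES may differ, and the helper names are hypothetical. Returning the true lengths alongside the padded batch is what would let a variable-length ("dynamic") LSTM ignore the padded positions.

```python
from __future__ import annotations

BASES = "ACGU"


def one_hot(seq: str) -> list[list[int]]:
    """Encode an RNA sequence as one 4-dimensional indicator vector per nucleotide.

    T is read as U; any other symbol (e.g. N) becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def pad_batch(seqs: list[str]) -> tuple[list[list[list[int]]], list[int]]:
    """One-hot encode a batch and zero-pad every sequence to the longest one.

    Also returns the true lengths, which a length-aware recurrent layer can use
    to stop at the last real nucleotide instead of running over the padding.
    """
    encoded = [one_hot(s) for s in seqs]
    lengths = [len(e) for e in encoded]
    max_len = max(lengths, default=0)
    padded = [
        e + [[0] * len(BASES) for _ in range(max_len - len(e))]
        for e in encoded
    ]
    return padded, lengths
```

For example, `pad_batch(["ACG", "UU"])` yields two 3 × 4 matrices and the lengths `[3, 2]`. In a deep learning framework these would typically become a tensor plus a length vector or mask.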
Affiliation(s)
- Linyu Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Xiaodan Zhong
- College of Computer Science and Technology, Jilin University, Changchun, China
- Shuo Wang
- College of Computer Science and Technology, Jilin University, Changchun, China
- Yuanning Liu
- College of Computer Science and Technology, Jilin University, Changchun, China.
17
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023]
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) that regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complex diseases such as cancer. Categorization of ncRNAs is essential to understand disease mechanisms and to develop effective treatments, and information on the sub-cellular localization of ncRNAs sheds light on their diverse functions. To date, several computational methodologies have been proposed to identify precisely the class and sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs and reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, microRNA, long ncRNA, and circular RNA, and to determine the sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We expect that our expert analysis will help artificial intelligence researchers learn the state-of-the-art performance, select models for various tasks from one platform, and understand the dominantly used sequence descriptors, neural architectures, and inter-species and intra-species performance deviations.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
- National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan
- School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
18
Jiang JY, Ju CJT, Hao J, Chen M, Wang W. JEDI: circular RNA prediction based on junction encoders and deep interaction among splice sites. Bioinformatics 2021; 37:i289-i298. [PMID: 34252942 PMCID: PMC8336595 DOI: 10.1093/bioinformatics/btab288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
Abstract
Motivation: Circular RNA (circRNA) is a novel class of long non-coding RNAs that has been widely found in eukaryotic transcriptomes. The circular structure arises from a non-canonical splicing process in which a donor site is backspliced to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, growing evidence suggests vital roles in gene regulation and associations with diseases. As a fundamental step toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced methods have leveraged deep learning to capture relevant patterns in RNA sequences and model their interactions. However, these methods fail to fully exploit the positional information of splice junctions and their deep interactions. Results: We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first uses an attention mechanism to encode each junction site with deep bidirectional recurrent neural networks and then applies a novel cross-attention layer to model deep interactions among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate that JEDI significantly outperforms state-of-the-art approaches in circRNA prediction at both the isoform level and the gene level. Moreover, JEDI shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can perform. Availability and implementation: The implementation of our framework is available at https://github.com/hallogameboy/JEDI. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jyun-Yu Jiang
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
- Chelsea J-T Ju
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
- Junheng Hao
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
- Muhao Chen
- Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
19
Chantsalnyam T, Siraj A, Tayara H, Chong KT. ncRDense: A novel computational approach for classification of non-coding RNA family by deep learning. Genomics 2021; 113:3030-3038. [PMID: 34242708 DOI: 10.1016/j.ygeno.2021.07.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/29/2020] [Revised: 06/29/2021] [Accepted: 07/03/2021] [Indexed: 12/14/2022]
Abstract
As biological research grows in importance, non-coding RNAs (ncRNAs) are attracting more attention in biology and bioinformatics. They play vital roles in biological processes such as transcription and translation. Classification of ncRNAs is essential to our understanding of disease mechanisms and to treatment design. Many approaches to ncRNA classification have been developed, several of which use machine learning and deep learning. In this paper, we construct a novel deep learning-based architecture, ncRDense, to effectively classify and distinguish ncRNA families. In a comparative study, our model produces results comparable to those of existing state-of-the-art methods. Finally, we built a freely accessible web server for the ncRDense tool, which is available at http://nsclbio.jbnu.ac.kr/tools/ncRDense/.
Affiliation(s)
- Tuvshinbayar Chantsalnyam
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Arslan Siraj
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea; Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea.
20
Singh D, Madhawan A, Roy J. Identification of multiple RNAs using feature fusion. Brief Bioinform 2021; 22:6272794. [PMID: 33971667 DOI: 10.1093/bib/bbab178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Detection of novel transcripts by deep sequencing has increased the demand for computational algorithms, because identifying and validating them with in vivo techniques is time-consuming, costly and unreliable. Most of these newly discovered transcripts are non-coding RNAs, a large group known for diverse functional roles but lacking a common taxonomy. Thus, once a transcript is found to lack coding potential, it is crucial to recognize its primary functional category. To address this heterogeneity, we divide ncRNAs into three classes and present RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC uses alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features, and fuses them to build classification models with extreme gradient boosting. Experiments were performed on four species, and performance was assessed on both the multiclass problem and the conventional binary problem (coding versus non-coding). The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problem. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
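RNAC's feature fusion combines several descriptor families into one vector before classification. The following is a minimal sketch that assumes the alignment-based descriptors have already been reduced to a non-empty numeric per-position profile with values in [0, 1]. This profile is a hypothetical input, not RNAC's actual descriptor. The sketch computes statistical, one-dimensional local binary pattern (LBP), and value-histogram features, then fuses them by simple concatenation. Whether RNAC fuses features exactly this way is an assumption, and the extreme gradient boosting step is omitted.

```python
from __future__ import annotations

import statistics


def statistical_features(profile: list[float]) -> list[float]:
    """Mean, population standard deviation, minimum and maximum of the profile."""
    return [statistics.fmean(profile), statistics.pstdev(profile), min(profile), max(profile)]


def lbp_histogram(profile: list[float], radius: int = 1) -> list[float]:
    """Normalised histogram of 1D local binary pattern (LBP) codes.

    At each position with a full neighbourhood, each of the 2 * radius neighbours
    contributes one bit: 1 if it is >= the centre value, else 0.
    """
    hist = [0] * (2 ** (2 * radius))
    for i in range(radius, len(profile) - radius):
        neighbours = profile[i - radius:i] + profile[i + 1:i + radius + 1]
        code = sum(1 << j for j, v in enumerate(neighbours) if v >= profile[i])
        hist[code] += 1
    total = sum(hist)
    return [h / total if total else 0.0 for h in hist]


def value_histogram(profile: list[float], bins: int = 4, lo: float = 0.0, hi: float = 1.0) -> list[float]:
    """Normalised histogram of the raw values over [lo, hi]; out-of-range values are clamped."""
    hist = [0] * bins
    for v in profile:
        idx = int((min(max(v, lo), hi) - lo) / (hi - lo) * bins)
        hist[min(idx, bins - 1)] += 1
    return [h / len(profile) for h in hist]


def fuse_features(profile: list[float]) -> list[float]:
    """Feature-level fusion by concatenating the three descriptor groups."""
    return statistical_features(profile) + lbp_histogram(profile) + value_histogram(profile)
```

Concatenation is the simplest form of feature-level fusion. The fused vector has a fixed length (12 here with the default settings), so it can be passed directly to a tree ensemble such as gradient boosting.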
Affiliation(s)
- Dalwinder Singh
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Akansha Madhawan
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
- Joy Roy
- National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India
21
Wang L, Zheng S, Zhang H, Qiu Z, Zhong X, Liuliu H, Liu Y. ncRFP: A Novel end-to-end Method for Non-Coding RNAs Family Prediction Based on Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:784-789. [PMID: 32224462 DOI: 10.1109/tcbb.2020.2982873] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/10/2023]
Abstract
Enough evidence has accumulated to show that non-coding RNAs (ncRNAs) play important roles in cellular biological processes and disease pathogenesis. High-throughput techniques have produced a large number of ncRNAs whose functions remain unknown. Since accurate identification of the ncRNA family aids research into function, predicting the family of each ncRNA is both necessary and urgent. Although several excellent traditional methods can predict ncRNA families, their complex procedures or inaccurate performance remain major problems. These methods first predict the secondary structure and then identify the ncRNA family from properties of that structure. Unfortunately, error accumulation across multiple steps, especially from imperfect RNA secondary structure prediction tools, may cause low accuracy. In this paper, a novel end-to-end deep learning method, ncRFP, is proposed for this prediction task. Instead of predicting the secondary structure, ncRFP predicts the ncRNA family by automatically extracting features from ncRNA sequences. Compared with other methods, ncRFP not only simplifies the process but also improves accuracy. The source code of ncRFP is available at https://github.com/linyuwangPHD/ncRFP.
22
CircNet: an encoder–decoder-based convolution neural network (CNN) for circular RNA identification. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05673-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
23
Ahmad P, Bensaoud C, Mekki I, Rehman MU, Kotsyfakis M. Long Non-Coding RNAs and Their Potential Roles in the Vector-Host-Pathogen Triad. Life (Basel) 2021; 11:life11010056. [PMID: 33466803 PMCID: PMC7830631 DOI: 10.3390/life11010056] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/03/2020] [Revised: 01/09/2021] [Accepted: 01/12/2021] [Indexed: 12/12/2022]
Abstract
Long non-coding (lnc)RNAs have emerged as critical regulators of gene expression and are involved in almost every cellular process. They can bind to other molecules, including DNA, proteins, or even other RNA types such as messenger RNA or small RNAs. LncRNAs are typically expressed at much lower levels than mRNA, and their expression is often restricted to specific tissues or developmental stages. They are also involved in several inter-species interactions, including vector–host–pathogen interactions, where they can be either vector/host-derived or encoded by pathogens. In these interactions, they function through multiple mechanisms, including regulation of pathogen growth and replication and cell-autonomous antimicrobial defense. Recent advances suggest that characterizing lncRNAs and their targets in different species may hold the key to understanding the role of this class of non-coding RNA in interspecies crosstalk. In this review, we present a general overview of recent studies on lncRNA-mediated regulation of gene expression and on the possible involvement of lncRNAs in regulating vector–host–pathogen interactions.
Affiliation(s)
- Parwez Ahmad
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, 37005 Ceske Budejovice (Budweis), Czech Republic
- Chaima Bensaoud
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, 37005 Ceske Budejovice (Budweis), Czech Republic
- Imen Mekki
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, 37005 Ceske Budejovice (Budweis), Czech Republic
- Faculty of Science, University of South Bohemia, 37005 Ceske Budejovice, Czech Republic
- Mujeeb Ur Rehman
- Key Laboratory of Animal Disease and Human Health of Sichuan Province, Sichuan Agricultural University, Wenjiang, Chengdu 611130, China
- Michail Kotsyfakis
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, 37005 Ceske Budejovice (Budweis), Czech Republic
- Correspondence:
24
María Hernández-Domínguez E, Sofía Castillo-Ortega L, García-Esquivel Y, Mandujano-González V, Díaz-Godínez G, Álvarez-Cervantes J. Bioinformatics as a Tool for the Structural and Evolutionary Analysis of Proteins. Comput Biol Chem 2020. [DOI: 10.5772/intechopen.89594] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
Abstract
This chapter deals with bioinformatics, the computational, mathematical, and statistical tools applied to biology that are essential for the analysis and characterization of biological molecules, in particular proteins, which play an important role in all cellular and evolutionary processes of organisms. In recent decades, next-generation sequencing technologies and bioinformatics have facilitated the collection and analysis of large amounts of genomic, transcriptomic, proteomic, and metabolomic data from different organisms. These data have allowed predictions about the regulation of expression, transcription, translation, structure, and mechanisms of action of proteins, as well as about homology, mutations, and the evolutionary processes that generate structural and functional changes over time. Although the information in databases grows every day, bioinformatics tools are constantly being modified to improve performance and make more accurate predictions of protein function, which is why bioinformatics research remains a great challenge.
25
Noviello TMR, Ceccarelli F, Ceccarelli M, Cerulo L. Deep learning predicts short non-coding RNA functions from only raw sequence data. PLoS Comput Biol 2020; 16:e1008415. [PMID: 33175836 PMCID: PMC7682815 DOI: 10.1371/journal.pcbi.1008415] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/28/2020] [Revised: 11/23/2020] [Accepted: 09/28/2020] [Indexed: 12/31/2022]
Abstract
Small non-coding RNAs (ncRNAs) are short non-coding sequences involved in gene regulation in many biological processes and diseases. The lack of a complete understanding of their biological functions, especially genome-wide, has created a demand for new computational approaches to annotate their roles. Secondary structure is widely regarded as a key determinant of RNA function, and machine learning approaches have successfully predicted RNA function from secondary structure information. Here we show that RNA function can be predicted with good accuracy from a lightweight representation of sequence information, without computing secondary structure features, which is computationally expensive. This finding appears to go against the dogma that secondary structure is a key determinant of function in RNA. Compared with recent secondary structure-based methods, the proposed solution is more robust to sequence boundary noise and drastically reduces the computational cost, allowing annotation of large data volumes. Scripts and datasets to reproduce the experiments in this study are available at: https://github.com/bioinformatics-sannio/ncrna-deep.
Affiliation(s)
- Teresa Maria Rosaria Noviello
- Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Napoli, Italy
- Biogem Scarl, Istituto di Ricerche Genetiche “Gaetano Salvatore”, Ariano Irpino, Italy
- Francesco Ceccarelli
- CaReBios srl, Ariano Irpino, Italy
- Computer Laboratory, University of Cambridge, Cambridge, UK
- Michele Ceccarelli
- Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Napoli, Italy
- CaReBios srl, Ariano Irpino, Italy
- Luigi Cerulo
- Biogem Scarl, Istituto di Ricerche Genetiche “Gaetano Salvatore”, Ariano Irpino, Italy
- Department of Science and Technology, University of Sannio, Benevento, Italy
26
Angenent-Mari NM, Garruss AS, Soenksen LR, Church G, Collins JJ. A deep learning approach to programmable RNA switches. Nat Commun 2020; 11:5057. [PMID: 33028812 PMCID: PMC7541447 DOI: 10.1038/s41467-020-18677-1] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 12/03/2019] [Accepted: 07/31/2020] [Indexed: 12/21/2022]
Abstract
Engineered RNA elements are programmable tools capable of detecting small molecules, proteins, and nucleic acids. Predicting the behavior of these synthetic biology components remains a challenge, a situation that could be addressed through enhanced pattern recognition from deep learning. Here, we investigate Deep Neural Networks (DNN) to predict toehold switch function as a canonical riboswitch model in synthetic biology. To facilitate DNN training, we synthesize and characterize in vivo a dataset of 91,534 toehold switches spanning 23 viral genomes and 906 human transcription factors. DNNs trained on nucleotide sequences outperform (R² = 0.43–0.70) previous state-of-the-art thermodynamic and kinetic models (R² = 0.04–0.15) and allow for human-understandable attention-visualizations (VIS4Map) to identify success and failure modes. This work shows that deep learning approaches can be used for functionality predictions and insight generation in RNA synthetic biology.
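The comparison above is stated in terms of R². One common definition is the coefficient of determination, R² = 1 − SS_res / SS_tot. The abstract does not say which definition was used, and some studies instead report the squared Pearson correlation, so the following is only a minimal sketch of the coefficient-of-determination form. The function name is hypothetical.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res is the sum of squared prediction errors; SS_tot is the sum of squared
    deviations of the observations from their mean.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

Under this definition, R² is 1 for perfect predictions and 0 for a model that always predicts the observed mean. It can become negative for worse-than-mean predictions, which the squared-correlation variant never does.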
Affiliation(s)
- Nicolaas M Angenent-Mari
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, 02139, USA
- Institute for Medical Engineering and Science (IMES), MIT, Cambridge, MA, 02139, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Alexander S Garruss
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Program in Bioinformatics and Integrative Genomics, Harvard University, Cambridge, MA, 02138, USA
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Luis R Soenksen
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, 02139, USA
- Institute for Medical Engineering and Science (IMES), MIT, Cambridge, MA, 02139, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Department of Mechanical Engineering, MIT, Cambridge, MA, 02139, USA
- George Church
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, 02139, USA
- James J Collins
- Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, 02139, USA.
- Institute for Medical Engineering and Science (IMES), MIT, Cambridge, MA, 02139, USA.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, 02115, USA.
- Department of Mechanical Engineering, MIT, Cambridge, MA, 02139, USA.
- Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, 02139, USA.
27
Chantsalnyam T, Lim DY, Tayara H, Chong KT. ncRDeep: Non-coding RNA classification with convolutional neural network. Comput Biol Chem 2020; 88:107364. [DOI: 10.1016/j.compbiolchem.2020.107364] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/27/2020] [Revised: 08/04/2020] [Accepted: 08/18/2020] [Indexed: 12/21/2022]
28
Fezai R, Abodayeh K, Mansouri M, Nounou H, Nounou M. Fault diagnosis of biological systems using improved machine learning technique. Int J Mach Learn Cybern 2020. [DOI: 10.1007/s13042-020-01184-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/14/2023]
29
Leite ML, Oliveira KBS, Cunha VA, Dias SC, da Cunha NB, Costa FF. Epigenetic Therapies in the Precision Medicine Era. Adv Ther (Weinh) 2020. [DOI: 10.1002/adtp.201900184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
Affiliation(s)
- Michel Lopes Leite
- Genomic Sciences and Biotechnology Program UCB ‐ Brasilia, SgAN 916, Modulo B, Bloco C, 70790‐160 Brasília DF Brazil
- Victor Albuquerque Cunha
- Genomic Sciences and Biotechnology Program UCB ‐ Brasilia, SgAN 916, Modulo B, Bloco C, 70790‐160 Brasília DF Brazil
- Simoni Campos Dias
- Genomic Sciences and Biotechnology Program UCB ‐ Brasilia, SgAN 916, Modulo B, Bloco C, 70790‐160 Brasília DF Brazil
- Animal Biology Department, Universidade de Brasília UnB, Campus Darcy Ribeiro, Brasilia DF 70910‐900 Brazil
- Nicolau Brito da Cunha
- Genomic Sciences and Biotechnology Program UCB ‐ Brasilia, SgAN 916, Modulo B, Bloco C, 70790‐160 Brasília DF Brazil
- Fabricio F. Costa
- Cancer Biology and Epigenomics Program, Ann & Robert H Lurie Children's Hospital of Chicago Research Center, Northwestern University's Feinberg School of Medicine 2430 N. Halsted St., Box 220 Chicago IL 60611 USA
- Northwestern University's Feinberg School of Medicine 2430 N. Halsted St., Box 220 Chicago IL 60611 USA
- MATTER Chicago 222 W. Merchandise Mart Plaza, Suite 12th Floor Chicago IL 60654 USA
- Genomic Enterprise (www.genomicenterprise.com) San Diego, CA 92008 and New York NY 11581 USA
30
Cammarata G, Duro G, Di Chiara T, Lo Curto A, Taverna S, Candore G. Circulating miRNAs in Successful and Unsuccessful Aging. A Mini-review. Curr Pharm Des 2020; 25:4150-4153. [PMID: 31742494 DOI: 10.2174/1381612825666191119091644] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/08/2019] [Accepted: 11/04/2019] [Indexed: 12/13/2022]
Abstract
Aging is a multifactorial process that affects organisms at the genetic, molecular and cellular levels. This process modifies several tissues, negatively affecting cell physiology and tissue and organ function and altering their regenerative capacity. The chronic low-grade inflammation typical of aging, termed inflammaging, is a common biological factor responsible for age-related decline and the onset of disease. A murine parabiosis model, which joins the vascular systems of old and young animals, suggests that soluble factors released by young individuals may improve the regenerative potential of old tissue. Circulating factors therefore have a key role in inducing the aging phenotype. Moreover, lifestyle can influence the physiological status of multiple organs via epigenetic mechanisms. Recently, microRNAs have been proposed as potential sensors of aging.
Affiliation(s)
- Giuseppe Cammarata
- Institute for Biomedical Research and Innovation, National Research Council of Italy, Palermo, Italy
- Giovanni Duro
- Institute for Biomedical Research and Innovation, National Research Council of Italy, Palermo, Italy
- Tiziana Di Chiara
- U.O.C di Medicina Interna con Stroke Care, Dipartimento Biomedico di Medicina Interna e Specialistica (Di.Bi.M.I.S), University of Palermo, Palermo, Italy
- Alessia Lo Curto
- Institute for Biomedical Research and Innovation, National Research Council of Italy, Palermo, Italy
- Simona Taverna
- Institute for Biomedical Research and Innovation, National Research Council of Italy, Palermo, Italy
- Giuseppina Candore
- Laboratory of Immunopathology and Immunosenescence, Department of Biomedicine, Neuroscience and Advanced Diagnostics, University of Palermo, Palermo, Italy
31
Boukelia A, Boucheham A, Belguidoum M, Batouche M, Zehraoui F, Tahi F. A Novel Integrative Approach for Non-coding RNA Classification Based on Deep Learning. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191105160633] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Background:
Molecular biomarkers offer new ways to understand many disease processes. Non-coding RNAs (ncRNAs), as biomarkers, play a crucial role in several cellular activities that are highly correlated with many human diseases, especially cancer. The classification and identification of ncRNAs have therefore become critical issues because of their applications, for example as biomarkers in many human diseases.
Objective:
Most existing computational tools for ncRNA classification can classify only one type of ncRNA. They rely on structural information or on specific known features. Furthermore, these tools suffer from a lack of significant, validated features. As a result, their performance is not always satisfactory.
Methods:
We propose a novel approach named imCnC for ncRNA classification based on multisource deep learning. It integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. We also propose an optimization technique that visualizes the patterns of the features extracted by the multisource CNN model, in order to measure the epigenomic features of each ncRNA type.
Results:
Computational results on a dataset of 16 human ncRNA classes downloaded from Rfam show that imCnC outperforms existing tools, achieving an accuracy of 94.18%. In addition, our method makes it possible to discover new ncRNA features by using an optimization technique to measure and visualize the feature patterns of the imCnC classifier.
Affiliation(s)
- Abdelbasset Boukelia
- Computer Science Department, Faculty NTIC, University Abdelhamid Mehri Constantine 2, Constantine 25000, Algeria
- Anouar Boucheham
- University Salah Boubnider Constantine 3, Constantine 25000, Algeria
- Meriem Belguidoum
- Computer Science Department, Faculty NTIC, University Abdelhamid Mehri Constantine 2, Constantine 25000, Algeria
- Mohamed Batouche
- IT Department, CCIS - RC, Princess Nourah University, Riyadh, Saudi Arabia
- Farida Zehraoui
- IBISC, University Evry, University Paris-Saclay, Evry, France
- Fariza Tahi
- IBISC, University Evry, University Paris-Saclay, Evry, France
32
CirRNAPL: A web server for the identification of circRNA based on extreme learning machine. Comput Struct Biotechnol J 2020; 18:834-842. [PMID: 32308930 PMCID: PMC7153170 DOI: 10.1016/j.csbj.2020.03.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/16/2019] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 12/27/2022] Open
Abstract
Circular RNA (circRNA) plays an important role in the development of diseases and offers new avenues for drug development. Accurate identification of circRNAs is important for a deeper understanding of their functions. In this study, we developed a new classifier, CirRNAPL. It extracts features describing the nucleic acid composition and structure of circRNA sequences, and it uses the particle swarm optimization algorithm to optimize an extreme learning machine. We compared CirRNAPL with existing methods, including BLAST, on three datasets. CirRNAPL significantly improved identification accuracy on all three datasets, with accuracies of 0.815, 0.802, and 0.782, respectively. Additionally, we performed sequence alignment on 564 sequences of the independent test set of the third dataset and analyzed the expression level of circRNAs. The results showed that the expression level of a sequence is positively correlated with its abundance. A user-friendly CirRNAPL web server is freely available at http://server.malab.cn/CirRNAPL/.
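The abstract mentions features describing nucleic acid composition. The sketch below shows one simple composition descriptor: normalized k-mer frequencies, dinucleotides by default. It illustrates the general idea only and is not CirRNAPL's feature extractor. According to its key words, CirRNAPL's descriptors are richer and include pseudo-dinucleotide composition, autocorrelation and structure-based features. The function name `kmer_composition` is hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequencies over the RNA alphabet A, C, G, U.

    Returns a list of 4**k floats ordered lexicographically (AA, AC, AG, AU, CA, ...).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    alphabet = "ACGU"
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


features = kmer_composition("GGACUUCG", k=2)
print(len(features))  # 16
```

The output is a fixed-length vector regardless of sequence length. That property is what lets composition features feed a classifier such as an extreme learning machine.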
Key Words
- ACC, Accuracy
- CNN, Convolutional Neural Networks
- Circular RNA
- DAC, Dinucleotide-based auto-covariance
- DACC, Dinucleotide-based auto-cross-covariance
- DCC, Dinucleotide-based cross-covariance
- ELM, extreme learning machine
- Expression level
- Extreme learning machine
- GAC, Geary autocorrelation
- Identification
- MAC, Moran autocorrelation
- MCC, Matthews Correlation Coefficient
- MRMD, Maximum-Relevance-Maximum-Distance
- NMBAC, Normalized Moreau–Broto autocorrelation
- PC-PseDNC-General, General parallel correlation pseudo-dinucleotide composition
- PCGs, protein coding genes
- PSO, particle swarm optimization algorithm
- Particle swarm optimization algorithm
- PseDPC, Pseudo-distance structure status pair composition
- PseSSC, Pseudo-structure status composition
- RBF, radial basis function
- RF, random forest
- SC-PseDNC-General, General series correlation pseudo-dinucleotide composition
- SE, Sensitivity
- SP, Specificity
- SVM, support vector machine
- Triplet, Local structure-sequence triplet element
- circRNA, circular RNA
- lncRNAs, long non-coding RNAs
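The key words above list ACC, SE, SP and MCC as evaluation measures. The sketch below shows how these are conventionally computed from a binary confusion matrix (true/false positives and negatives). It is not CirRNAPL's own code. The function name, the counts, and the convention of returning MCC = 0 when a marginal total is zero are assumptions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """ACC, SE, SP and MCC from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    acc = (tp + tn) / total
    # Sensitivity (recall on positives) and specificity (recall on negatives)
    se = tp / (tp + fn) if tp + fn else float("nan")
    sp = tn / (tn + fp) if tn + fp else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC = 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


# Hypothetical counts for a binary circRNA-vs-other classifier
print(binary_metrics(tp=40, fp=10, tn=30, fn=20))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix. This makes it more informative when the positive and negative classes are imbalanced.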
33
Yoshino Y, Dwivedi Y. Non-Coding RNAs in Psychiatric Disorders and Suicidal Behavior. Front Psychiatry 2020; 11:543893. [PMID: 33101077 PMCID: PMC7522197 DOI: 10.3389/fpsyt.2020.543893] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/18/2020] [Accepted: 08/14/2020] [Indexed: 12/18/2022] Open
Abstract
It is well known that only a small proportion of the human genome codes for proteins; the rest gives rise to RNAs that do not code for protein, known as non-coding RNAs (ncRNAs). ncRNAs are further divided into two subclasses based on size: 1) long non-coding RNAs (lncRNAs; >200 nucleotides) and 2) small RNAs (<200 nucleotides). Small RNAs comprise several families, including microRNAs (miRNAs), small interfering RNAs (siRNAs), piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and small nuclear RNAs (snRNAs). The roles of ncRNAs, especially lncRNAs and miRNAs, are well documented in brain development, homeostasis, stress responses, and neural plasticity. It has also been reported that ncRNAs can influence the development of psychiatric disorders, including schizophrenia, major depressive disorder, and bipolar disorder. More recently, their roles in suicidal behavior have begun to be investigated. In this article, we have comprehensively reviewed the findings on lncRNA and miRNA expression changes and their functions in various psychiatric disorders, including suicidal behavior. We primarily focused on studies performed in postmortem human brain. In addition, we have briefly reviewed the role of other small RNAs (e.g., piRNAs, siRNAs, snRNAs, and snoRNAs) and their expression changes in psychiatric illnesses.
Affiliation(s)
- Yuta Yoshino
- Department of Psychiatry and Behavioral Neurobiology, University of Alabama at Birmingham, Birmingham, AL, United States
- Yogesh Dwivedi
- Department of Psychiatry and Behavioral Neurobiology, University of Alabama at Birmingham, Birmingham, AL, United States
34
Bakhshayesh NM, Shamsi M, Sedaaghi MH, Ebrahimnezhad H. Alignment of Noncoding Ribonucleic Acids with Pseudoknots Using Context-Sensitive Hidden Markov Model. J Med Signals Sens 2019; 9:252-258. [PMID: 31737554 PMCID: PMC6839439 DOI: 10.4103/jmss.jmss_11_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2019] [Revised: 03/22/2019] [Accepted: 05/15/2019] [Indexed: 12/04/2022]
Abstract
To date, various signal processing techniques have been used to predict protein-coding genes, but these techniques are unsuitable for predicting ribonucleic acids (RNAs). Modeling a gene network can be employed in various fields, such as the discovery of new drugs, reducing the side effects of treatment methods, further identifying genetic diseases and treatments for genetic disorders by influencing the activity of the genes involved, preventing the growth of unwanted tissues by weakening growth and cell reproduction, and many other applications in medicine and agriculture. The main purpose of this study was to design a suitable algorithm, based on context-sensitive hidden Markov models (csHMMs), for aligning the secondary structures of RNAs so that noncoding RNAs can be identified. In this model, several RNA families are compared and their similarities are measured. The model's parameters are estimated with an expectation–maximization algorithm, which is the standard algorithm for estimating HMM parameters. Alignment of RNAs belonging to the hepatitis delta virus family achieved an accuracy of 83.33%, a specificity of 89%, and a sensitivity of 97%. Alignment of RNAs belonging to the purine family achieved an accuracy of 65%, a specificity of 76%, and a sensitivity of 76%. The results show that csHMMs, in addition to aligning the primary sequences of RNAs, can align the secondary structures of RNAs with high accuracy.
Affiliation(s)
- Mousa Shamsi
- Faculty of Biomedical Engineering, Sahand University of Technology, Tabriz, Iran
35
Motawi TK, Mady AE, Shaheen S, Elshenawy SZ, Talaat RM, Rizk SM. Genetic variation in microRNA-100 (miR-100) rs1834306 T/C associated with Hepatitis B virus (HBV) infection: Correlation with expression level. Infect Genet Evol 2019; 73:444-449. [PMID: 31176032 DOI: 10.1016/j.meegid.2019.06.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/01/2019] [Revised: 05/15/2019] [Accepted: 06/04/2019] [Indexed: 02/07/2023]
Abstract
Circulating microRNAs (miRNAs) have a vital role in Hepatitis B virus (HBV) diagnosis and therapeutics. miR-100 has been reported to be associated with various aspects of HBV biology. This study focused on a miR-100 single nucleotide polymorphism (SNP) (rs1834306 T/C) and its contribution to an individual's susceptibility to, and prognosis of, HBV infection. The effect of the SNP on miR-100 expression was also evaluated. Two hundred subjects were enrolled: 100 HBV-infected patients and 100 age- and sex-matched healthy individuals who served as a control group. SNP detection was performed using polymerase chain reaction with sequence-specific primers (PCR-SSP), and miR-100 expression was measured by quantitative real-time PCR (qRT-PCR). Our results showed significant up-regulation of miR-100 expression in HBV patients versus the control group (P < .01). A positive correlation was found between viral load and elevated miR-100 expression (r = 0.508; P < .01). Regarding miR-100 expression across genotypes and alleles, the TC genotype and the T allele coincided with significantly higher miR-100 expression (P < .001) in HBV patients than in controls. To the best of our knowledge, this is the first observational prospective case-control study of the miR-100 (rs1834306 T/C) SNP in the Egyptian population. However, because of the small sample size of this preliminary work, further prospective investigations are required to confirm our data.
Affiliation(s)
- Tarek K Motawi
- Biochemistry Department, Faculty of Pharmacy, Cairo University, Egypt
- Amira E Mady
- Biochemistry Department, Faculty of Pharmacy, Cairo University, Egypt; Pharmacy Department, National Liver Institute, Menoufia University, Egypt
- Samar Shaheen
- Molecular Biology Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), University of Sadat City (USC), Egypt
- Soha Z Elshenawy
- Clinical Biochemistry and Molecular Diagnostics Department, National Liver Institute, Menoufia University, Egypt
- Roba M Talaat
- Molecular Biology Department, Genetic Engineering and Biotechnology Research Institute (GEBRI), University of Sadat City (USC), Egypt
- Sherine M Rizk
- Biochemistry Department, Faculty of Pharmacy, Cairo University, Egypt
36
Jha CK, Mir R, Elfaki I, Khullar N, Rehman S, Javid J, Banu S, Chahal SMS. Potential Impact of MicroRNA-423 Gene Variability in Coronary Artery Disease. Endocr Metab Immune Disord Drug Targets 2019; 19:67-74. [PMID: 30289085 DOI: 10.2174/1871530318666181005095724] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/31/2018] [Revised: 07/02/2018] [Accepted: 08/27/2018] [Indexed: 01/16/2023]
Abstract
AIM Studies have evaluated the association of the miRNA-423 C>A genotype with susceptibility to various diseases, such as cancers, atherosclerosis and inflammatory bowel disease, but the results were contradictory. However, no studies have reported on the association between the miRNA-423 rs6505162 C>A polymorphism and susceptibility to coronary artery disease. MicroRNAs regulate the expression of multiple genes involved in atherogenesis. Therefore, we investigated the association of microRNA-423 C>A gene variations with susceptibility to coronary artery disease. METHODOLOGY This study was conducted on 100 coronary artery disease patients and 117 matched healthy controls. Genotyping of microRNA-423 rs6505162 C>A was performed using the amplification refractory mutation system PCR (ARMS-PCR) method. RESULTS A significant difference was observed in the genotype distribution between coronary artery disease cases and sex-matched healthy controls (P = 0.048). The frequencies of the three genotypes CC, CA and AA were reported as 55%, 41% and 4% in patient samples and 55%, 41% and 4% in healthy control samples, respectively. Our findings showed that the CA genotype of the microRNA-423 C>A variant was associated with an increased risk of coronary artery disease in the codominant model (OR = 1.96, 95% CI 1.12-3.42; RR = 1.35, 95% CI 1.05-1.75; P = 0.017). The association was also significant in the dominant model (CA+AA vs CC; OR = 1.97, 95% CI 1.14-3.39) but non-significant in the recessive model (AA vs CC+CA; OR = 1.42, 95% CI 0.42-4.83; P = 0.56). The A allele significantly increased the risk of coronary artery disease compared with the C allele (OR = 1.56, 95% CI 1.03-2.37; P = 0.035). These odds ratios correspond to 1.96-, 1.97- and 1.56-fold increases in the odds of developing coronary artery disease, respectively. CONCLUSION Our findings indicate that the microRNA-423 CA genotype and A allele are associated with increased susceptibility to coronary artery disease.
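The results above are reported as odds ratios with 95% confidence intervals, but the abstract does not say how the intervals were computed. The sketch below shows one common approach, Woolf's log-based interval for a 2x2 case-control table. In a dominant model, for example, the "exposed" group would be CA+AA carriers and the reference group CC. The function `odds_ratio_ci` and the example counts are hypothetical and are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a = cases carrying the risk genotype, b = cases without it,
    c = controls carrying the risk genotype, d = controls without it.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; zero cells need a continuity correction")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
print(odds_ratio_ci(a=30, b=20, c=20, d=30))
```

The interval is symmetric on the log scale, so it is asymmetric around the OR itself. This matches the reported intervals (e.g., 1.12-3.42 around 1.96). An odds ratio approximates relative risk only when the outcome is rare.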
Affiliation(s)
- Chandan K Jha
- Department of Human Genetics Punjabi University, Punjab, India
- Rashid Mir
- Department of Medical Lab Technology, Faculty of Applied Medical Sciences, University of Tabuk, Saudi Arabia
- Imadeldin Elfaki
- Department of Biochemistry, Faculty of Science, University of Tabuk, Saudi Arabia
- Suriya Rehman
- Institute of Research and Medical Consultation, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
- Jamsheed Javid
- Department of Medical Lab Technology, Faculty of Applied Medical Sciences, University of Tabuk, Saudi Arabia
- Shaheena Banu
- Sri Jayadeva Institute of Cardiovascular science & Research, Bangalore, India
37
Amin N, McGrath A, Chen YPP. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 2019. [DOI: 10.1038/s42256-019-0051-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Indexed: 01/30/2023]
38
Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019; 17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open
Abstract
The measurement of gene expression has long provided significant insight into biological functions. The development of high-throughput short-read sequencing technology has revealed transcriptional complexity at an unprecedented scale, and informed almost all areas of biology. However, as researchers have sought to gather more insights from the data, these new technologies have also increased the computational analysis burden. In this review, we describe typical computational pipelines for RNA-Seq analysis and discuss their strengths and weaknesses for the assembly, quantification and analysis of coding and non-coding RNAs. We also discuss the assembly of transposable elements into transcripts, and the difficulty these repetitive elements pose. In summary, RNA-Seq is a powerful technology that is likely to remain a key asset in the biologist's toolkit.
Affiliation(s)
- Isaac A Babarinde
- Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
- Yuhao Li
- Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
- Andrew P Hutchins
- Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
39
Groher AC, Jager S, Schneider C, Groher F, Hamacher K, Suess B. Tuning the Performance of Synthetic Riboswitches using Machine Learning. ACS Synth Biol 2019; 8:34-44. [PMID: 30513199 DOI: 10.1021/acssynbio.8b00207] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022]
Abstract
Riboswitch development for clinical, technological, and synthetic biology applications constantly seeks to optimize regulatory behavior. Here, we present a machine learning approach to improve the regulation of a tetracycline (tc)-dependent riboswitch device composed of two individual tc aptamers. We developed a bioinformatics model that combines random forest analysis with a convolutional neural network to predict the switching behavior of such tandem riboswitches. We found that both biophysical parameters and the hydrogen bond pattern influence regulation. Our new design pipeline led to a significant improvement of the tc riboswitch device, extending its dynamic range from 8.5- to 40-fold. We are confident that our novel method not only yields an excellent tc-dependent riboswitch device but also holds great promise for the optimization of other riboswitches.
40
Simopoulos CMA, Weretilnyk EA, Golding GB. Prediction of plant lncRNA by ensemble machine learning classifiers. BMC Genomics 2018; 19:316. [PMID: 29720103 PMCID: PMC5930664 DOI: 10.1186/s12864-018-4665-2] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 11/23/2017] [Accepted: 04/12/2018] [Indexed: 02/06/2023] Open
Abstract
Background In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, compared with the advances in discerning the biological roles of long non-protein coding RNAs in animal systems, this RNA class remains largely understudied in plants. Because comparatively few plant long non-coding RNAs have been validated, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have previously been used to identify this enigmatic RNA class, with applications largely focused on animal systems. Our approach uses a training set composed only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Results We constructed individual stochastic gradient boosting and random forest classifiers trained only on empirically validated long non-protein coding RNAs. To exploit the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. We compared the genes predicted by the ensemble classifier with those listed in GreeNC, an established plant long non-coding RNA database. Overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83%, with the highest agreement in Eutrema salsugineum. Most of the highest-ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical proteins.
Owing to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. Conclusions This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selecting plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research into long non-protein coding RNA function.
Affiliation(s)
- G Brian Golding
- Department of Biology, McMaster University, 1280 Main Street West, Hamilton, Canada.