1
|
Barro-Trastoy D, Köhler C. Helitrons: genomic parasites that generate developmental novelties. Trends Genet 2024; 40:437-448. [PMID: 38429198 DOI: 10.1016/j.tig.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 02/03/2024] [Accepted: 02/05/2024] [Indexed: 03/03/2024]
Abstract
Helitrons, classified as DNA transposons, employ rolling-circle intermediates for transposition. Distinguishing themselves from other DNA transposons, they leave the original template element unaltered during transposition, which has led to their characterization as 'peel-and-paste elements'. Helitrons possess the ability to capture and mobilize host genome fragments, with enormous consequences for host genomes. This review discusses the current understanding of Helitrons, exploring their origins, transposition mechanism, and the extensive repercussions of their activity on genome structure and function. We also explore the evolutionary conflicts stemming from Helitron-transposed gene fragments and elucidate their domestication for regulating responses to environmental challenges. Looking ahead, further research in this evolving field promises to bring interesting discoveries on the role of Helitrons in shaping genomic landscapes.
Affiliation(s)
- Daniela Barro-Trastoy
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Claudia Köhler
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany; Department of Plant Biology, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala 75007, Sweden
|
2
|
Hamed BA, Ibrahim OAS, Abd El-Hafeez T. Optimizing classification efficiency with machine learning techniques for pattern matching. Journal of Big Data 2023; 10:124. [DOI: 10.1186/s40537-023-00804-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 07/16/2023] [Indexed: 09/02/2023]
Abstract
The study proposes a novel model for DNA sequence classification that combines machine learning methods and a pattern-matching algorithm. This model aims to effectively categorize DNA sequences based on their features and enhance the accuracy and efficiency of DNA sequence classification. The performance of the proposed model is evaluated using various machine learning algorithms, and the results indicate that the SVM linear classifier achieves the highest accuracy and F1 score among the tested algorithms. This finding suggests that the proposed model can provide better overall performance than other algorithms in DNA sequence classification. In addition, the proposed model is compared to two suggested algorithms, namely FLPM and PAPM, and the results show that the proposed model outperforms these algorithms in terms of accuracy and efficiency. The study further explores the impact of pattern length on the accuracy and time complexity of each algorithm. The results show that as the pattern length increases, the execution time of each algorithm varies. For a pattern length of 5, SVM Linear and EFLPM have the lowest execution time of 0.0035 s. However, at a pattern length of 25, SVM Linear has the lowest execution time of 0.0012 s. The experimental results of the proposed model show that SVM Linear has the highest accuracy and F1 score among the tested algorithms. SVM Linear achieved an accuracy of 0.963 and an F1 score of 0.97, indicating that it can provide the best overall performance in DNA sequence classification. Naive Bayes also performs well with an accuracy of 0.838 and an F1 score of 0.94. The proposed model offers a valuable contribution to the field of DNA sequence analysis by providing a novel approach to pre-processing and feature extraction. The model’s potential applications include drug discovery, personalized medicine, and disease diagnosis.
The study’s findings highlight the importance of considering the impact of pattern length on the accuracy and time complexity of DNA sequence classification algorithms.
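The abstract above does not spell out how pattern matches become classifier inputs, so the following is only a minimal sketch under an assumed scheme: each DNA sequence is turned into a vector of overlapping match counts, one per pattern. The function names and the example patterns are hypothetical and are not taken from the paper.

```python
from typing import List


def count_overlapping(seq: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `seq`, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = 0
    while True:
        idx = seq.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # advance by one so overlapping matches are counted


def pattern_features(seq: str, patterns: List[str]) -> List[int]:
    """Build a feature vector: one overlapping match count per pattern."""
    seq = seq.upper()
    return [count_overlapping(seq, p.upper()) for p in patterns]


if __name__ == "__main__":
    # Hypothetical patterns of length 3; the study varies pattern length.
    print(pattern_features("ACGTACGT", ["ACG", "GTA", "TTT"]))
```

Vectors of this kind could then be passed to any standard classifier, for example a linear SVM; which features the authors actually used is not stated in the abstract.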
|
3
|
Soto I, Zamorano-Illanes R, Becerra R, Palacios Játiva P, Azurdia-Meza CA, Alavia W, García V, Ijaz M, Zabala-Blanco D. A New COVID-19 Detection Method Based on CSK/QAM Visible Light Communication and Machine Learning. Sensors (Basel, Switzerland) 2023; 23:1533. [PMID: 36772574 DOI: 10.3390/s23031533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
This article proposes a novel method for detecting coronavirus disease 2019 (COVID-19) in an underground channel using visible light communication (VLC) and machine learning (ML). We present mathematical models of COVID-19 Deoxyribose Nucleic Acid (DNA) gene transfer in regular square constellations using a CSK/QAM-based VLC system. ML algorithms are used to classify the bands present in each electrophoresis sample according to whether the band corresponds to a positive, negative, or ladder sample during the search for the optimal model. Complexity studies reveal that the square constellation $N = 2^{2i} \times 2^{2i}$, $(i = 3)$, yields a greater profit. Performance studies indicate that, for BER = $10^{-3}$, there are gains of -10 [dB], -3 [dB], 3 [dB], and 5 [dB] for $N = 2^{2i} \times 2^{2i}$, $(i = 0, 1, 2, 3)$, respectively. Based on a total of 630 COVID-19 samples, the best model is shown to be XGBoost, which demonstrated an accuracy of 96.03%, greater than that of the other models, and a recall of 99% for positive values.
Affiliation(s)
- Ismael Soto
  - CIMTT, Department of Electrical Engineering, Universidad de Santiago de Chile, Santiago 9170124, Chile
- Raul Zamorano-Illanes
  - CIMTT, Department of Electrical Engineering, Universidad de Santiago de Chile, Santiago 9170124, Chile
- Raimundo Becerra
  - Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile
- Pablo Palacios Játiva
  - Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile
  - Escuela de Informática y Telecomunicaciones, Universidad Diego Portales, Santiago 8370190, Chile
- Cesar A Azurdia-Meza
  - Department of Electrical Engineering, Universidad de Chile, Santiago 8370451, Chile
- Wilson Alavia
  - CIMTT, Department of Electrical Engineering, Universidad de Santiago de Chile, Santiago 9170124, Chile
- Verónica García
  - Departamento en Ciencia y Tecnología de los Alimentos, de la Universidad de Santiago de Chile, Santiago 9170124, Chile
- Muhammad Ijaz
  - Manchester Metropolitan University, Manchester M1 5GD, UK
- David Zabala-Blanco
  - Department of Computer Science and Industry, Universidad Católica del Maule, Talca 3480112, Chile
|
4
|
Paul T, Vainio S, Roning J. Detection of intra-family coronavirus genome sequences through graphical representation and artificial neural network. Expert Systems with Applications 2022; 194:116559. [PMID: 35095217 PMCID: PMC8779865 DOI: 10.1016/j.eswa.2022.116559] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2021] [Revised: 12/29/2021] [Accepted: 01/16/2022] [Indexed: 05/06/2023]
Abstract
In this study, chaos game representation (CGR) is introduced for investigating the pattern of genome sequences. It is an image representation of the genome for the overall visualization of the sequence. CGR is a mapping technique that assigns each sequence base to its respective position in the two-dimensional plane to portray the DNA sequence. Importantly, CGR provides a one-to-one mapping to nucleotides as well as to the sequence. A coordinate of the CGR plane reveals the corresponding base and its location in the original genome. Therefore, the whole nucleotide sequence (up to the current nucleotide) can be restored from a single point of the CGR. In this study, CGR coupled with an artificial neural network (ANN) is introduced as a new way to represent the genome and to classify intra-coronavirus sequences. A hierarchical clustering study is done to validate the approach, which is found to be more than 90% accurate when the result is compared with the phylogenetic tree of the corresponding genomes. Interestingly, the method makes the genome sequence significantly shorter (more than 99% compressed), saving data space while preserving the genome features.
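The one-to-one property described in this abstract can be illustrated with a short sketch of the standard CGR midpoint rule. The corner assignment (A, C, G, T on the unit square) is a common convention assumed here, not taken from the paper, and the function names are hypothetical. Exact rational arithmetic is used so that decoding is lossless for any sequence length.

```python
from fractions import Fraction
from typing import List, Tuple

Point = Tuple[Fraction, Fraction]

# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASE_AT = {corner: base for base, corner in CORNERS.items()}


def cgr_points(seq: str) -> List[Point]:
    """Map each base to a point: move halfway from the current point to the base's corner."""
    x, y = Fraction(1, 2), Fraction(1, 2)  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def decode_point(point: Point, length: int) -> str:
    """Recover the `length` bases that led to `point` by inverting the midpoint rule."""
    x, y = point
    bases = []
    for _ in range(length):
        # The quadrant containing the point identifies the most recent base.
        cx = 1 if x > Fraction(1, 2) else 0
        cy = 1 if y > Fraction(1, 2) else 0
        bases.append(BASE_AT[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy  # step back to the previous point
    return "".join(reversed(bases))


if __name__ == "__main__":
    pts = cgr_points("GATTACA")
    print(decode_point(pts[-1], 7))
```

With ordinary floating-point numbers the same decoding only works for short prefixes, because each step halves the remaining precision; that is why the sketch uses `Fraction`.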
Affiliation(s)
- Tirthankar Paul
  - InfoTech Oulu, Faculty of Information Technology and Electrical Engineering, Biomimetics and Intelligent Systems Group (BISG), University of Oulu, Oulu, Finland
- Seppo Vainio
  - Infotech Oulu and Kvantum Institute, Faculty of Biochemistry and Molecular Medicine, Disease Networks, University of Oulu, Oulu, Finland
- Juha Roning
  - InfoTech Oulu, Faculty of Information Technology and Electrical Engineering, Biomimetics and Intelligent Systems Group (BISG), University of Oulu, Oulu, Finland
|
5
|
Alyasseri ZAA, Al‐Betar MA, Doush IA, Awadallah MA, Abasi AK, Makhadmeh SN, Alomari OA, Abdulkareem KH, Adam A, Damasevicius R, Mohammed MA, Zitar RA. Review on COVID-19 diagnosis models based on machine learning and deep learning approaches. Expert Systems 2022; 39:e12759. [PMID: 34511689 PMCID: PMC8420483 DOI: 10.1111/exsy.12759] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 01/29/2021] [Revised: 05/17/2021] [Accepted: 06/07/2021] [Indexed: 05/02/2023]
Abstract
COVID-19 is the disease evoked by a new breed of coronavirus called the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Recently, COVID-19 has become a pandemic by infecting more than 152 million people in over 216 countries and territories. The exponential increase in the number of infections has rendered traditional diagnosis techniques inefficient. Therefore, many researchers have developed several intelligent techniques, such as deep learning (DL) and machine learning (ML), which can assist the healthcare sector in providing quick and precise COVID-19 diagnosis. Therefore, this paper provides a comprehensive review of the most recent DL and ML techniques for COVID-19 diagnosis. The studies are published from December 2019 until April 2021. In general, this paper includes more than 200 studies that have been carefully selected from several publishers, such as IEEE, Springer and Elsevier. We classify the research tracks into two categories: DL and ML and present COVID-19 public datasets established and extracted from different countries. The measures used to evaluate diagnosis methods are comparatively analysed and proper discussion is provided. In conclusion, for COVID-19 diagnosing and outbreak prediction, SVM is the most widely used machine learning mechanism, and CNN is the most widely used deep learning mechanism. Accuracy, sensitivity, and specificity are the most widely used measurements in previous studies. Finally, this review paper will guide the research community on the upcoming development of ML and DL for COVID-19 and inspire their works for future development.
Affiliation(s)
- Zaid Abdi Alkareem Alyasseri
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
  - ECE Department, Faculty of Engineering, University of Kufa, Najaf, Iraq
- Mohammed Azmi Al‐Betar
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
  - Department of Information Technology, Al‐Huson University College, Al‐Balqa Applied University, Irbid, Jordan
- Iyad Abu Doush
  - Computing Department, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait
  - Computer Science Department, Yarmouk University, Irbid, Jordan
- Mohammed A. Awadallah
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
  - Department of Computer Science, Al‐Aqsa University, Gaza, Palestine
- Ammar Kamal Abasi
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
  - School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia
- Sharif Naser Makhadmeh
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
  - Faculty of Information Technology, Middle East University, Amman, Jordan
- Afzan Adam
  - Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Mazin Abed Mohammed
  - College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq
- Raed Abu Zitar
  - Sorbonne Center of Artificial Intelligence, Sorbonne University‐Abu Dhabi, Abu Dhabi, United Arab Emirates
|
6
|
Shekhar S, Garg H, Agrawal R, Shivani S, Sharma B. Hatred and trolling detection transliteration framework using hierarchical LSTM in code-mixed social media text. Complex Intell Syst 2021. [DOI: 10.1007/s40747-021-00487-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
The paper describes the use of a self-learning Hierarchical LSTM (HLSTM) technique for classifying hatred and trolling content in code-mixed social media data. Hierarchical LSTM-based learning is a novel learning architecture inspired by neural learning models. The proposed HLSTM model is trained to identify the hatred and trolling words present in social media content. The model is equipped with a self-learning and predicting mechanism for annotating hatred words in the transliteration domain. The Hindi–English data are ordered into Hindi, English, and hatred labels for classification. Word-embedding and character-embedding features are used for word representation in the sentence to detect hatred words. The method based on the HLSTM model helps in recognizing the context of a hatred word by mining the user's intention in using that word in the sentence. Extensive experiments suggest that the HLSTM-based classification model gives an accuracy of 97.49% when evaluated against standard models such as BLSTM, CRF, LR, SVM, Random Forest and Decision Tree, especially when there are some hatred and trolling words in the social media data.
|
8
|
Han GS, Li Q, Li Y. Comparative analysis and prediction of nucleosome positioning using integrative feature representation and machine learning algorithms. BMC Bioinformatics 2021; 22:129. [PMID: 34078256 PMCID: PMC8170966 DOI: 10.1186/s12859-021-04006-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/04/2021] [Accepted: 02/08/2021] [Indexed: 12/01/2022]
Abstract
Background: Nucleosome plays an important role in the process of genome expression, DNA replication, DNA repair and transcription. Therefore, the research of nucleosome positioning has invariably received extensive attention. Considering the diversity of DNA sequence representation methods, we tried to integrate multiple features to analyze its effect in the process of nucleosome positioning analysis. This process can also deepen our understanding of the theoretical analysis of nucleosome positioning. Results: Here, we not only used frequency chaos game representation (FCGR) to construct DNA sequence features, but also integrated it with other features and adopted the principal component analysis (PCA) algorithm. Simultaneously, support vector machine (SVM), extreme learning machine (ELM), extreme gradient boosting (XGBoost), multilayer perceptron (MLP) and convolutional neural networks (CNN) are used as predictors for nucleosome positioning prediction analysis, respectively. The integrated feature vector prediction quality is significantly superior to a single feature. After using principal component analysis (PCA) to reduce the feature dimension, the prediction quality of H. sapiens dataset has been significantly improved. Conclusions: Comparative analysis and prediction on H. sapiens, C. elegans, D. melanogaster and S. cerevisiae datasets, demonstrate that the application of FCGR to nucleosome positioning is feasible, and we also found that integrative feature representation would be better.
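As a rough illustration of the FCGR features mentioned in this abstract, the sketch below counts k-mers on a 2^k × 2^k grid whose cells correspond to CGR sub-squares, then flattens the grid into a frequency vector. The corner convention, the row and column orientation, and the function names are assumptions made for this example; the paper's own feature construction, its integration with other features, and the PCA step are not reproduced here.

```python
from typing import List

# Assumed corner convention as (x_bit, y_bit): A=(0,0), C=(0,1), G=(1,1), T=(1,0).
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr_counts(seq: str, k: int) -> List[List[int]]:
    """Count each k-mer in the grid cell where its CGR point would fall."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in BITS for base in kmer):
            continue  # skip windows containing ambiguous symbols
        col = row = 0
        # The last base of the k-mer selects the coarsest quadrant,
        # so it contributes the most significant bit.
        for base in reversed(kmer):
            bx, by = BITS[base]
            col = col * 2 + bx
            row = row * 2 + by
        grid[row][col] += 1
    return grid


def fcgr_vector(seq: str, k: int) -> List[float]:
    """Flatten the count grid into a frequency vector of length 4**k."""
    flat = [count for row in fcgr_counts(seq, k) for count in row]
    total = sum(flat)
    return [count / total for count in flat] if total else [0.0] * len(flat)


if __name__ == "__main__":
    print(fcgr_counts("ACGT", 1))
    print(len(fcgr_vector("ACGTACGTTTGA", 3)))
```

A vector like this has 4^k entries, which grows quickly with k; that growth is one reason a dimensionality-reduction step such as PCA is a natural companion to FCGR features.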
Affiliation(s)
- Guo-Sheng Han
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, Hunan, China
- Qi Li
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, Hunan, China
- Ying Li
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, Hunan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, Hunan, China
|