Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Georgakilas GK, Perdikopanis N, Hatzigeorgiou A. Solving the transcription start site identification problem with ADAPT-CAGE: a Machine Learning algorithm for the analysis of CAGE data. Sci Rep 2020;10:877. [PMID: 31965016 PMCID: PMC6972925 DOI: 10.1038/s41598-020-57811-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/15/2019] [Accepted: 12/18/2019] [Indexed: 11/20/2022]

Cited by Other Article(s)

Wei H, Liang L, Song C, Tong M, Xu X. Regulatory role and molecular mechanism of METTL14 in vascular endothelial cell injury in preeclampsia. BIOMOLECULES & BIOMEDICINE 2025;25:682-692. [PMID: 39319864 PMCID: PMC12010980 DOI: 10.17305/bb.2024.10963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 08/24/2024] [Accepted: 08/24/2024] [Indexed: 09/26/2024]

Athanasopoulou K, Chondrou V, Xiropotamos P, Psarias G, Vasilopoulos Y, Georgakilas GK, Sgourou A. Transcriptional repression of lncRNA and miRNA subsets mediated by LRF during erythropoiesis. J Mol Med (Berl) 2023;101:1097-1112. [PMID: 37486375 PMCID: PMC10482784 DOI: 10.1007/s00109-023-02352-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/06/2023] [Revised: 07/10/2023] [Accepted: 07/12/2023] [Indexed: 07/25/2023]

Abstract

Non-coding RNA (ncRNA) species, mainly long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) have been currently imputed for lesser or greater involvement in human erythropoiesis. These RNA subsets operate within a complex circuit with other epigenetic components and transcription factors (TF) affecting chromatin remodeling during cell differentiation. Lymphoma/leukemia-related (LRF) TF exerts higher occupancy on DNA CpG rich sites and is implicated in several differentiation cell pathways and erythropoiesis among them and also directs the epigenetic regulation of hemoglobin transversion from fetal (HbF) to adult (HbA) form by intervening in the γ-globin gene repression. We intended to investigate LRF activity in the evolving landscape of cells' commitment to the erythroid lineage and specifically during HbF to HbA transversion, to qualify this TF as potential repressor of lncRNAs and miRNAs. Transgenic human erythroleukemia cells, overexpressing LRF and further induced to erythropoiesis, were subjected to expression analysis in high LRF occupancy genetic loci-producing lncRNAs. LRF abundance in genetic loci transcribing for studied lncRNAs was determined by ChIP-Seq data analysis. qPCRs were performed to examine lncRNA expression status. Differentially expressed miRNA pre- and post-erythropoiesis induction were assessed by next-generation sequencing (NGS), and their promoter regions were charted. Expression levels of lncRNAs were correlated with DNA methylation status of flanked CpG islands, and contingent co-regulation of hosted miRNAs was considered. LRF-binding sites were overrepresented in LRF overexpressing cell clones during erythropoiesis induction and exerted a significant suppressive effect towards lncRNAs and miRNA collections. Based on present data interpretation, LRF's multiplied binding capacity across genome is suggested to be transient and associated with higher levels of DNA methylation. 
KEY MESSAGES: During erythropoiesis, LRF displays extensive occupancy across genetic loci. LRF significantly represses subsets of lncRNAs and miRNAs during erythropoiesis. Promoter region CpG islands' methylation levels affect lncRNA expression. MiRNAs embedded within lncRNA loci show differential regulation of expression.

Zaytsev K, Fedorov A, Korotkov E. Classification of Promoter Sequences from Human Genome. Int J Mol Sci 2023;24:12561. [PMID: 37628742 PMCID: PMC10454140 DOI: 10.3390/ijms241612561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/28/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023]

Barbero-Aparicio JA, Olivares-Gil A, Díez-Pastor JF, García-Osorio C. Deep learning and support vector machines for transcription start site identification. PeerJ Comput Sci 2023;9:e1340. [PMID: 37346545 PMCID: PMC10280436 DOI: 10.7717/peerj-cs.1340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 06/23/2023]

Abstract

Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have been proven to be exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works do not compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site predictions, concluding that SVMs are computationally much slower, and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to work with sequences than SVMs. For such a purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances on any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.

Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023;62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]

Abstract

Metabolic engineering encompasses several widely-used strategies, which currently hold a high seat in the field of biotechnology when its potential is manifesting through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago has initiated the generation of large omics-datasets which has helped in gaining a better understanding of cellular behavior. The itinerary of metabolic engineering that has occurred based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuits designing, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefs on the premise of ML, followed by mentioning various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the probable challenges of applying ML in metabolic engineering which will guide the researchers toward novel techniques to overcome the limitations.

Barbero-Aparicio JA, Cuesta-Lopez S, García-Osorio CI, Pérez-Rodríguez J, García-Pedrajas N. Nonlinear physics opens a new paradigm for accurate transcription start site prediction. BMC Bioinformatics 2022;23:565. [PMID: 36585618 PMCID: PMC9801560 DOI: 10.1186/s12859-022-05129-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]

Grigoriadis D, Perdikopanis N, Georgakilas GK, Hatzigeorgiou AG. DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data. BMC Bioinformatics 2022;23:395. [PMID: 36510136 PMCID: PMC9743497 DOI: 10.1186/s12859-022-04945-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 12/14/2022]

Abstract

BACKGROUND

The widespread usage of Cap Analysis of Gene Expression (CAGE) has led to numerous breakthroughs in understanding the transcription mechanisms. Recent evidence in the literature, however, suggests that CAGE suffers from transcriptional and technical noise. Regardless of the sample quality, there is a significant number of CAGE peaks that are not associated with transcription initiation events. This type of signal is typically attributed to technical noise and more frequently to random five-prime capping or transcription bioproducts. Thus, the need for computational methods emerges, that can accurately increase the signal-to-noise ratio in CAGE data, resulting in error-free transcription start site (TSS) annotation and quantification of regulatory region usage. In this study, we present DeepTSS, a novel computational method for processing CAGE samples, that combines genomic signal processing (GSP), structural DNA features, evolutionary conservation evidence and raw DNA sequence with Deep Learning (DL) to provide single-nucleotide TSS predictions with unprecedented levels of performance.

RESULTS

To evaluate DeepTSS, we utilized experimental data, protein-coding gene annotations and computationally-derived genome segmentations by chromatin states. DeepTSS was found to outperform existing algorithms on all benchmarks, achieving 98% precision and 96% sensitivity (accuracy 95.4%) on the protein-coding gene strategy, with 96.66% of its positive predictions overlapping active chromatin, 98.27% and 92.04% co-localized with at least one transcription factor and H3K4me3 peak.

CONCLUSIONS

CAGE is a key protocol in deciphering the language of transcription, however, as every experimental protocol, it suffers from biological and technical noise that can severely affect downstream analyses. DeepTSS is a novel DL-based method for effectively removing noisy CAGE signal. In contrast to existing software, DeepTSS does not require feature selection since the embedded convolutional layers can readily identify patterns and only utilize the important ones for the classification task. This study highlights the key role that DL can play in Molecular Biology, by removing the inherent flaws of experimental protocols, that form the backbone of contemporary research. Here, we show how DeepTSS can unleash the full potential of an already popular and mature method such as CAGE, and push the boundaries of coding and non-coding gene expression regulator research even further.

Liu Q, Fang H, Wang X, Wang M, Li S, Coin LJM, Li F, Song J. DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions. Bioinformatics 2022;38:4053-4061. [PMID: 35799358 DOI: 10.1093/bioinformatics/btac454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 04/11/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022]

Database of Potential Promoter Sequences in the Capsicum annuum Genome. BIOLOGY 2022;11:biology11081117. [PMID: 35892972 PMCID: PMC9332048 DOI: 10.3390/biology11081117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 07/19/2022] [Accepted: 07/23/2022] [Indexed: 11/16/2022]

Superstructure Detection in Nucleosome Distribution Shows Common Pattern within a Chromosome and within the Genome. Life (Basel) 2022;12:life12040541. [PMID: 35455033 PMCID: PMC9026121 DOI: 10.3390/life12040541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 03/16/2022] [Accepted: 03/23/2022] [Indexed: 11/17/2022]

Lu Z, Berry K, Hu Z, Zhan Y, Ahn TH, Lin Z. TSSr: an R package for comprehensive analyses of TSS sequencing data. NAR Genom Bioinform 2021;3:lqab108. [PMID: 34805991 PMCID: PMC8598296 DOI: 10.1093/nargab/lqab108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Revised: 10/05/2021] [Accepted: 10/27/2021] [Indexed: 12/13/2022]

Jürges CS, Dölken L, Erhard F. Integrative transcription start site identification with iTiSS. Bioinformatics 2021;37:3056-3057. [PMID: 33720332 DOI: 10.1093/bioinformatics/btab170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/02/2020] [Revised: 02/16/2021] [Accepted: 03/10/2021] [Indexed: 02/02/2023]

Policastro RA, Zentner GE. Global approaches for profiling transcription initiation. CELL REPORTS METHODS 2021;1:100081. [PMID: 34632443 PMCID: PMC8496859 DOI: 10.1016/j.crmeth.2021.100081] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]

Transcriptional Pausing and Activation at Exons-1 and -2, Respectively, Mediate the MGMT Gene Expression in Human Glioblastoma Cells. Genes (Basel) 2021;12:genes12060888. [PMID: 34201219 PMCID: PMC8228370 DOI: 10.3390/genes12060888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 06/07/2021] [Accepted: 06/07/2021] [Indexed: 11/17/2022]

Abstract

Background: The therapeutically important DNA repair gene O⁶-methylguanine DNA methyltransferase (MGMT) is silenced by promoter methylation in human brain cancers. The co-players/regulators associated with this process and the subsequent progression of MGMT gene transcription beyond the non-coding exon 1 are unknown. As a follow-up to our recent finding of a predicted second promoter mapped proximal to the exon 2 [Int. J. Mol. Sci. 2021, 22(5), 2492], we addressed its significance in MGMT transcription. Methods: RT-PCR, RT q-PCR, and nuclear run-on transcription assays were performed to compare and contrast the transcription rates of exon 1 and exon 2 of the MGMT gene in glioblastoma cells. Results: Bioinformatic characterization of the predicted MGMT exon 2 promoter showed several consensus TATA box and INR motifs and the absence of CpG islands in contrast to the established TATA-less, CpG-rich, and GAF-bindable exon 1 promoter. RT-PCR showed very weak MGMT-E1 expression in MGMT-proficient SF188 and T98G GBM cells, compared to active transcription of MGMT-E2. In the MGMT-deficient SNB-19 cells, the expression of both exons remained weak. The RT q-PCR revealed that MGMT-E2 and MGMT-E5 expression was about 80- to 175-fold higher than that of E1 in SF188 and T98G cells. Nuclear run-on transcription assays using bromo-uridine immunocapture followed by RT q-PCR confirmed the exceptionally lower and higher transcription rates for MGMT-E1 and MGMT-E2, respectively. Conclusions: The results provide the first evidence for transcriptional pausing at the promoter 1- and non-coding exon 1 junction of the human MGMT gene and its activation/elongation through the protein-coding exons 2 through 5, possibly mediated by a second promoter. The findings offer novel insight into the regulation of MGMT transcription in glioma and other cancer types.

Perdikopanis N, Georgakilas GK, Grigoriadis D, Pierros V, Kavakiotis I, Alexiou P, Hatzigeorgiou A. DIANA-miRGen v4: indexing promoters and regulators for more than 1500 microRNAs. Nucleic Acids Res 2021;49:D151-D159. [PMID: 33245765 PMCID: PMC7778932 DOI: 10.1093/nar/gkaa1060] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/15/2020] [Revised: 10/16/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023]