1
Wei H, Liang L, Song C, Tong M, Xu X. Regulatory role and molecular mechanism of METTL14 in vascular endothelial cell injury in preeclampsia. Biomolecules & Biomedicine 2025; 25:682-692. [PMID: 39319864 PMCID: PMC12010980 DOI: 10.17305/bb.2024.10963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 08/24/2024] [Accepted: 08/24/2024] [Indexed: 09/26/2024]
Abstract
Preeclampsia (PE) is a pregnancy-related disease characterized by vascular endothelial cell injury. This study aimed to investigate the role of methyltransferase-like protein 14 (METTL14) in vascular endothelial cell injury in PE. A PE cell model was established by treating human umbilical vein endothelial cells (HUVECs) with tumor necrosis factor-alpha (TNF-α) in vitro. METTL14 and forkhead box protein 1 (FOXP1) were silenced, and miR-34a-5p was overexpressed in HUVECs to evaluate their effects. HUVEC viability, apoptosis, and levels of intercellular adhesion molecule 1, vascular cell adhesion molecule 1, and endothelin-1 were measured. The N6-methyladenosine (m6A) modification of pri-miR-34a-5p was quantified. The interactions between miR-34a-5p, DiGeorge syndrome critical region 8, and m6A enrichment in miR-34a-5p were analyzed. The relationship between miR-34a-5p and FOXP1 was also verified. The results showed the expressions of METTL14, FOXP1, and miR-34a-5p. METTL14 expression was elevated in the TNF-α-induced HUVEC injury model. Silencing METTL14 improved HUVEC viability, inhibited apoptosis, and reduced endothelial inflammation. METTL14 promoted miR-34a-5p expression through m6A modification. Overexpression of miR-34a-5p or silencing FOXP1 reversed the protective effects of METTL14 silencing on cell injury in the PE model. In conclusion, METTL14 mediated m6A modification to promote miR-34a-5p expression, leading to FOXP1 inhibition, which aggravated endothelial cell damage in the PE cell model.
Affiliation(s)
- Huafang Wei
- Department of Gynaecology and Obstetrics, General Hospital of the Central Theater Command of the Chinese People’s Liberation Army, Wuhan, China
- Lin Liang
- Department of Gynaecology and Obstetrics, General Hospital of the Central Theater Command of the Chinese People’s Liberation Army, Wuhan, China
- Chengwen Song
- Department of Gynaecology and Obstetrics, General Hospital of the Central Theater Command of the Chinese People’s Liberation Army, Wuhan, China
- Ming Tong
- Department of Gynaecology and Obstetrics, General Hospital of the Central Theater Command of the Chinese People’s Liberation Army, Wuhan, China
- Xiang Xu
- Supervision Office, Changsha Health Vocational College, Changsha, China
2
Athanasopoulou K, Chondrou V, Xiropotamos P, Psarias G, Vasilopoulos Y, Georgakilas GK, Sgourou A. Transcriptional repression of lncRNA and miRNA subsets mediated by LRF during erythropoiesis. J Mol Med (Berl) 2023; 101:1097-1112. [PMID: 37486375 PMCID: PMC10482784 DOI: 10.1007/s00109-023-02352-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/06/2023] [Revised: 07/10/2023] [Accepted: 07/12/2023] [Indexed: 07/25/2023]
Abstract
Non-coding RNA (ncRNA) species, mainly long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), have recently been implicated, to a lesser or greater degree, in human erythropoiesis. These RNA subsets operate within a complex circuit with other epigenetic components and transcription factors (TFs) affecting chromatin remodeling during cell differentiation. The lymphoma/leukemia-related factor (LRF) TF shows higher occupancy at CpG-rich DNA sites and is implicated in several cell differentiation pathways, erythropoiesis among them. It also directs the epigenetic regulation of the hemoglobin switch from the fetal (HbF) to the adult (HbA) form by intervening in γ-globin gene repression. We set out to investigate LRF activity in the evolving landscape of cell commitment to the erythroid lineage, specifically during the HbF-to-HbA switch, and to qualify this TF as a potential repressor of lncRNAs and miRNAs. Transgenic human erythroleukemia cells overexpressing LRF and further induced to erythropoiesis were subjected to expression analysis at lncRNA-producing genetic loci with high LRF occupancy. LRF abundance at the genetic loci transcribing the studied lncRNAs was determined by ChIP-Seq data analysis. qPCR was performed to examine lncRNA expression status. Differentially expressed miRNAs before and after erythropoiesis induction were assessed by next-generation sequencing (NGS), and their promoter regions were charted. Expression levels of lncRNAs were correlated with the DNA methylation status of flanking CpG islands, and possible co-regulation of hosted miRNAs was considered. LRF-binding sites were overrepresented in LRF-overexpressing cell clones during erythropoiesis induction and exerted a significant suppressive effect on lncRNA and miRNA collections. Based on interpretation of the present data, LRF's increased binding capacity across the genome is suggested to be transient and associated with higher levels of DNA methylation.
KEY MESSAGES: During erythropoiesis, LRF displays extensive occupancy across genetic loci. LRF significantly represses subsets of lncRNAs and miRNAs during erythropoiesis. Promoter region CpG islands' methylation levels affect lncRNA expression. MiRNAs embedded within lncRNA loci show differential regulation of expression.
Affiliation(s)
- Katerina Athanasopoulou
- Biology Laboratory, School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
- Vasiliki Chondrou
- Biology Laboratory, School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
- Panagiotis Xiropotamos
- Laboratory of Genetics, Section of Genetics, Cell Biology and Development, Department of Biology, University of Patras, 26504 Patras, Greece
- Georgios Psarias
- Biology Laboratory, School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
- Yiannis Vasilopoulos
- Laboratory of Genetics, Section of Genetics, Cell Biology and Development, Department of Biology, University of Patras, 26504 Patras, Greece
- Georgios K. Georgakilas
- Laboratory of Genetics, Section of Genetics, Cell Biology and Development, Department of Biology, University of Patras, 26504 Patras, Greece
- Laboratory of Hygiene and Epidemiology, Faculty of Medicine, University of Thessaly, 41222 Larisa, Greece
- Argyro Sgourou
- Biology Laboratory, School of Science and Technology, Hellenic Open University, 26335 Patras, Greece
3
Zaytsev K, Fedorov A, Korotkov E. Classification of Promoter Sequences from Human Genome. Int J Mol Sci 2023; 24:12561. [PMID: 37628742 PMCID: PMC10454140 DOI: 10.3390/ijms241612561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/28/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023]
Abstract
We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
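To make the idea of searching a genome with a position weight matrix concrete, the following is a minimal sketch only. It builds a log-odds matrix from a few equal-length example sequences and scores every ungapped window of a target sequence. It is not the MAHDS method or the dynamic-programming search described in the abstract, which also handles insertions and deletions; the function names, the pseudocount, the uniform background and the threshold are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sequences.

    Assumes every sequence contains only A, C, G, T (hypothetical helper,
    not the authors' implementation).
    """
    length = len(aligned[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every ungapped window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy training set and target; both are made up for illustration.
    pwm = build_pwm(["TATAAA", "TATAAA", "TATATA"])
    for start, score in scan("GGGTATAAAGGG", pwm, threshold=5.0):
        print(start, round(score, 2))
```

A real search would use one matrix per promoter class and a significance estimate rather than a fixed score threshold.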
Affiliation(s)
- Konstantin Zaytsev
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Alexey Fedorov
- Bach Institute of Biochemistry, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
- Eugene Korotkov
- Institute of Bioengineering, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia
4
Barbero-Aparicio JA, Olivares-Gil A, Díez-Pastor JF, García-Osorio C. Deep learning and support vector machines for transcription start site identification. PeerJ Comput Sci 2023; 9:e1340. [PMID: 37346545 PMCID: PMC10280436 DOI: 10.7717/peerj-cs.1340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 06/23/2023]
Abstract
Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to work with sequences than SVMs. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances on any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
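The abstract mentions generating negative instances but does not say how they are drawn. As a hedged illustration of one common option, the sketch below samples window centres at random and rejects any that fall too close to a known TSS. The function name, the distance rule and all parameters are assumptions made here, not the procedure used in the paper.

```python
import random


def sample_negatives(seq_length, tss_positions, window, n, min_distance, seed=0):
    """Pick up to n window centres at least min_distance from every known TSS.

    Hypothetical helper: rejection sampling over a single sequence of
    length seq_length, with a fixed seed for reproducibility.
    """
    half = window // 2
    if seq_length <= 2 * half:
        raise ValueError("sequence is shorter than one window")
    rng = random.Random(seed)
    negatives = []
    attempts = 0
    # Cap the number of attempts so the loop ends even if few sites qualify.
    while len(negatives) < n and attempts < 100 * n:
        attempts += 1
        center = rng.randrange(half, seq_length - half)
        if all(abs(center - tss) >= min_distance for tss in tss_positions):
            negatives.append(center)
    return negatives


if __name__ == "__main__":
    centres = sample_negatives(
        seq_length=10_000, tss_positions=[500, 5_000],
        window=100, n=5, min_distance=1_000,
    )
    print(centres)
```

The ratio of negatives to positives chosen at this step is one way the class imbalance discussed in the abstract enters the training data.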
Affiliation(s)
- Alicia Olivares-Gil
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
- José F. Díez-Pastor
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
- César García-Osorio
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
5
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely used strategies, which currently hold a prominent place in biotechnology as its potential manifests through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago initiated the generation of large omics datasets, which have helped in gaining a better understanding of cellular behavior. The course of metabolic engineering based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefly covers the premise of ML, then describes various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the probable challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
6
Barbero-Aparicio JA, Cuesta-Lopez S, García-Osorio CI, Pérez-Rodríguez J, García-Pedrajas N. Nonlinear physics opens a new paradigm for accurate transcription start site prediction. BMC Bioinformatics 2022; 23:565. [PMID: 36585618 PMCID: PMC9801560 DOI: 10.1186/s12859-022-05129-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
There is evidence that DNA breathing (spontaneous opening of the DNA strands) plays a relevant role in the interactions of DNA with other molecules, and in particular in the transcription process. Therefore, having physical models that can predict these openings is of interest. However, this source of information has not been used before either in transcription start sites (TSSs) or promoter prediction. In this article, one such model is used as an additional information source that, when used by a machine learning (ML) model, improves the results of current methods for the prediction of TSSs. In addition, we provide evidence on the validity of the physical model, as it is able by itself to predict TSSs with high accuracy. This opens an exciting avenue of research at the intersection of statistical mechanics and ML, where ML models in bioinformatics can be improved using physical models of DNA as feature extractors.
Affiliation(s)
- José Antonio Barbero-Aparicio
- Departamento de Informática, Universidad de Burgos, Avda. de Cantabria s/n, 09006 Burgos, Spain
- Santiago Cuesta-Lopez
- Universidad de Burgos, Hospital del Rey, s/n, 09001 Burgos, Spain; ICAMCyL Foundation, Internacional Center for Advanced Materials and Raw Materials of Castilla y León, León Technology Park, main building, first floor, offices 106-108, C/Julia Morros s/n, Armunia, 24009 León, Spain
- César Ignacio García-Osorio
- Departamento de Informática, Universidad de Burgos, Avda. de Cantabria s/n, 09006 Burgos, Spain
- Javier Pérez-Rodríguez
- Departamento de Métodos Cuantitativos, Universidad de Loyola Andalucía, Escritor Castilla Aguayo, 4, 14004 Córdoba, Spain
- Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Edificio Albert Einstein, Campus de Rabanales, 14071 Córdoba, Spain
7
Grigoriadis D, Perdikopanis N, Georgakilas GK, Hatzigeorgiou AG. DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data. BMC Bioinformatics 2022; 23:395. [PMID: 36510136 PMCID: PMC9743497 DOI: 10.1186/s12859-022-04945-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 12/14/2022]
Abstract
BACKGROUND The widespread usage of Cap Analysis of Gene Expression (CAGE) has led to numerous breakthroughs in understanding the transcription mechanisms. Recent evidence in the literature, however, suggests that CAGE suffers from transcriptional and technical noise. Regardless of the sample quality, there is a significant number of CAGE peaks that are not associated with transcription initiation events. This type of signal is typically attributed to technical noise and more frequently to random five-prime capping or transcription bioproducts. Thus, the need for computational methods emerges, that can accurately increase the signal-to-noise ratio in CAGE data, resulting in error-free transcription start site (TSS) annotation and quantification of regulatory region usage. In this study, we present DeepTSS, a novel computational method for processing CAGE samples, that combines genomic signal processing (GSP), structural DNA features, evolutionary conservation evidence and raw DNA sequence with Deep Learning (DL) to provide single-nucleotide TSS predictions with unprecedented levels of performance. RESULTS To evaluate DeepTSS, we utilized experimental data, protein-coding gene annotations and computationally-derived genome segmentations by chromatin states. DeepTSS was found to outperform existing algorithms on all benchmarks, achieving 98% precision and 96% sensitivity (accuracy 95.4%) on the protein-coding gene strategy, with 96.66% of its positive predictions overlapping active chromatin, 98.27% and 92.04% co-localized with at least one transcription factor and H3K4me3 peak. CONCLUSIONS CAGE is a key protocol in deciphering the language of transcription, however, as every experimental protocol, it suffers from biological and technical noise that can severely affect downstream analyses. DeepTSS is a novel DL-based method for effectively removing noisy CAGE signal. 
In contrast to existing software, DeepTSS does not require feature selection since the embedded convolutional layers can readily identify patterns and only utilize the important ones for the classification task. This study highlights the key role that DL can play in Molecular Biology, by removing the inherent flaws of experimental protocols, that form the backbone of contemporary research. Here, we show how DeepTSS can unleash the full potential of an already popular and mature method such as CAGE, and push the boundaries of coding and non-coding gene expression regulator research even further.
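For readers comparing the reported precision, sensitivity and accuracy figures, the sketch below shows how these three quantities are defined in terms of confusion-matrix counts. The function name is illustrative and the counts in the usage example are invented; they are not data from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, accuracy) from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or sensitivity is undefined for these counts")
    precision = tp / (tp + fp)      # fraction of predicted TSSs that are real
    sensitivity = tp / (tp + fn)    # fraction of real TSSs that were recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, sensitivity, accuracy


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    precision, sensitivity, accuracy = classification_metrics(90, 10, 80, 20)
    print(f"{precision:.3f} {sensitivity:.3f} {accuracy:.3f}")
```

Accuracy depends on the balance between positive and negative examples in the benchmark, so it can sit below both precision and sensitivity, as in the figures quoted above.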
Affiliation(s)
- Dimitris Grigoriadis
- Hellenic Pasteur Institute, 11521 Athens, Greece; Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
- Nikos Perdikopanis
- Hellenic Pasteur Institute, 11521 Athens, Greece; Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece; Department of Electrical and Computer Engineering, University of Thessaly, 38221 Volos, Greece
- Georgios K. Georgakilas
- Department of Electrical and Computer Engineering, University of Thessaly, 38221 Volos, Greece; ommAI Technologies, Tallinn, Estonia
- Artemis G. Hatzigeorgiou
- Hellenic Pasteur Institute, 11521 Athens, Greece; Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131 Lamia, Greece
8
Liu Q, Fang H, Wang X, Wang M, Li S, Coin LJM, Li F, Song J. DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions. Bioinformatics 2022; 38:4053-4061. [PMID: 35799358 DOI: 10.1093/bioinformatics/btac454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 04/11/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. RESULTS In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/deepgengrep/home and Github (https://github.com/wx-cie/DeepGenGrep/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Honglin Fang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Xiao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Miao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Shuqin Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Lachlan J M Coin
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
- Fuyi Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
9
Database of Potential Promoter Sequences in the Capsicum annuum Genome. Biology 2022; 11:1117. [PMID: 35892972 PMCID: PMC9332048 DOI: 10.3390/biology11081117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 07/19/2022] [Accepted: 07/23/2022] [Indexed: 11/16/2022]
Abstract
In this study, we used a mathematical method for the multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the Capsicum annuum genome. To search for PPSs, 20 statistically significant classes of sequences located in the range from −499 to +100 nucleotides near the annotated genes were calculated. For each class, a position–weight matrix (PWM) was computed and then used to identify PPSs in the C. annuum genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with the MAHDS method were tested using TSSFinder, which detects transcription start sites. The databank of the found PPSs provides their coordinates in chromosomes, the alignment of each PPS with the PWM, and the level of statistical significance as a normal distribution argument, and can be used in genetic engineering and biotechnology.
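The −499 to +100 range around annotated genes can be illustrated with a small, strand-aware window extraction. This is a sketch under stated assumptions: the gene start is taken as a 0-based index at offset 0 (giving a 600-nucleotide window), minus-strand windows are reverse-complemented, and the function names are invented here. The abstract does not specify the authors' coordinate convention.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(chrom_seq, tss, strand, upstream=499, downstream=100):
    """Extract the region from -upstream to +downstream around tss.

    tss is a 0-based index into chrom_seq (assumed convention). The result
    is oriented 5'->3' on the gene's strand. Returns None when the window
    would run past either end of the sequence.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        return None
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"  # made-up sequence
    print(promoter_window(toy, 5, "+", upstream=2, downstream=3))
    print(promoter_window(toy, 5, "-", upstream=2, downstream=3))
```

Windows extracted this way could then be grouped into classes and summarised as position weight matrices, which is the step the abstract describes next.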
10
Superstructure Detection in Nucleosome Distribution Shows Common Pattern within a Chromosome and within the Genome. Life (Basel) 2022; 12:541. [PMID: 35455033 PMCID: PMC9026121 DOI: 10.3390/life12040541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 03/16/2022] [Accepted: 03/23/2022] [Indexed: 11/17/2022]
Abstract
Nucleosome positioning plays an important role in crucial biological processes such as replication, transcription, and gene regulation. It has been widely used to predict the genome’s function and chromatin organisation. So far, the studies of patterns in nucleosome positioning have been limited to transcription start sites, CTCF binding sites, and some promoter and loci regions. The genome-wide organisational pattern remains unknown. We have developed a theoretical model to coarse-grain nucleosome positioning data in order to obtain patterns in their distribution. Using hierarchical clustering on the auto-correlation function of this coarse-grained nucleosome positioning data, a genome-wide clustering is obtained for Candida albicans. The clustering shows the existence of organisation beyond hetero- and euchromatin inside the chromosomes. These non-trivial clusterings correspond to different nucleosome distributions and gene densities governing differential gene expression patterns. Moreover, these distribution patterns inside the chromosome appeared to be conserved throughout the genome and within species. The pipeline used in our study, which coarse-grains the nucleosome positioning sequence to identify the underlying genomic organisation, is novel, and the classifications obtained are unique and consistent.
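The first two steps named in the abstract, coarse-graining a positional signal and taking its auto-correlation function, can be sketched as follows. This is an illustrative simplification: the occupancy signal is assumed to be a plain list of numbers, coarse-graining is done by averaging fixed-size bins, and the hierarchical clustering step is left out. The function names and bin size are assumptions, not the authors' model.

```python
def coarse_grain(signal, bin_size):
    """Average the signal over consecutive bins; a trailing partial bin is dropped."""
    n_bins = len(signal) // bin_size
    return [
        sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def autocorrelation(values, max_lag):
    """Normalised auto-correlation for lags 0..max_lag (lag 0 equals 1)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("auto-correlation is undefined for a constant signal")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Made-up periodic occupancy signal, for illustration only.
    occupancy = [1, 1, 0, 0] * 4
    binned = coarse_grain(occupancy, bin_size=2)
    print(autocorrelation(binned, max_lag=2))
```

Auto-correlation vectors computed per genomic region could then be passed to any hierarchical clustering routine to group regions with similar periodic structure.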
11
Lu Z, Berry K, Hu Z, Zhan Y, Ahn TH, Lin Z. TSSr: an R package for comprehensive analyses of TSS sequencing data. NAR Genom Bioinform 2021; 3:lqab108. [PMID: 34805991 PMCID: PMC8598296 DOI: 10.1093/nargab/lqab108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Revised: 10/05/2021] [Accepted: 10/27/2021] [Indexed: 12/13/2022]
Abstract
Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5′ end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are a prerequisite for subsequent functional analyses of TSS data. Furthermore, TSSr also enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.
Affiliation(s)
- Zhaolian Lu
- Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Keenan Berry
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO 63103, USA
- Zhenbin Hu
- Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Yu Zhan
- Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
- Tae-Hyuk Ahn
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO 63103, USA
- Zhenguo Lin
- Department of Biology, Saint Louis University, St. Louis, MO 63103, USA
12
Jürges CS, Dölken L, Erhard F. Integrative transcription start site identification with iTiSS. Bioinformatics 2021; 37:3056-3057. [PMID: 33720332 DOI: 10.1093/bioinformatics/btab170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/02/2020] [Revised: 02/16/2021] [Accepted: 03/10/2021] [Indexed: 02/02/2023]
Abstract
SUMMARY Many experimental approaches have been developed to identify transcription start sites (TSS) from genomic scale data. However, experiment specific biases lead to large numbers of false-positive calls. Here, we present our integrative approach iTiSS, which is an accurate and generic TSS caller for any TSS profiling experiment in eukaryotes, and substantially reduces the number of false positives by a joint analysis of several complementary datasets. AVAILABILITY AND IMPLEMENTATION iTiSS is platform independent and implemented in Java (v1.8) and is freely available at https://www.erhard-lab.de/software and https://github.com/erhard-lab/iTiSS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher S Jürges
  - Institute for Virology and Immunobiology, Julius-Maximilians-University Würzburg, Würzburg 97078, Germany
- Lars Dölken
  - Institute for Virology and Immunobiology, Julius-Maximilians-University Würzburg, Würzburg 97078, Germany
- Florian Erhard
  - Institute for Virology and Immunobiology, Julius-Maximilians-University Würzburg, Würzburg 97078, Germany
|
13
|
Abstract
Transcription start site (TSS) selection influences transcript stability and translation as well as protein sequence. Alternative TSS usage is pervasive in organismal development, is a major contributor to transcript isoform diversity in humans, and is frequently observed in human diseases including cancer. In this review, we discuss the breadth of techniques that have been used to globally profile TSSs and the resulting insights into gene regulation, as well as future prospects in this area of inquiry.
Affiliation(s)
- Gabriel E. Zentner
  - Department of Biology, Indiana University, Bloomington, IN 47401, USA
  - Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA
|
14
|
Transcriptional Pausing and Activation at Exons-1 and -2, Respectively, Mediate the MGMT Gene Expression in Human Glioblastoma Cells. Genes (Basel) 2021; 12:genes12060888. [PMID: 34201219 PMCID: PMC8228370 DOI: 10.3390/genes12060888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 06/07/2021] [Accepted: 06/07/2021] [Indexed: 11/17/2022] Open
Abstract
Background: The therapeutically important DNA repair gene O6-methylguanine DNA methyltransferase (MGMT) is silenced by promoter methylation in human brain cancers. The co-players/regulators associated with this process and the subsequent progression of MGMT gene transcription beyond the non-coding exon 1 are unknown. As a follow-up to our recent finding of a predicted second promoter mapped proximal to exon 2 [Int. J. Mol. Sci. 2021, 22(5), 2492], we addressed its significance in MGMT transcription. Methods: RT-PCR, RT q-PCR, and nuclear run-on transcription assays were performed to compare the transcription rates of exon 1 and exon 2 of the MGMT gene in glioblastoma cells. Results: Bioinformatic characterization of the predicted MGMT exon 2 promoter showed several consensus TATA box and INR motifs and the absence of CpG islands, in contrast to the established TATA-less, CpG-rich, and GAF-bindable exon 1 promoter. RT-PCR showed very weak MGMT-E1 expression in MGMT-proficient SF188 and T98G GBM cells, compared to active transcription of MGMT-E2. In the MGMT-deficient SNB-19 cells, the expression of both exons remained weak. RT q-PCR revealed that MGMT-E2 and MGMT-E5 expression was about 80- to 175-fold higher than that of E1 in SF188 and T98G cells. Nuclear run-on transcription assays using bromo-uridine immunocapture followed by RT q-PCR confirmed the exceptionally low and high transcription rates for MGMT-E1 and MGMT-E2, respectively. Conclusions: The results provide the first evidence for transcriptional pausing at the junction of promoter 1 and non-coding exon 1 of the human MGMT gene and for its activation/elongation through the protein-coding exons 2 through 5, possibly mediated by a second promoter. The findings offer novel insight into the regulation of MGMT transcription in glioma and other cancer types.
|
15
|
Perdikopanis N, Georgakilas GK, Grigoriadis D, Pierros V, Kavakiotis I, Alexiou P, Hatzigeorgiou A. DIANA-miRGen v4: indexing promoters and regulators for more than 1500 microRNAs. Nucleic Acids Res 2021; 49:D151-D159. [PMID: 33245765 PMCID: PMC7778932 DOI: 10.1093/nar/gkaa1060] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/15/2020] [Revised: 10/16/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023] Open
Abstract
Deregulation of microRNA (miRNA) expression plays a critical role in the transition from a physiological to a pathological state. Accurate identification of miRNA promoters in multiple cell types is fundamental to understanding and characterizing the mechanisms underlying both physiological and pathological conditions. DIANA-miRGen v4 (www.microrna.gr/mirgenv4) provides cell-type-specific miRNA transcription start sites (TSSs) for over 1500 miRNAs, retrieved from the analysis of >1000 cap analysis of gene expression (CAGE) samples corresponding to 133 tissues, cell lines and primary cells available in the FANTOM repository. MiRNA TSS locations were associated with transcription factor binding site (TFBS) annotations for >280 TFs, derived from analyzing the majority of ENCODE ChIP-Seq datasets. For the first time, clusters of cell types having common miRNA TSSs are characterized and provided through a user-friendly interface with multiple layers of customization. DIANA-miRGen v4 significantly improves our understanding of miRNA biogenesis regulation at the transcriptional level by providing a unique integration of high-quality annotations for hundreds of cell-specific miRNA promoters with experimentally derived TFBSs.
Affiliation(s)
- Nikos Perdikopanis
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 15784, Greece
- Georgios K Georgakilas
  - Central European Institute of Technology, Masaryk University, Kamenice 735/5, 62500 Brno, Czech Republic
- Dimitris Grigoriadis
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
- Vasilis Pierros
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
- Ioannis Kavakiotis
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
- Panagiotis Alexiou
  - Central European Institute of Technology, Masaryk University, Kamenice 735/5, 62500 Brno, Czech Republic
- Artemis Hatzigeorgiou
  - Hellenic Pasteur Institute, Athens 11521, Greece
  - Department of Electrical and Computer Engineering, University of Thessaly, Volos 38221, Greece
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Greece
|