1
Luo R, Liu J, Guan L, Li M. HybProm: An attention-assisted hybrid CNN-BiLSTM model for the interpretable prediction of DNA promoter. Methods 2025; 235:71-80. [PMID: 39929298 DOI: 10.1016/j.ymeth.2025.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2024] [Revised: 01/18/2025] [Accepted: 02/03/2025] [Indexed: 02/13/2025]
Abstract
Promoter prediction is essential for analyzing gene structures, understanding regulatory networks and transcription mechanisms, and precisely controlling gene expression. Computational and deep learning methods for promoter prediction have recently gained attention; however, there is still room to improve their accuracy. To address this, we propose the HybProm model, which uses DNA2Vec to transform DNA sequences into low-dimensional vectors, followed by a CNN-BiLSTM-Attention architecture that extracts features and predicts promoters across species, including E. coli, humans, mice, and plants. Experiments show that HybProm consistently achieves high accuracy (90%-99%) and offers good interpretability by identifying the key sequence patterns and positions that drive its predictions.
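The first stage of the pipeline described above turns a DNA sequence into a sequence of vectors. The Python sketch below is a minimal illustration, assuming overlapping fixed-length k-mers and a k-mer-to-vector lookup table; the random table is a hypothetical placeholder for trained DNA2Vec embeddings, and the function names are illustrative rather than taken from HybProm.

```python
from itertools import product
import random
from typing import Dict, List

ALPHABET = "ACGT"


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def placeholder_table(k: int, dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical stand-in for trained DNA2Vec vectors: one random
    dim-length vector for each of the 4**k possible k-mers."""
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for p in product(ALPHABET, repeat=k)
    }


def embed(seq: str, table: Dict[str, List[float]], k: int = 3) -> List[List[float]]:
    """Map a sequence to a (num_kmers x dim) matrix; k-mers containing
    ambiguous bases such as N get a fresh zero vector."""
    dim = len(next(iter(table.values())))
    return [table[km] if km in table else [0.0] * dim for km in kmers(seq, k)]


if __name__ == "__main__":
    table = placeholder_table(k=3, dim=8)
    matrix = embed("TATAAAAGGCN", table, k=3)
    print(len(matrix), "positions x", len(matrix[0]), "dims")
```

The resulting positions-by-dimensions matrix is the kind of input a downstream convolutional and recurrent stack would consume; a real model would load pretrained vectors instead of generating random ones.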
Affiliation(s)
- Rentao Luo
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000 Jiangxi, China
- Jiawei Liu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000 Jiangxi, China
- Lixin Guan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000 Jiangxi, China
- Mengshan Li
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000 Jiangxi, China.
2
Romerio F. Origin and functional role of antisense transcription in endogenous and exogenous retroviruses. Retrovirology 2023; 20:6. [PMID: 37194028 DOI: 10.1186/s12977-023-00622-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/31/2023] [Accepted: 04/30/2023] [Indexed: 05/18/2023]
Abstract
Most proteins expressed by endogenous and exogenous retroviruses are encoded in the sense (positive) strand of the genome and are under the control of regulatory elements within the 5' long terminal repeat (LTR). A number of retroviral genomes also encode genes in the antisense (negative) strand and their expression is under the control of negative sense promoters within the 3' LTR. In the case of the Human T-cell Lymphotropic Virus 1 (HTLV-1), the antisense protein HBZ has been shown to play a critical role in the virus lifecycle and in the pathogenic process, while the function of the Human Immunodeficiency Virus 1 (HIV-1) antisense protein ASP remains unknown. However, the expression of 3' LTR-driven antisense transcripts is not always demonstrably associated with the presence of an antisense open reading frame encoding a viral protein. Moreover, even in the case of retroviruses that do express an antisense protein, such as HTLV-1 and the pandemic strains of HIV-1, the 3' LTR-driven antisense transcript shows both protein-coding and noncoding activities. Indeed, the ability to express antisense transcripts appears to be phylogenetically more widespread among endogenous and exogenous retroviruses than the presence of a functional antisense open reading frame within these transcripts. This suggests that retroviral antisense transcripts may have originated as noncoding molecules with regulatory activity that in some cases later acquired protein-coding function. Here, we will review examples of endogenous and exogenous retroviral antisense transcripts, and the ways through which they benefit viral persistence in the host.
Affiliation(s)
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
3
Rojas DA, Urbina F, Valenzuela-Pérez L, Leiva L, Miralles VJ, Maldonado E. Initiator-Directed Transcription: Fission Yeast Nmt1 Initiator Directs Preinitiation Complex Formation and Transcriptional Initiation. Genes (Basel) 2022; 13:genes13020256. [PMID: 35205301 PMCID: PMC8871863 DOI: 10.3390/genes13020256] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/22/2021] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 02/01/2023]
Abstract
The initiator element is a core promoter element encompassing the transcription start site, and it is found in yeast, Drosophila, and human promoters. This element is observed in TATA-less promoters. Several studies have defined the transcription factor requirements and additional cofactors needed for transcription initiation at initiator-containing promoters. However, those studies were performed with promoters that contained other core promoter elements in addition to the initiator. In this work, we have defined the pathway of preinitiation complex formation on the fission yeast nmt1 gene promoter, which contains a functional initiator with striking similarity to the initiator of the human dihydrofolate reductase (hDHFR) gene, as well as the factor requirements for transcription initiation from the nmt1 gene promoter. The results show that the nmt1 gene promoter possesses an initiator encompassing the transcription start site, and several conserved base positions are required for initiator function. Preinitiation complex formation on the nmt1 initiator can be started by TBP/TFIIA or TBP/TFIIB, but not by TBP alone, and afterwards follows the same pathway as preinitiation complex formation on TATA-containing promoters. Transcription initiation depends on the general transcription factors TBP, TFIIB, TFIIE, TFIIF, TFIIH, RNA polymerase II, Mediator, and a cofactor identified as transcription cofactor for initiator function (TCIF), a high-molecular-weight protein complex of around 500 kDa. However, as far as we tested, the TAF subunits of TFIID were not required for transcription from the nmt1 initiator. We also demonstrate that other initiators of the nmt1/hDHFR family can be transcribed in fission yeast whole-cell extracts.
Affiliation(s)
- Diego A. Rojas
- Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago 8910132, Chile
- Correspondence: address: (D.A.R.); (E.M.)
- Fabiola Urbina
- Programa de Biología Celular y Molecular, ICBM, Facultad de Medicina, Universidad de Chile, Santiago 8380492, Chile; (F.U.); (L.V.-P.); (L.L.)
- Lucía Valenzuela-Pérez
- Programa de Biología Celular y Molecular, ICBM, Facultad de Medicina, Universidad de Chile, Santiago 8380492, Chile; (F.U.); (L.V.-P.); (L.L.)
- Lorenzo Leiva
- Programa de Biología Celular y Molecular, ICBM, Facultad de Medicina, Universidad de Chile, Santiago 8380492, Chile; (F.U.); (L.V.-P.); (L.L.)
- Vicente J. Miralles
- Departamento Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Valencia, 46010 Valencia, Spain;
- Edio Maldonado
- Programa de Biología Celular y Molecular, ICBM, Facultad de Medicina, Universidad de Chile, Santiago 8380492, Chile; (F.U.); (L.V.-P.); (L.L.)
- Correspondence: address: (D.A.R.); (E.M.)
4
Li R, Sklutuis R, Groebner JL, Romerio F. HIV-1 Natural Antisense Transcription and Its Role in Viral Persistence. Viruses 2021; 13:v13050795. [PMID: 33946840 PMCID: PMC8145503 DOI: 10.3390/v13050795] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/16/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/11/2022]
Abstract
Natural antisense transcripts (NATs) represent a class of RNA molecules that are transcribed from the opposite strand of a protein-coding gene, and that have the ability to regulate the expression of their cognate protein-coding gene via multiple mechanisms. NATs have been described in many prokaryotic and eukaryotic systems, as well as in the viruses that infect them. The human immunodeficiency virus (HIV-1) is no exception, and produces one or more NATs from a promoter within the 3’ long terminal repeat. HIV-1 antisense transcripts have been the focus of several studies spanning over 30 years. However, a complete appreciation of the role that these transcripts play in the virus lifecycle is still lacking. In this review, we cover the current knowledge about HIV-1 NATs, discuss some of the questions that are still open, and identify possible areas of future research.
Affiliation(s)
- Rui Li
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA;
- Rachel Sklutuis
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA; (R.S.); (J.L.G.)
- Jennifer L. Groebner
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA; (R.S.); (J.L.G.)
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA;
- Correspondence:
5
Takei N, Takada Y, Kawamura S, Sato K, Saitoh A, Bormann J, Yuen WS, Carroll J, Kotani T. Changes in subcellular structures and states of pumilio 1 regulate the translation of target Mad2 and cyclin B1 mRNAs. J Cell Sci 2020; 133:jcs249128. [PMID: 33148609 DOI: 10.1242/jcs.249128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/19/2020] [Accepted: 10/22/2020] [Indexed: 12/12/2022]
Abstract
Temporal and spatial control of mRNA translation has emerged as a major mechanism for promoting diverse biological processes. However, the molecular nature of temporal and spatial control of translation remains unclear. In oocytes, many mRNAs are deposited as a translationally repressed form and are translated at appropriate times to promote the progression of meiosis and development. Here, we show that changes in subcellular structures and states of the RNA-binding protein pumilio 1 (Pum1) regulate the translation of target mRNAs and progression of oocyte maturation. Pum1 was shown to bind to Mad2 (also known as Mad2l1) and cyclin B1 mRNAs, assemble highly clustered aggregates, and surround Mad2 and cyclin B1 RNA granules in mouse oocytes. These Pum1 aggregates were dissolved prior to the translational activation of target mRNAs, possibly through phosphorylation. Stabilization of Pum1 aggregates prevented the translational activation of target mRNAs and progression of oocyte maturation. Together, our results provide an aggregation-dissolution model for the temporal and spatial control of translation.
Affiliation(s)
- Natsumi Takei
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Yuki Takada
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Shohei Kawamura
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Keisuke Sato
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Atsushi Saitoh
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Jenny Bormann
- Development and Stem Cells Program and Department of Anatomy and Developmental Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Victoria 3800, Australia
- Wai Shan Yuen
- Development and Stem Cells Program and Department of Anatomy and Developmental Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Victoria 3800, Australia
- John Carroll
- Development and Stem Cells Program and Department of Anatomy and Developmental Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, Victoria 3800, Australia
- Tomoya Kotani
- Biosystems Science Course, Graduate School of Life Science, Hokkaido University, Sapporo 060-0810, Japan
- Department of Biological Sciences, Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
6
Luse DS, Parida M, Spector BM, Nilson KA, Price DH. A unified view of the sequence and functional organization of the human RNA polymerase II promoter. Nucleic Acids Res 2020; 48:7767-7785. [PMID: 32597978 PMCID: PMC7641323 DOI: 10.1093/nar/gkaa531] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 03/12/2020] [Revised: 05/31/2020] [Accepted: 06/24/2020] [Indexed: 12/20/2022]
Abstract
To better understand human RNA polymerase II (Pol II) promoters in the context of promoter-proximal pausing and local chromatin organization, 5′ and 3′ ends of nascent capped transcripts and the locations of nearby nucleosomes were accurately identified through sequencing at exceptional depth. High-quality visualization tools revealed a preferred sequence that defines over 177 000 core promoters with strengths varying by >10 000-fold. This sequence signature encompasses and better defines the binding site for TFIID and is surprisingly invariant over a wide range of promoter strength. We identified a sequence motif associated with promoter-proximal pausing and demonstrated that cap methylation only begins once transcripts are about 30 nt long. Mapping also revealed a ∼150 bp periodic downstream sequence element (PDE) following the typical pause location, strongly suggestive of a +1 nucleosome positioning element. A nuclear run-off assay utilizing the unique properties of the DNA fragmentation factor (DFF) coupled with sequencing of DFF protected fragments demonstrated that a +1 nucleosome is present downstream of paused Pol II. Our data more clearly define the human Pol II promoter: a TFIID binding site with built-in downstream information directing ubiquitous promoter-proximal pausing and downstream nucleosome location.
Affiliation(s)
- Donal S Luse
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Mrutyunjaya Parida
- Department of Biochemistry, The University of Iowa, Iowa City, IA 52242, USA
- Benjamin M Spector
- Department of Biochemistry, The University of Iowa, Iowa City, IA 52242, USA
- Kyle A Nilson
- Department of Biochemistry, The University of Iowa, Iowa City, IA 52242, USA
- David H Price
- Department of Biochemistry, The University of Iowa, Iowa City, IA 52242, USA
7
Khodabandelou G, Routhier E, Mozziconacci J. Genome annotation across species using deep convolutional neural networks. PeerJ Comput Sci 2020; 6:e278. [PMID: 33816929 PMCID: PMC7924482 DOI: 10.7717/peerj-cs.278] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/15/2019] [Accepted: 05/18/2020] [Indexed: 06/12/2023]
Abstract
Application of deep neural networks is a rapidly expanding field, now reaching many disciplines including genomics. In particular, convolutional neural networks have been exploited to identify the functional role of short genomic sequences. These approaches rely on gathering large sets of sequences with known functional roles, extracted from whole-genome annotations. These sets are then split into learning, test and validation sets in order to train the networks. While the resulting networks perform well on validation sets, they often perform poorly when applied to whole genomes, in which the ratio of positive to negative examples can be very different from that of the training set. We address this issue by assessing the genome-wide performance of networks trained with sets exhibiting different ratios of positive to negative examples. As a case study, we use sequences encompassing gene starts from the RefGene database as positive examples and random genomic sequences as negative examples. We then demonstrate that models trained using data from one organism can be used to predict gene-start sites in a related species, when using training sets that provide good genome-wide performance. This cross-species application of convolutional neural networks provides a new way to annotate any genome from existing high-quality annotations in a related reference species. It also provides a way to determine whether the sequence motifs recognised by chromatin-associated proteins in different species are conserved or not.
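Why a network that looks good on a balanced validation set can perform poorly genome-wide follows from how precision depends on the positive-to-negative ratio when sensitivity and specificity stay fixed. The Python sketch below is a hedged illustration of that arithmetic only; the 95% sensitivity and specificity and the ratios are illustrative assumptions, not figures from the study.

```python
def expected_precision(sensitivity: float, specificity: float,
                       n_pos: float, n_neg: float) -> float:
    """Expected precision (positive predictive value) of a classifier with
    fixed sensitivity and specificity applied to n_pos positive and n_neg
    negative examples."""
    true_pos = sensitivity * n_pos
    false_pos = (1.0 - specificity) * n_neg
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative operating point: 95% sensitivity and 95% specificity.
    for negatives_per_positive in (1, 10, 100, 1000):
        p = expected_precision(0.95, 0.95, 1, negatives_per_positive)
        print(f"1:{negatives_per_positive} -> precision {p:.3f}")
```

With these assumed values, precision falls from 0.950 at a 1:1 ratio to about 0.019 at 1:1000, because false positives scale with the number of negatives while true positives do not.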
Affiliation(s)
- Ghazaleh Khodabandelou
- Laboratoire de Physique Théorique de la Matière Condensée (LPTMC), Sorbonne Université, Paris, France
- Laboratoire Images, Signaux et Systèmes Intelligents (LISSI), Université Val-de-Marne (Paris XII), Paris, France
- Etienne Routhier
- Laboratoire de Physique Théorique de la Matière Condensée (LPTMC), Sorbonne Université, Paris, France
- Julien Mozziconacci
- Laboratoire de Physique Théorique de la Matière Condensée (LPTMC), Sorbonne Université, Paris, France
- CNRS UMR 7196 / INSERM U1154 - Sorbonne Université, Museum national d’Histoire naturelle (MNHN), Paris, France
- Institut Universitaire de France, Paris, France
8
Core Element Cloning, Cis-Element Mapping and Serum Regulation of the Human EphB4 Promoter: A Novel TATA-Less Inr/MTE/DPE-Like Regulated Gene. Genes (Basel) 2019; 10:genes10120997. [PMID: 31810288 PMCID: PMC6947382 DOI: 10.3390/genes10120997] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/18/2019] [Revised: 11/12/2019] [Accepted: 11/27/2019] [Indexed: 12/12/2022]
Abstract
The EphB4 gene encodes a transmembrane tyrosine kinase receptor involved in embryonic blood vessel differentiation and cancer development. Although EphB4 is known to be regulated at the post-translational level, little is known about its gene regulation. The present study describes the identification and cloning of the core promoter elements, the mapping of cis-regulatory elements, and the serum regulation of the human EphB4 gene promoter region. Using bioinformatic analysis, Sanger sequencing and recombinant DNA technology, we analyzed the EphB4 gene upstream region spanning +40/−1509 from the actual transcription start site (TSS) and showed it to be a TATA-less gene promoter with dispersed regulatory elements, characterized by a novel motif-ten element (MTE) at positions +18/+28, and a DPE-like motif and a DPE-like-repeated motif (DRM) spanning nt +27/+30 and +32/+35, respectively. We also mapped both proximal (multiple Sp1) and distal (HoxA9) trans-activating/dispersed cis-acting transcription factor (TF)-binding elements in the region studied, and used a transient transfection reporter assay to characterize its regulation by serum and IGF-II using EphB4 promoter deletion constructs with or without the identified new DNA-binding elements. Altogether, these findings shed new light on the structure and regulation of the human EphB4 promoter, suggesting mechanistic features conserved among Pol II TATA-less genes and phylogenetically shared from Drosophila to human genomes.
9
Zhou J, Gou H, Zhang L, Wang X, Ye Y, Lu X, Ying B. ARID5B Genetic Polymorphisms Contribute to the Susceptibility and Prognosis of Male Acute Promyelocytic Leukemia. DNA Cell Biol 2019; 38:1374-1386. [PMID: 31599655 DOI: 10.1089/dna.2019.4926] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
This study used TagSNPs to systematically explore the relationship between ARID5B polymorphisms and the occurrence, clinical characteristics, and prognosis of acute myeloid leukemia (AML). A total of 569 unrelated AML patients and 410 healthy individuals from West China were recruited, and ARID5B TagSNPs were genotyped using iMLDR® (improved multiplex ligation detection reaction). The association of ARID5B polymorphisms with AML was most significant in acute promyelocytic leukemia (APL) and was found exclusively in males: the mutant alleles of rs6415872, rs2393726, rs7073837, rs10821936, and rs7089424 increased the risk of developing APL in men, with odds ratios (ORs) of 1.36, 1.74, 1.45, 1.53, and 1.56, respectively (all p < 0.05). Haplotype analysis revealed that haplotype [AACCG] increased the risk of male APL with an OR of 1.53 (95% confidence interval: 1.10-2.14, p = 0.012). In addition, there was a strong positive additive interaction between rs6415872 and each of rs10821936 and rs7089424, and cases attributed to the interaction of rs6415872, rs10821936, and rs7089424 accounted for 100%. Furthermore, ARID5B single nucleotide polymorphisms were associated with clinical features of AML, and rs6415872 was shown to be an independent prognostic factor in APL patients. A dual luciferase reporter assay showed that rs6415872 may affect the binding activity of PPARG with ARID5B. ARID5B polymorphisms contribute to male APL risk, clinical features, and prognosis, suggesting the importance of ARID5B in AML pathogenesis and development, and the gender and subtype preference may point to specific mechanisms of ARID5B. ARID5B polymorphisms might also serve as potential prognostic biomarkers.
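Per-allele odds ratios with 95% confidence intervals, like those reported above, are commonly derived from a 2×2 table of allele counts in cases and controls. The Python sketch below shows the textbook odds ratio with a Woolf (log-scale) interval; the cell layout and the counts in the usage note are hypothetical, and the study may have used a different model (for example, logistic regression with covariate adjustment), so this is not a reconstruction of its analysis.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table:
        a = risk alleles in cases,    b = other alleles in cases,
        c = risk alleles in controls, d = other alleles in controls.
    Assumes every cell is non-zero (no continuity correction applied)."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

For hypothetical counts, `odds_ratio_ci(60, 140, 40, 160)` gives an odds ratio of 9600/5600 (about 1.71), with an interval that is symmetric around it on the log scale.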
Affiliation(s)
- Juan Zhou
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
- Haimei Gou
- Department of Clinical Laboratory, Affiliated Hospital of North Sichuan Medical College, Nanchong, P.R. China
- Li Zhang
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
- Xinyi Wang
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
- Yuanxin Ye
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
- Xiaojun Lu
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
- Binwu Ying
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu, P.R. China
10
Zeng R, Liang Y, Farooq MU, Zhang Y, Ei HH, Tang Z, Zheng T, Su Y, Ye X, Jia X, Zhu J. Alterations in transcriptome and antioxidant activity of naturally aged mice exposed to selenium-rich rice. Environ Sci Pollut Res Int 2019; 26:17834-17844. [PMID: 31037530 DOI: 10.1007/s11356-019-05226-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/19/2019] [Accepted: 04/17/2019] [Indexed: 06/09/2023]
Abstract
Selenium (Se) is a vital element with strong antioxidant effects in animals and humans. However, the mechanism underlying the biological changes induced by natural cereal Se is not well understood. This study explored differential gene expression in naturally aged mice exposed to selenium using RNA-Seq. A total of 53 differentially expressed genes were quantified in heart tissue from mice fed Se-rich rice versus general rice. GO functional annotation of the differentially expressed genes revealed enrichment of cellular process, ionic binding, biological regulation, and catalytic activity. One hundred twenty-three differential pathways (including cardiovascular diseases, immune system, transport and catabolism, longevity regulating, and PI3K-AKT signaling) were identified according to KEGG metabolic terms. The effect of Se-rich rice on antioxidant activity was then assessed. The selenium-rich diet increased total antioxidant capacity (T-AOC), superoxide dismutase (SOD), and glutathione peroxidase (GSH-Px) in mouse serum and liver while significantly reducing malondialdehyde (MDA) content. Expression of the FOXO1 and FOXO3 genes, which act as regulators of apoptosis and of antioxidant enzymes, was significantly enhanced in mice fed Se-rich rice. In short, these findings offer insights into the effects of organic and inorganic selenium sources on certain biological processes and on antioxidant activity in living organisms. However, long-term trials are still required to draw a definitive conclusion, including risk and benefit analyses for various management strategies.
Affiliation(s)
- Rui Zeng
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Yuanke Liang
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Muhammad Umer Farooq
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Yujie Zhang
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Hla Hla Ei
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Zhichen Tang
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Tengda Zheng
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Yang Su
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Xiaoying Ye
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Xiaomei Jia
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China
- Jianqing Zhu
- Demonstration Base for International Science & Technology Cooperation of Sichuan Province, Rice Research Institute, Sichuan Agricultural University, Chengdu, 611130, Sichuan, China.
11
Mattioli K, Volders PJ, Gerhardinger C, Lee JC, Maass PG, Melé M, Rinn JL. High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue specificity. Genome Res 2019; 29:344-355. [PMID: 30683753 PMCID: PMC6396428 DOI: 10.1101/gr.242222.118] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 07/29/2018] [Accepted: 01/17/2019] [Indexed: 12/01/2022]
Abstract
Transcription initiates at both coding and noncoding genomic elements, including mRNA and long noncoding RNA (lncRNA) core promoters and enhancer RNAs (eRNAs). However, each class has a different expression profile with lncRNAs and eRNAs being the most tissue specific. How these complex differences in expression profiles and tissue specificities are encoded in a single DNA sequence remains unresolved. Here, we address this question using computational approaches and massively parallel reporter assays (MPRA) surveying hundreds of promoters and enhancers. We find that both divergent lncRNA and mRNA core promoters have higher capacities to drive transcription than nondivergent lncRNA and mRNA core promoters, respectively. Conversely, intergenic lncRNAs (lincRNAs) and eRNAs have lower capacities to drive transcription and are more tissue specific than divergent genes. This higher tissue specificity is strongly associated with having less complex transcription factor (TF) motif profiles at the core promoter. We experimentally validated these findings by testing both engineered single-nucleotide deletions and human single-nucleotide polymorphisms (SNPs) in MPRA. In both cases, we observe that single nucleotides associated with many motifs are important drivers of promoter activity. Thus, we suggest that high TF motif density serves as a robust mechanism to increase promoter activity at the expense of tissue specificity. Moreover, we find that 22% of common SNPs in core promoter regions have significant regulatory effects. Collectively, our findings show that high TF motif density provides redundancy and increases promoter activity at the expense of tissue specificity, suggesting that specificity of expression may be regulated by simplicity of motif usage.
Affiliation(s)
- Kaia Mattioli
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Department of Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA
- Pieter-Jan Volders
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Chiara Gerhardinger
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- James C Lee
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge CB2 0QQ, United Kingdom
- Philipp G Maass
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Genetics and Genome Biology Program, Sickkids Research Institute, Toronto, Ontario M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A1, Canada
- Marta Melé
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia 08034, Spain
- John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02115, USA; Department of Biochemistry, University of Colorado, BioFrontiers Institute, Boulder, Colorado 80301, USA
12
Promoter analysis and prediction in the human genome using sequence-based deep learning models. Bioinformatics 2019; 35:2730-2737. [DOI: 10.1093/bioinformatics/bty1068] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 10/30/2018] [Revised: 12/03/2018] [Accepted: 12/27/2018] [Indexed: 12/14/2022]
Abstract
Motivation
Computational identification of promoters is notoriously difficult, as human genes often have unique promoter sequences that regulate transcription and interact with the transcription initiation complex. Although there have been many attempts to develop computational promoter identification methods, there is still no reliable tool for analyzing long genomic sequences.
Results
In this work, we further develop our deep learning approach, which was relatively successful in discriminating short promoter sequences from non-promoter sequences. Instead of focusing on classification accuracy, we predict the exact positions of transcription start sites within genomic sequences by testing every possible location. We studied human promoters to find the regions most effective for discrimination and built corresponding deep learning models. These models use an adaptively constructed negative set, which iteratively improves their discriminative ability. Our method significantly outperforms previously developed promoter prediction programs by considerably reducing the number of false-positive predictions. We achieved an error rate of 0.02 errors per 1000 bp and 0.31 errors per correct prediction, which is significantly better than the results of other human promoter predictors.
Availability and implementation
The developed method is available as a web server at http://www.cbrc.kaust.edu.sa/PromID/.
13
Simón-Carrasco L, Graña O, Salmón M, Jacob HKC, Gutierrez A, Jiménez G, Drosten M, Barbacid M. Inactivation of Capicua in adult mice causes T-cell lymphoblastic lymphoma. Genes Dev 2017; 31:1456-1468. [PMID: 28827401 PMCID: PMC5588927 DOI: 10.1101/gad.300244.117] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 04/10/2017] [Accepted: 07/24/2017] [Indexed: 12/19/2022]
Abstract
CIC (also known as Capicua) is a transcriptional repressor negatively regulated by RAS/MAPK signaling. Whereas the functions of Cic have been well characterized in Drosophila, little is known about its role in mammals. CIC is inactivated in a variety of human tumors and has been implicated recently in the promotion of lung metastases. Here, we describe a mouse model in which we inactivated Cic by selectively disabling its DNA-binding activity, a mutation that causes derepression of its target genes. Germline Cic inactivation causes perinatal lethality due to lung differentiation defects. However, its systemic inactivation in adult mice induces T-cell acute lymphoblastic lymphoma (T-ALL), a tumor type known to carry CIC mutations, albeit with low incidence. Cic inactivation in mice induces T-ALL by a mechanism involving derepression of its well-known target, Etv4. Importantly, human T-ALL also relies on ETV4 expression for maintaining its oncogenic phenotype. Moreover, Cic inactivation renders T-ALL insensitive to MEK inhibitors in both mouse and human cell lines. Finally, we show that Ras-induced mouse T-ALL as well as human T-ALL carrying mutations in the RAS/MAPK pathway display a genetic signature indicative of Cic inactivation. These observations illustrate that CIC inactivation plays a key role in this human malignancy.
Affiliation(s)
- Lucía Simón-Carrasco
- Molecular Oncology Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Osvaldo Graña
- Bioinformatics Unit, Structural Biology and Biocomputing Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Marina Salmón
- Molecular Oncology Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Harrys K C Jacob
- Molecular Oncology Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Alejandro Gutierrez
- Division of Hematology/Oncology, Boston Children's Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
- Gerardo Jiménez
- Institut de Biologia Molecular de Barcelona-Consejo Superior de Investigaciones Científicas (CSIC), Parc Científic de Barcelona, 08028 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), 08028 Barcelona, Spain
- Matthias Drosten
- Molecular Oncology Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain
- Mariano Barbacid
- Molecular Oncology Programme, Centro Nacional de Investigaciones Oncológicas (CNIO), 28029 Madrid, Spain