1
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
2
Zrimec J, Börlin CS, Buric F, Muhammad AS, Chen R, Siewers V, Verendel V, Nielsen J, Töpel M, Zelezniak A. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun 2020; 11:6141. [PMID: 33262328 PMCID: PMC7708451 DOI: 10.1038/s41467-020-19921-4] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 12/18/2019] [Accepted: 11/02/2020] [Indexed: 12/31/2022]
Abstract
Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and the specific combination of regulatory elements, that define gene expression levels.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Christoph S Börlin
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Azam Sheikh Muhammad
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Rhongzen Chen
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Verena Siewers
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Vilhelm Verendel
  - Computer Science and Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
- Mats Töpel
  - Department of Marine Sciences, University of Gothenburg, Box 461, SE-405 30, Gothenburg, Sweden
  - Gothenburg Global Biodiversity Center (GGBC), Box 461, 40530, Gothenburg, Sweden
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden
  - Science for Life Laboratory, Tomtebodavägen 23a, SE-171 65, Stockholm, Sweden
3
Zrimec J. Multiple plasmid origin-of-transfer regions might aid the spread of antimicrobial resistance to human pathogens. Microbiologyopen 2020; 9:e1129. [PMID: 33111499 PMCID: PMC7755788 DOI: 10.1002/mbo3.1129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/23/2020] [Revised: 09/21/2020] [Accepted: 09/21/2020] [Indexed: 12/12/2022]
Abstract
Antimicrobial resistance poses a great danger to humanity, in part due to the widespread horizontal gene transfer of plasmids via conjugation. Modeling of plasmid transfer is essential to uncovering the fundamentals of resistance transfer and for the development of predictive measures to limit the spread of resistance. However, a major limitation in the current understanding of plasmids is the incomplete characterization of the conjugative DNA transfer mechanisms, which conceals the actual potential for plasmid transfer in nature. Here, we consider that the plasmid-borne origin-of-transfer substrates encode specific DNA structural properties that can facilitate finding these regions in large datasets and develop a DNA structure-based alignment procedure for typing the transfer substrates that outperforms sequence-based approaches. Thousands of putative DNA transfer substrates are identified, showing that plasmid mobility can be twofold higher and span almost twofold more host species than is currently known. Over half of all putative mobile plasmids contain the means for mobilization by conjugation systems belonging to different mobility groups, which can hypothetically link previously confined host ranges across ecological habitats into a robust plasmid transfer network. This hypothetical network is found to facilitate the transfer of antimicrobial resistance from environmental genetic reservoirs to human pathogens, which might be an important driver of the observed rapid resistance development in humans and thus an important point of focus for future prevention measures.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
4
Zrimec J, Lapanje A. DNA structure at the plasmid origin-of-transfer indicates its potential transfer range. Sci Rep 2018; 8:1820. [PMID: 29379098 PMCID: PMC5789077 DOI: 10.1038/s41598-018-20157-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/12/2016] [Accepted: 01/10/2018] [Indexed: 11/29/2022]
Abstract
Horizontal gene transfer via plasmid conjugation enables antimicrobial resistance (AMR) to spread among bacteria and is a major health concern. The range of potential transfer hosts of a particular conjugative plasmid is characterised by its mobility (MOB) group, which is currently determined based on the amino acid sequence of the plasmid-encoded relaxase. To facilitate prediction of plasmid MOB groups, we have developed a bioinformatic procedure based on analysis of the origin-of-transfer (oriT), a merely 230 bp long non-coding plasmid DNA region that is the enzymatic substrate for the relaxase. By computationally interpreting conformational and physicochemical properties of the oriT region, which facilitate relaxase-oriT recognition and initiation of nicking, MOB groups can be resolved with over 99% accuracy. We have shown that oriT structural properties are highly conserved and can be used to discriminate among MOB groups more efficiently than the oriT nucleotide sequence. The procedure for prediction of MOB groups and potential transfer range of plasmids was implemented using published data and is available at http://dnatools.eu/MOB/plasmid.html.
Affiliation(s)
- Jan Zrimec
  - Institute of Metagenomics and Microbial Technologies, 1000, Ljubljana, Slovenia
  - Faculty of Health Sciences, University of Primorska, 6320, Izola, Slovenia
  - Department of Biology and Biological Engineering, Chalmers University of Technology, 412 96, Göteborg, Sweden
- Aleš Lapanje
  - Institute of Metagenomics and Microbial Technologies, 1000, Ljubljana, Slovenia
  - Department of Nanotechnology, Saratov State University, 410012, Saratov, Russian Federation
  - Department of Environmental Sciences, Institute Jožef Štefan, 1000, Ljubljana, Slovenia
5
Tosato V, West N, Zrimec J, Nikitin DV, Del Sal G, Marano R, Breitenbach M, Bruschi CV. Bridge-Induced Translocation between NUP145 and TOP2 Yeast Genes Models the Genetic Fusion between the Human Orthologs Associated With Acute Myeloid Leukemia. Front Oncol 2017; 7:231. [PMID: 29034209 PMCID: PMC5626878 DOI: 10.3389/fonc.2017.00231] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/15/2017] [Accepted: 09/07/2017] [Indexed: 01/03/2023]
Abstract
In mammalian organisms, liquid tumors such as acute myeloid leukemia (AML) are related to spontaneous chromosomal translocations that result in gene fusions. We previously developed a system named bridge-induced translocation (BIT) that links two different chromosomes by exploiting the strong endogenous homologous recombination system of the yeast Saccharomyces cerevisiae. The BIT system generates a heterogeneous population of cells with different aneuploidies and severely aberrant phenotypes reminiscent of a carcinogenic transformation. In this work, using a complex pop-out methodology for the marker used to select translocants, we applied BIT technology to precisely reproduce in yeast the peculiar chromosome translocation that has been associated with AML, characterized by the fusion between the human genes NUP98 and TOP2B. To shed light on the origin of the DNA fragility within NUP98, we performed an extensive analysis of the curvature, bending, thermostability, and B-Z transition aptitude of the breakpoint region of NUP98 and of its yeast ortholog NUP145. On this basis, a DNA cassette carrying tails homologous to the two genes was amplified by PCR and allowed the targeted fusion between NUP145 and TOP2, reproducing the chimeric transcript in a diploid strain of S. cerevisiae. The resulting translocated yeast obtained through BIT is characterized by abnormal spherical bodies nearly 500 nm in diameter, with no external membrane and a defined cytoplasmic localization. Since Nup98 is a well-known regulator of the post-transcriptional modification of P53 target genes, and P53 mutations are occasionally reported in AML, this translocant yeast strain can be used as a model to test the constitutive expression of human P53.
Although expression of an exogenous P53 never rescued the abnormal phenotype of the translocant yeast, it was found to confer increased vitality to the translocants, in spite of its usual and well-documented toxicity to wild-type yeast strains. These results obtained in yeast could provide new grounds for interpreting past observations made in leukemic patients that indicate a possible involvement of P53 in cell transformation toward AML.
Affiliation(s)
- Valentina Tosato
  - Ulisse Biomed S.r.l., AREA Science Park, Trieste, Italy
  - Faculty of Health Sciences, University of Primorska, Izola, Slovenia
  - Yeast Molecular Genetics, ICGEB, AREA Science Park, Trieste, Italy
- Nicole West
  - Clinical Pathology, Hospital Maggiore, Trieste, Italy
- Jan Zrimec
  - Faculty of Health Sciences, University of Primorska, Izola, Slovenia
- Dmitri V Nikitin
  - Biology Faculty, M.V. Lomonosov Moscow State University, Moscow, Russia
- Giannino Del Sal
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Roberto Marano
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Michael Breitenbach
  - Genetics Division, Department of Cell Biology, University of Salzburg, Salzburg, Austria
- Carlo V Bruschi
  - Yeast Molecular Genetics, ICGEB, AREA Science Park, Trieste, Italy
  - Genetics Division, Department of Cell Biology, University of Salzburg, Salzburg, Austria
6
Amarante TD, Weber G. Evaluating Hydrogen Bonds and Base Stacking of Single, Tandem and Terminal GU Mismatches in RNA with a Mesoscopic Model. J Chem Inf Model 2015; 56:101-9. [DOI: 10.1021/acs.jcim.5b00571] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Tauanne D. Amarante
  - Departamento de Física, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte-MG, Brazil
- Gerald Weber
  - Departamento de Física, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte-MG, Brazil