1
|
Rybina AA, Glushak RA, Bessonova TA, Dakhnovets AI, Rudenko AY, Ozhiganov RM, Kaznadzey AD, Tutukina MN, Gelfand MS. Phylogeny and structural modeling of the transcription factor CsqR (YihW) from Escherichia coli. Sci Rep 2024; 14:7852. [PMID: 38570624 PMCID: PMC10991401 DOI: 10.1038/s41598-024-58492-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 03/29/2024] [Indexed: 04/05/2024] Open
Abstract
CsqR (YihW) is a local transcription factor that controls expression of yih genes involved in degradation of sulfoquinovose in Escherichia coli. We recently showed that expression of the respective gene cassette might be regulated by lactose. Here, we explore the phylogenetic and functional traits of CsqR. Phylogenetic analysis revealed that CsqR had a conserved Met25. Western blot demonstrated that CsqR was synthesized in the bacterial cell as two protein forms, 28.5 (CsqR-l) and 26 kDa (CsqR-s), the latter corresponding to start of translation at Met25. CsqR-s was dramatically activated during growth with sulfoquinovose as a sole carbon source, and displaced CsqR-l in the stationary phase during growth on rich medium. Molecular dynamics simulations revealed two possible states of the CsqR-s structure, with the interdomain linker being represented by either a disordered loop or an α-helix. This helix allowed the hinge-like motion of the N-terminal domain resulting in a switch of CsqR-s between two conformational states, "open" and "compact". We then modeled the interaction of both CsqR forms with putative effectors sulfoquinovose, sulforhamnose, sulfoquinovosyl glycerol, and lactose, and revealed that they all preferred the same pocket in CsqR-l, while in CsqR-s there were two possible options dependent on the linker structure.
|
2
|
Garushyants SK, Sane M, Selifanova MV, Agashe D, Bazykin GA, Gelfand MS. Mutational Signatures in Wild Type Escherichia coli Strains Reveal Predominance of DNA Polymerase Errors. Genome Biol Evol 2024; 16:evae035. [PMID: 38401265 PMCID: PMC10995721 DOI: 10.1093/gbe/evae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/13/2024] [Accepted: 02/17/2024] [Indexed: 02/26/2024] Open
Abstract
While mutational processes operating in the Escherichia coli genome have been revealed by multiple laboratory experiments, the contribution of these processes to accumulation of bacterial polymorphism and evolution in natural environments is unknown. To address this question, we reconstruct signatures of distinct mutational processes from experimental data on E. coli hypermutators, and ask how these processes contribute to differences between naturally occurring E. coli strains. We show that mutations accumulated both in the course of evolution of wild-type strains in nature and in lab-grown nonmutator strains are explained predominantly by the low fidelity of DNA polymerases II and III. By contrast, contributions specific to disruption of DNA repair systems cannot be detected, suggesting that temporary accelerations of mutagenesis associated with such disruptions are unimportant for within-species evolution. These observations demonstrate that accumulation of diversity in bacterial strains in nature is predominantly associated with errors of DNA polymerases.
|
3
|
Bulygin I, Shatov V, Rykachevskiy A, Raiko A, Bernstein A, Burnaev E, Gelfand MS. Absence of enterotypes in the human gut microbiomes reanalyzed with non-linear dimensionality reduction methods. PeerJ 2023; 11:e15838. [PMID: 37701837 PMCID: PMC10494839 DOI: 10.7717/peerj.15838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 07/12/2023] [Indexed: 09/14/2023] Open
Abstract
Enterotypes of the human gut microbiome have been proposed to be a powerful prognostic tool to evaluate the correlation between lifestyle, nutrition, and disease. However, the number of enterotypes suggested in the literature ranged from two to four. The growth of available metagenome data and the use of exact, non-linear methods of data analysis challenge the very concept of clusters in the multidimensional space of bacterial microbiomes. Using several published human gut microbiome datasets of variable 16S rRNA regions, we demonstrate the presence of a lower-dimensional structure in the microbiome space, with high-dimensional data concentrated near a low-dimensional non-linear submanifold, but the absence of distinct and stable clusters that could represent enterotypes. This observation is robust with regard to diverse combinations of dimensionality reduction techniques and clustering algorithms.
|
4
|
Gaydukova SA, Moldovan MA, Vallesi A, Heaphy SM, Atkins JF, Gelfand MS, Baranov PV. Nontriplet feature of genetic code in Euplotes ciliates is a result of neutral evolution. Proc Natl Acad Sci U S A 2023; 120:e2221683120. [PMID: 37216548 PMCID: PMC10235951 DOI: 10.1073/pnas.2221683120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/21/2022] [Accepted: 04/12/2023] [Indexed: 05/24/2023] Open
Abstract
The triplet nature of the genetic code is considered a universal feature of known organisms. However, frequent stop codons at internal mRNA positions in Euplotes ciliates ultimately specify ribosomal frameshifting by one or two nucleotides depending on the context, thus posing a nontriplet feature of the genetic code of these organisms. Here, we sequenced transcriptomes of eight Euplotes species and assessed evolutionary patterns arising at frameshift sites. We show that frameshift sites are currently accumulating more rapidly by genetic drift than they are removed by weak selection. The time needed to reach the mutational equilibrium is several times longer than the age of Euplotes and is expected to occur after a several-fold increase in the frequency of frameshift sites. This suggests that Euplotes are at an early stage of the spread of frameshifting in expression of their genome. In addition, we find the net fitness burden of frameshift sites to be noncritical for the survival of Euplotes. Our results suggest that fundamental genome-wide changes such as a violation of the triplet character of genetic code can be introduced and maintained solely by neutral evolution.
|
5
|
Kobets VA, Ulianov SV, Galitsyna AA, Doronin SA, Mikhaleva EA, Gelfand MS, Shevelyov YY, Razin SV, Khrameeva EE. HiConfidence: a novel approach uncovering the biological signal in Hi-C data affected by technical biases. Brief Bioinform 2023; 24:7033301. [PMID: 36759336 PMCID: PMC10025441 DOI: 10.1093/bib/bbad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Revised: 01/04/2023] [Accepted: 01/20/2023] [Indexed: 02/11/2023] Open
Abstract
The chromatin interaction assays, particularly Hi-C, enable detailed studies of genome architecture in multiple organisms and model systems, resulting in a deeper understanding of gene expression regulation mechanisms mediated by epigenetics. However, the analysis and interpretation of Hi-C data remain challenging due to technical biases, limiting direct comparisons of datasets obtained in different experiments and laboratories. As a result, removing biases from Hi-C-generated chromatin contact matrices is a critical data analysis step. Our novel approach, HiConfidence, eliminates biases from the Hi-C data by weighing chromatin contacts according to their consistency between replicates so that low-quality replicates do not substantially influence the result. The algorithm is effective for the analysis of global changes in chromatin structures such as compartments and topologically associating domains. We apply the HiConfidence approach to several Hi-C datasets with significant technical biases, that could not be analyzed effectively using existing methods, and obtain meaningful biological conclusions. In particular, HiConfidence aids in the study of how changes in histone acetylation pattern affect chromatin organization in Drosophila melanogaster S2 cells. The method is freely available at GitHub: https://github.com/victorykobets/HiConfidence.
|
6
|
Tutukina MN, Dakhnovets AI, Kaznadzey AD, Gelfand MS, Ozoline ON. Sense and antisense RNA products of the uxuR gene can affect motility and chemotaxis acting independent of the UxuR protein. Front Mol Biosci 2023; 10:1121376. [PMID: 36936992 PMCID: PMC10016265 DOI: 10.3389/fmolb.2023.1121376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 02/06/2023] [Indexed: 02/19/2023] Open
Abstract
Small non-coding and antisense RNAs are widespread in all kingdoms of life; however, the diversity of their functions in bacteria is largely unknown. Here, we study RNAs synthesised from divergent promoters located in the 3'-end of the uxuR gene, encoding a transcription factor regulating hexuronate metabolism in Escherichia coli. These overlapping promoters were predicted in silico with rather high scores, effectively bound RNA polymerase in vitro and in vivo and were capable of initiating transcription in sense and antisense directions. The genome-wide correlation between in silico promoter scores and RNA polymerase binding in vitro and in vivo was higher for promoters located on the antisense strands of the genes; however, sense promoters within the uxuR gene were more active. Both regulatory RNAs synthesised from the divergent promoters inhibited expression of genes associated with E. coli motility and chemotaxis independent of the carbon source on which bacteria had been grown. Direct effects of these RNAs were confirmed for the fliA gene encoding the σ28 subunit of RNA polymerase. In addition to intracellular sRNAs, promoters located within the uxuR gene could initiate synthesis of transcripts found in the fraction of RNAs secreted into the extracellular medium. Their profile was also carbon-independent, suggesting that intragenic uxuR transcripts have a specific regulatory role not directly related to the function of the protein encoded by the gene in which they originate.
|
7
|
Semenkov IN, Shelyakin PV, Nikolaeva DD, Tutukina MN, Sharapova AV, Lednev SA, Sarana YV, Gelfand MS, Krechetov PP, Koroleva TV. Data on the temporal changes in soil properties and microbiome composition after a jet-fuel contamination during the pot and field experiments. Data Brief 2022; 46:108860. [PMID: 36632439 PMCID: PMC9826931 DOI: 10.1016/j.dib.2022.108860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 12/29/2022] Open
Abstract
The soil response to jet-fuel contamination is uncertain. In this article, original data on the influence of a jet-fuel spillage on the topsoil properties are presented. The data set was obtained during one-year-long pot and field experiments with Dystric Arenosols, Fibric Histosols and Albic Luvisols. Kerosene loads were 1, 5, 10, 25 and 100 g/kg. The data set includes information about temporal changes in kerosene concentration; physicochemical properties, such as pH, moisture, cation exchange capacity, content of soil organic matter, available P and K, exchangeable NH4+, and water-soluble NO3–; and biological properties, such as biological consumption of oxygen, and cellulolytic activity. Also, we provide sequencing data on variable regions of 16S ribosomal RNA of microbial communities from the respective soil samples.
Key Words
- AL, Albic Luvisols
- ASV, amplicon sequence variant
- Bearing capacity
- CA, cellulolytic activity
- CEC, cation exchange capacity
- DA, Dystric Arenosols
- DNA, deoxyribonucleic acid
- EDTA, Ethylenediaminetetraacetic acid
- Ecological indicators
- FH, Fibric Histosols
- Gasoline
- Kav, available potassium
- NH4+, exchangeable ammonium
- NO3–, water-soluble nitrate
- PCR, polymerase chain reaction
- Pav, available phosphorus
- SOM, soil organic matter
- Soil metagenome
- Soil pollution
- Total petroleum hydrocarbons
- WMO, World Meteorological Organization
- Xenobiotic compounds
- qPCR, real-time polymerase chain reaction
- rRNA, ribosomal ribonucleic acid
|
8
|
Grigorashvili EI, Chervontseva ZS, Gelfand MS. Predicting RNA secondary structure by a neural network: what features may be learned? PeerJ 2022; 10:e14335. [PMID: 36530406 PMCID: PMC9756865 DOI: 10.7717/peerj.14335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 10/12/2022] [Indexed: 12/14/2022] Open
Abstract
Deep learning is a class of machine learning techniques capable of creating internal representation of data without explicit preprogramming. Hence, in addition to practical applications, it is of interest to analyze what features of biological data may be learned by such models. Here, we describe PredPair, a deep learning neural network trained to predict base pairs in RNA structure from sequence alone, without any incorporated prior knowledge, such as the stacking energies or possible spatial structures. PredPair learned the Watson-Crick and wobble base-pairing rules and created an internal representation of the stacking energies and helices. Application to independent experimental (DMS-Seq) data on nucleotide accessibility in mRNA showed that the nucleotides predicted as paired indeed tend to be involved in the RNA structure. The performance of the constructed model was comparable with the state-of-the-art method based on the thermodynamic approach, but with a higher false positives rate. On the other hand, it successfully predicted pseudoknots. t-SNE clusters of embeddings of RNA sequences created by PredPair tend to contain embeddings from particular Rfam families, supporting the predictions of PredPair being in line with biological classification.
|
9
|
Ozerova AM, Gelfand MS. Recapitulation of the embryonic transcriptional program in holometabolous insect pupae. Sci Rep 2022; 12:17570. [PMID: 36266393 PMCID: PMC9584902 DOI: 10.1038/s41598-022-22188-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 10/11/2022] [Indexed: 01/13/2023] Open
Abstract
Holometabolous insects are predominantly motionless during metamorphosis, when no active feeding is observed and the body is enclosed in a hardened cuticle. These physiological properties, as well as the ongoing processes, resemble embryogenesis, since at the pupal stage organs and systems of the imago are formed. Therefore, recapitulation of the embryonic expression program during metamorphosis could be hypothesized. To assess this hypothesis at the transcriptome level, we have performed a comprehensive analysis of the developmental datasets available in the public domain. Indeed, for most datasets, the pupal gene expression resembles the embryonic rather than the larval pattern, interrupting gradual changes in the transcriptome. Moreover, changes in the transcriptome profile during the pupa-to-imago transition are positively correlated with those at the embryo-to-larva transition, suggesting that similar expression programs are activated. Gene sets that change their expression level during the larval stage and revert to the embryonic-like state during metamorphosis are enriched with genes associated with metabolism and development.
|
10
|
Ashniev GA, Sernova NV, Shevkoplias AE, Rodionov ID, Rodionova IA, Vitreschak AG, Gelfand MS, Rodionov DA. Evolution of transcriptional regulation of histidine metabolism in Gram-positive bacteria. BMC Genomics 2022; 23:558. [PMID: 36008760 PMCID: PMC9413887 DOI: 10.1186/s12864-022-08796-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Accepted: 07/27/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The histidine metabolism and transport (his) genes are controlled by a variety of RNA-dependent regulatory systems among diverse taxonomic groups of bacteria including T-box riboswitches in Firmicutes and Actinobacteria and RNA attenuators in Proteobacteria. Using a comparative genomic approach, we previously identified a novel DNA-binding transcription factor (named HisR) that controls the histidine metabolism genes in diverse Gram-positive bacteria from the Firmicutes phylum. RESULTS Here we report the identification of HisR-binding sites within the regulatory regions of the histidine metabolism and transport genes in 395 genomes representing the Bacilli, Clostridia, Negativicutes, and Tissierellia classes of Firmicutes, as well as in 97 other HisR-encoding genomes from the Actinobacteria, Proteobacteria, and Synergistetes phyla. HisR belongs to the TrpR family of transcription factors, and their predicted DNA binding motifs have a similar 20-bp palindromic structure but distinct lineage-specific consensus sequences. The predicted HisR-binding motif was validated in vitro using DNA binding assays with purified protein from the human gut bacterium Ruminococcus gnavus. To fill a knowledge gap in the regulation of histidine metabolism genes in Firmicutes genomes that lack a hisR repressor gene, we systematically searched their upstream regions for potential RNA regulatory elements. As a result, we identified 158 T-box riboswitches preceding the histidine biosynthesis and/or transport genes in 129 Firmicutes genomes. Finally, novel candidate RNA attenuators were identified upstream of the histidine biosynthesis operons in six species from the Bacillus cereus group, as well as in five Eubacteriales and six Erysipelotrichales species.
CONCLUSIONS The obtained distribution of the HisR transcription factor and two RNA-mediated regulatory mechanisms for histidine metabolism genes across over 600 species of Firmicutes is discussed from functional and evolutionary points of view.
|
11
|
Bessonova TA, Fando MS, Kostareva OS, Tutukina MN, Ozoline ON, Gelfand MS, Nikulin AD, Tishchenko SV. Differential Impact of Hexuronate Regulators ExuR and UxuR on the Escherichia coli Proteome. Int J Mol Sci 2022; 23:ijms23158379. [PMID: 35955512 PMCID: PMC9369180 DOI: 10.3390/ijms23158379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 07/19/2022] [Accepted: 07/26/2022] [Indexed: 11/16/2022] Open
Abstract
ExuR and UxuR are paralogous proteins belonging to the GntR family of transcriptional regulators. Both are known to control hexuronic acid metabolism in a variety of Gammaproteobacteria but the relative impact of each of them is still unclear. Here, we apply 2D difference electrophoresis followed by mass-spectrometry to characterise the changes in the Escherichia coli proteome in response to a uxuR or exuR deletion. Our data clearly show that the effects are different: deletion of uxuR resulted in strongly enhanced expression of D-mannonate dehydratase UxuA and flagellar protein FliC, and in a reduced amount of outer membrane porin OmpF, while the absence of ExuR did not significantly alter the spectrum of detected proteins. Consequently, the physiological roles of proteins predicted as homologs seem to be far from identical. Effects of uxuR deletion were largely dependent on the cultivation conditions: during growth with glucose, UxuA and FliC were dramatically altered, while during growth with glucuronate, activation of both was not so prominent. During the growth with glucose, maximal activation was detected for FliC. This was further confirmed by expression analysis and physiological tests, thus suggesting the involvement of UxuR in the regulation of bacterial motility and biofilm formation.
|
12
|
Moldovan MA, Chervontseva ZS, Nogina DS, Gelfand MS. A hierarchy in clusters of cephalopod mRNA editing sites. Sci Rep 2022; 12:3447. [PMID: 35236910 PMCID: PMC8891338 DOI: 10.1038/s41598-022-07460-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 02/07/2022] [Indexed: 11/09/2022] Open
Abstract
RNA editing in the form of substituting adenine with inosine (A-to-I editing) is the most frequent type of RNA editing in many metazoan species. In most species, A-to-I editing sites tend to form clusters and editing at clustered sites depends on editing of the adjacent sites. Although functionally important in some specific cases, A-to-I editing is usually rare. The exception occurs in soft-bodied coleoid cephalopods, where tens of thousands of potentially important A-to-I editing sites have been identified, making coleoids an ideal model for studying the properties and evolution of A-to-I editing sites. Here, we apply several diverse techniques to demonstrate a strong tendency of coleoid RNA editing sites to cluster along the transcript. We show that clustering of editing sites and correlated editing substantially contribute to the transcriptome diversity that arises due to extensive RNA editing. Moreover, we identify three distinct types of editing site clusters, varying in size, and describe RNA structural features and mechanisms likely underlying formation of these clusters. In particular, these observations may explain sequence conservation at large distances around editing sites and the observed dependency of editing on mutations in the vicinity of editing sites.
|
13
|
Shelyakin PV, Semenkov IN, Tutukina MN, Nikolaeva DD, Sharapova AV, Sarana YV, Lednev SA, Smolenkov AD, Gelfand MS, Krechetov PP, Koroleva TV. The Influence of Kerosene on Microbiomes of Diverse Soils. Life (Basel) 2022; 12:life12020221. [PMID: 35207510 PMCID: PMC8878009 DOI: 10.3390/life12020221] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/17/2021] [Revised: 01/17/2022] [Accepted: 01/27/2022] [Indexed: 01/04/2023] Open
Abstract
One of the most important challenges for soil science is to determine the limits for the sustainable functioning of contaminated ecosystems. The response of soil microbiomes to kerosene pollution is still poorly understood. Here, we model the impact of kerosene leakage on the composition of the topsoil microbiome in pot and field experiments with different loads of added kerosene (loads up to 100 g/kg; retention time up to 360 days). At four time points we measured kerosene concentration and sequenced variable regions of 16S ribosomal RNA in the microbial communities. Mainly alkaline Dystric Arenosols with low content of available phosphorus and soil organic matter had an increased fraction of Actinobacteriota, Firmicutes, Nitrospirota, Planctomycetota, and, to a lesser extent, Acidobacteriota and Verrucomicrobiota. In contrast, in highly acidic Fibric Histosols, rich in soil organic matter and available phosphorus, the fraction of Acidobacteriota was higher, while the fraction of Actinobacteriota was lower. Albic Luvisols occupied an intermediate position in terms of both physicochemical properties and microbiome composition. The microbiomes of different soils show a similar response to equal kerosene loads. In highly contaminated soils, the proportion of anaerobic bacteria metabolizing hydrocarbons increased, whereas the proportion of aerobic bacteria decreased. During the field experiment, the soil microbiome recovered much faster than in the pot experiments, possibly due to migration of microorganisms from the polluted area. The microbial community of Fibric Histosols recovered within 6 months after kerosene had been loaded, while microbiomes of Dystric Arenosols and Albic Luvisols were not restored even after a year.
|
14
|
Galitsyna AA, Gelfand MS. Single-cell Hi-C data analysis: safety in numbers. Brief Bioinform 2021; 22:bbab316. [PMID: 34406348 PMCID: PMC8575028 DOI: 10.1093/bib/bbab316] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/31/2021] [Revised: 07/09/2021] [Accepted: 07/21/2021] [Indexed: 02/06/2023] Open
Abstract
Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
|
15
|
Chervova A, Fatykhov B, Koblov A, Shvarov E, Preobrazhenskaya J, Vinogradov D, Ponomarev GV, Gelfand MS, Kazanov MD. Analysis of gene expression and mutation data points on contribution of transcription to the mutagenesis by APOBEC enzymes. NAR Cancer 2021; 3:zcab025. [PMID: 34316712 PMCID: PMC8253550 DOI: 10.1093/narcan/zcab025] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/13/2020] [Revised: 06/04/2021] [Accepted: 06/14/2021] [Indexed: 11/30/2022] Open
Abstract
Since the discovery of the role of the APOBEC enzymes in human cancers, the mechanisms of this type of mutagenesis remain little understood. Theoretically, targeting of single-stranded DNA by the APOBEC enzymes could occur during cellular processes leading to the unwinding of DNA double-stranded structure. Some evidence points to the importance of replication in the APOBEC mutagenesis, while the role of transcription is still underexplored. Here, we analyzed gene expression and whole genome sequencing data from five types of human cancers with substantial APOBEC activity to estimate the involvement of transcription in the APOBEC mutagenesis and compare its impact with that of replication. Using the TCN motif as the mutation signature of the APOBEC enzymes, we observed a correlation of active APOBEC mutagenesis with gene expression, confirmed the increase of APOBEC-induced mutations in early-replicating regions and estimated the relative impact of transcription and replication on the APOBEC mutagenesis. We also found that the known effect of higher density of APOBEC-induced mutations on the lagging strand was highest in middle-replicating regions and observed higher APOBEC mutation density on the sense strand, the latter bias positively correlated with the gene expression level.
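The TCN motif mentioned in this abstract can be illustrated with a minimal Python sketch. It only shows what "a cytosine in a TCN context" means on either DNA strand (TCN on the given strand, which reads as NGA when the cytosine lies on the opposite strand). The function names `is_tcn_context` and `count_tcn_sites` are hypothetical and are not taken from the paper, whose actual pipeline is not described here.

```python
def is_tcn_context(sequence: str, position: int) -> bool:
    """Return True if the base at `position` (0-based) is a cytosine in a
    TCN context, counting both strands of the DNA.

    Forward strand: 5'-T C N-3' (T immediately upstream of the C).
    Reverse strand: the same site appears on the forward strand as 5'-N G A-3'.
    """
    seq = sequence.upper()
    # One flanking base is needed on each side to define the trinucleotide.
    if position < 1 or position > len(seq) - 2:
        return False
    five_prime, base, three_prime = seq[position - 1], seq[position], seq[position + 1]
    if base == "C":
        return five_prime == "T"
    if base == "G":
        # The C sits on the opposite strand; its 5' neighbour there is the
        # complement of our 3' neighbour, so a T there means an A here.
        return three_prime == "A"
    return False


def count_tcn_sites(sequence: str) -> int:
    """Count positions in `sequence` that are cytosines in a TCN context."""
    return sum(is_tcn_context(sequence, i) for i in range(len(sequence)))
```

A count of this kind could serve as the denominator when estimating the density of motif-associated mutations in a region, although how the authors normalize their counts is not stated in the abstract.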
|
16
|
Stetsenko A, Stehantsev P, Dranenko NO, Gelfand MS, Guskov A. Structural and biochemical characterization of a novel ZntB (CmaX) transporter protein from Pseudomonas aeruginosa. Int J Biol Macromol 2021; 184:760-767. [PMID: 34175341 DOI: 10.1016/j.ijbiomac.2021.06.130] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Revised: 06/16/2021] [Accepted: 06/18/2021] [Indexed: 11/19/2022]
Abstract
The 2-TM-GxN family of membrane proteins is widespread in prokaryotes and plays an important role in transport of divalent cations. The canonical signature motif, which is also a selectivity filter, has a composition of Gly-Met-Asn. Some members deviate from this composition; however, no data are available as to whether this has any functional implications. Here we report the functional and structural analysis of the CmaX protein from the pathogenic bacterium Pseudomonas aeruginosa, which has a Gly-Ile-Asn signature motif. CmaX readily transports Zn2+, Mg2+, Cd2+, Ni2+ and Co2+ ions, but it does not utilize proton symport as does ZntB from Escherichia coli. Together with the bioinformatics analysis, our data suggest that deviations from the canonical signature motif do not result in any changes in substrate selectivity or transport and arise easily in the course of evolution.
|
17
|
Nikolaeva DD, Gelfand MS, Garushyants SK. Simplification of Ribosomes in Bacteria with Tiny Genomes. Mol Biol Evol 2021; 38:58-66. [PMID: 32681797 PMCID: PMC7782861 DOI: 10.1093/molbev/msaa184] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/20/2022] Open
Abstract
The ribosome is an essential cellular machine performing protein biosynthesis. Its structure and composition are highly conserved in all species. However, some bacteria have been reported to have an incomplete set of ribosomal proteins. We have analyzed ribosomal protein composition in 214 small bacterial genomes (<1 Mb) and found that although the ribosome composition is fairly stable, some ribosomal proteins may be absent, especially in bacteria with dramatically reduced genomes. The protein composition of the large subunit is less conserved than that of the small subunit. We have identified the set of frequently lost ribosomal proteins and demonstrated that they tend to be positioned on the ribosome surface and have fewer contacts to other ribosome components. Moreover, some proteins are lost in an evolutionary correlated manner. The reduction of ribosomal RNA is also common, with deletions mostly occurring in free loops. Finally, the loss of the anti-Shine-Dalgarno sequence is associated with the loss of a higher number of ribosomal proteins.
|
18
|
Suvorova IA, Gelfand MS. Comparative Analysis of the IclR-Family of Bacterial Transcription Factors and Their DNA-Binding Motifs: Structure, Positioning, Co-Evolution, Regulon Content. Front Microbiol 2021; 12:675815. [PMID: 34177859 PMCID: PMC8222616 DOI: 10.3389/fmicb.2021.675815] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/04/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022] Open
Abstract
The IclR family is a large group of transcription factors (TFs) regulating various biological processes in diverse bacteria. Using comparative genomics techniques, we have identified binding motifs of IclR-family TFs, reconstructed regulons, and analyzed their content, finding co-occurrences between the regulated COGs (clusters of orthologous genes) that are useful for future functional characterization of TFs and their regulated genes. We describe two main types of IclR-family motifs, similar in sequence but different in the arrangement of the half-sites (boxes), with the consensuses GKTYCRYW(3-4)RYGRAMC and TGRAACAN(1-2)TGTTYCA, and also predict that TFs in 32 orthologous groups have binding sites composed of three boxes with alternating direction, which implies two possible alternative modes of TF dimerization. We identified trends in site positioning relative to the translational gene start, and show that TFs in 94 orthologous groups bind tandem sites with 18-22 nucleotides between their centers. We predict protein-DNA contacts via correlation analysis of nucleotides in binding sites and amino acids of the DNA-binding domain of TFs, and show that the majority of interacting positions and predicted contacts are similar for both types of motifs and conform well both to available experimental data and to general protein-DNA interaction trends.
|
19
|
Seferbekova Z, Zabelkin A, Yakovleva Y, Afasizhev R, Dranenko NO, Alexeev N, Gelfand MS, Bochkareva OO. High Rates of Genome Rearrangements and Pathogenicity of Shigella spp. Front Microbiol 2021; 12:628622. [PMID: 33912145 PMCID: PMC8072062 DOI: 10.3389/fmicb.2021.628622] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/12/2020] [Accepted: 03/22/2021] [Indexed: 02/01/2023] Open
Abstract
Shigella are pathogens originating within the Escherichia lineage but frequently classified as a separate genus. Shigella genomes contain numerous insertion sequences (ISs) that lead to pseudogenisation of affected genes and an increase in non-homologous recombination. Here, we study 414 genomes of E. coli and Shigella strains to assess the contribution of genomic rearrangements to Shigella evolution. We found that Shigella experienced exceptionally high rates of intragenomic rearrangements and had a decreased rate of homologous recombination compared to pathogenic and non-pathogenic E. coli. The high rearrangement rate resulted in independent disruption of syntenic regions and parallel rearrangements in different Shigella lineages. Specifically, we identified two types of chromosomally encoded E3 ubiquitin-protein ligases acquired independently by all Shigella strains that also showed a high level of sequence conservation in the promoter and further in the 5′-intergenic region. In the only available enteroinvasive E. coli (EIEC) strain, which is a pathogenic E. coli with a phenotype intermediate between Shigella and non-pathogenic E. coli, we found a rate of genome rearrangements comparable to that in other E. coli and no functional copies of the two Shigella-specific E3 ubiquitin ligases. These data indicate that the accumulation of ISs influenced many aspects of genome evolution and played an important role in the evolution of intracellular pathogens. Our research demonstrates the power of comparative genomics based on synteny block composition and the important role of non-coding regions in the evolution of genomic islands.
|
20
|
Fedorov AK, Gelfand MS. Towards practical applications in quantum computational biology. Nat Comput Sci 2021; 1:114-119. [PMID: 38217223 DOI: 10.1038/s43588-021-00024-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/09/2020] [Accepted: 01/12/2021] [Indexed: 01/15/2024]
Abstract
Fascinating progress in understanding our world at the smallest scales moves us to the border of a new technological revolution governed by quantum physics. By taking advantage of quantum phenomena, quantum computing devices allow a speedup in solving diverse tasks. In this Perspective, we discuss the potential impact of quantum computing on computational biology. Bearing in mind the limitations of existing quantum computing devices, we attempt to indicate promising directions for further research in the emerging area of quantum computational biology.
|
21
|
Nikolaeva DD, Gelfand MS, Garushyants SK. Simplification of Ribosomes in Bacteria with Tiny Genomes. Mol Biol Evol 2021. [PMID: 32681797 DOI: 10.1101/755876] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 05/12/2023] Open
|
22
|
Moldovan M, Gelfand MS. Phospho-islands and the evolution of phosphorylated amino acids in mammals. PeerJ 2020; 8:e10436. [PMID: 33344082 PMCID: PMC7718798 DOI: 10.7717/peerj.10436] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/03/2020] [Accepted: 11/06/2020] [Indexed: 01/23/2023] Open
Abstract
Background: Protein phosphorylation is the best-studied post-translational modification and strongly influences protein function. Phosphorylated amino acids not only differ in physico-chemical properties from their non-phosphorylated counterparts, but also exhibit different evolutionary patterns, tending to mutate to and originate from negatively charged amino acids (NCAs). The distribution of phosphosites along protein sequences is non-uniform, as phosphosites tend to cluster, forming so-called phospho-islands. Methods: Here, we have developed a hidden Markov model-based procedure for the identification of phospho-islands and studied the properties of the obtained phosphorylation clusters. To check the robustness of the evolutionary analysis, we consider different models for the reconstruction of ancestral phosphorylation states. Results: Clustered phosphosites differ from individual phosphosites in several functional and evolutionary aspects, including underrepresentation of phosphotyrosines, higher conservation, and more frequent mutations to NCAs. The spectrum of tissues, frequencies of specific phosphorylation contexts, and mutational patterns observed near clustered sites are also different.
|
23
|
Rozenwald MB, Galitsyna AA, Sapunov GV, Khrameeva EE, Gelfand MS. A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features. PeerJ Comput Sci 2020; 6:e307. [PMID: 33816958 PMCID: PMC7924456 DOI: 10.7717/peerj-cs.307] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/28/2020] [Accepted: 09/30/2020] [Indexed: 05/03/2023]
Abstract
Technological advances have led to the creation of large epigenetic datasets, including information about DNA-binding proteins and DNA spatial structure. Hi-C experiments have revealed that chromosomes are subdivided into sets of self-interacting domains called Topologically Associating Domains (TADs). TADs are involved in the regulation of gene expression activity, but the mechanisms of their formation are not yet fully understood. Here, we focus on machine learning methods to characterize DNA folding patterns in Drosophila based on chromatin marks across three cell lines. We present linear regression models with four types of regularization, gradient boosting, and recurrent neural networks (RNN) as tools to study chromatin folding characteristics associated with TADs given epigenetic chromatin immunoprecipitation data. The bidirectional long short-term memory RNN architecture produced the best prediction scores and identified biologically relevant features. The distributions of the protein Chriz (Chromator) and the histone modification H3K4me3 were selected as the most informative features for the prediction of TAD characteristics. This approach may be adapted to any similar biological dataset of chromatin features across various cell lines and species. The code for the implemented pipeline, Hi-ChIP-ML, is publicly available: https://github.com/MichalRozenwald/Hi-ChIP-ML.
|
24
|
Moldovan M, Chervontseva Z, Bazykin G, Gelfand MS. Adaptive evolution at mRNA editing sites in soft-bodied cephalopods. PeerJ 2020; 8:e10456. [PMID: 33312772 PMCID: PMC7703385 DOI: 10.7717/peerj.10456] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/01/2020] [Accepted: 11/09/2020] [Indexed: 12/11/2022] Open
Abstract
Background: The bulk of variability in mRNA sequence arises due to mutation, a change in DNA sequence that is heritable if it occurs in the germline. However, variation in mRNA can also be achieved by post-transcriptional modification, including mRNA editing: changes in mRNA nucleotide sequence that mimic the effect of mutations. Such modifications are not inherited directly; however, as the processes affecting them are encoded in the genome, they have a heritable component and can therefore be shaped by selection. In soft-bodied cephalopods, adenine-to-inosine RNA editing is very frequent, and much of it occurs at nonsynonymous sites, affecting the sequence of the encoded protein. Methods: We study selection regimes at coleoid A-to-I editing sites, estimate the prevalence of positive selection, and analyze interdependencies between the editing level and contextual characteristics of the editing site. Results: Here, we show that mRNA editing of individual nonsynonymous sites in cephalopods originates in evolution through substitutions at regions adjacent to these sites. As such substitutions mimic the effect of the substitution at the edited site itself, we hypothesize that they are favored by selection if inosine is selectively advantageous over adenine at the edited position. Consistent with this hypothesis, we show that edited adenines are more frequently substituted with guanine, an informational analog of inosine, in the course of evolution than their unedited counterparts, and for heavily edited adenines, these transitions are favored by positive selection. Our study shows that coleoid editing sites may enhance adaptation, which, together with recent observations on Drosophila and human editing sites, points to a general role of RNA editing in the molecular evolution of metazoans.
|
25
|
Osterman IA, Chervontseva ZS, Evfratov SA, Sorokina AV, Rodin VA, Rubtsova MP, Komarova ES, Zatsepin TS, Kabilov MR, Bogdanov AA, Gelfand MS, Dontsova OA, Sergiev PV. Translation at first sight: the influence of leading codons. Nucleic Acids Res 2020; 48:6931-6942. [PMID: 32427319 PMCID: PMC7337518 DOI: 10.1093/nar/gkaa430] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/15/2020] [Revised: 05/07/2020] [Accepted: 05/14/2020] [Indexed: 01/31/2023] Open
Abstract
The first triplets of an mRNA coding region affect the yield of translation. We have applied the flowseq method to analyze >30 000 variants of codons 2-11 of a fluorescent protein reporter to identify factors affecting protein synthesis. While the negative influence of mRNA secondary structure on translation has been confirmed, a positive role of rare codons at the beginning of a coding sequence for gene expression has not been observed. The identity of triplets proximal to the start codon contributes more to the protein yield than that of more distant ones. Additional in-frame start codons enhance translation, while Shine-Dalgarno-like motifs downstream of the initiation codon are inhibitory. The metabolic cost of amino acids affects the yield of protein in poor medium. The most efficient translation was observed for variants with features resembling those of native Escherichia coli genes.
|