51
|
Bondarenko VS, Gelfand MS. Evolution of the Exon-Intron Structure in Ciliate Genomes. PLoS One 2016; 11:e0161476. [PMID: 27603699 PMCID: PMC5014332 DOI: 10.1371/journal.pone.0161476] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 04/28/2016] [Accepted: 08/06/2016] [Indexed: 12/27/2022] Open
Abstract
A typical eukaryotic gene comprises alternating regions, exons and introns, which are retained in and spliced out of the mature mRNA, respectively. Although the length of introns may vary substantially among organisms, a large fraction of genes contains short introns in many species. Notably, some Ciliates (Paramecium and Nyctotherus) possess only ultra-short introns, around 25 bp long. In Paramecium, ultra-short introns with length divisible by three (3n) are under strong evolutionary pressure and have a high frequency of in-frame stop codons, which, in the case of intron retention, cause premature termination of mRNA translation and consequent degradation of the mis-spliced mRNA by the nonsense-mediated decay mechanism. Here, we analyzed introns in five genera of Ciliates: Paramecium, Tetrahymena, Ichthyophthirius, Oxytricha, and Stylonychia. Introns can be classified into two length classes in Tetrahymena and Ichthyophthirius (with means 48 bp, 69 bp, and 55 bp, 64 bp, respectively), but, surprisingly, comprise three distinct length classes in Oxytricha and Stylonychia (with means 33–35 bp, 47–51 bp, and 78–80 bp). In most ranges of the intron lengths, 3n introns are underrepresented and have a high frequency of in-frame stop codons in all studied species. Introns of Paramecium, Tetrahymena, and Ichthyophthirius are preferentially located at the 5' and 3' ends of genes, whereas introns of Oxytricha and Stylonychia are strongly skewed towards the 5' end. Analysis of evolutionary conservation shows that, in each studied genome, a significant fraction of intron positions is conserved between the orthologs, but intron lengths are not correlated between the species.
In summary, our study provides a detailed characterization of introns in several genera of Ciliates and highlights some of their distinctive properties, which, together, indicate that splicing spellchecking is a universal and evolutionarily conserved process in the biogenesis of short introns in various representatives of Ciliates.
|
52
|
Tsoy OV, Ravcheev DA, Čuklina J, Gelfand MS. Nitrogen Fixation and Molecular Oxygen: Comparative Genomic Reconstruction of Transcription Regulation in Alphaproteobacteria. Front Microbiol 2016; 7:1343. [PMID: 27617010 PMCID: PMC4999443 DOI: 10.3389/fmicb.2016.01343] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 03/09/2016] [Accepted: 08/15/2016] [Indexed: 11/13/2022] Open
Abstract
Biological nitrogen fixation plays a crucial role in the nitrogen cycle. An ability to fix atmospheric nitrogen, reducing it to ammonium, was described for multiple species of Bacteria and Archaea. The transcriptional regulatory network for nitrogen fixation was extensively studied in several representatives of the class Alphaproteobacteria. This regulatory network includes the activator of nitrogen fixation NifA, working in tandem with the alternative sigma-factor RpoN as well as oxygen-responsive regulatory systems, one-component regulators FnrN/FixK and two-component system FixLJ. Here we used a comparative genomics approach for in silico study of the transcriptional regulatory network in 50 genomes of Alphaproteobacteria. We extended the known regulons and proposed the scenario for the evolution of the nitrogen fixation transcriptional network. The reconstructed network substantially expands the existing knowledge of transcriptional regulation in nitrogen-fixing microorganisms and can be used for genetic experiments, metabolic reconstruction, and evolutionary analysis.
|
53
|
Čuklina J, Hahn J, Imakaev M, Omasits U, Förstner KU, Ljubimov N, Goebel M, Pessi G, Fischer HM, Ahrens CH, Gelfand MS, Evguenieva-Hackenberg E. Genome-wide transcription start site mapping of Bradyrhizobium japonicum grown free-living or in symbiosis - a rich resource to identify new transcripts, proteins and to study gene regulation. BMC Genomics 2016; 17:302. [PMID: 27107716 PMCID: PMC4842269 DOI: 10.1186/s12864-016-2602-9] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 07/24/2015] [Accepted: 03/25/2016] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Differential RNA-sequencing (dRNA-seq) is indispensable for determination of primary transcriptomes. However, using dRNA-seq data to map transcriptional start sites (TSSs) and promoters genome-wide is a bioinformatics challenge. We performed dRNA-seq of Bradyrhizobium japonicum USDA 110, the nitrogen-fixing symbiont of soybean, and developed algorithms to map TSSs and promoters. RESULTS A specialized machine learning procedure for TSS recognition allowed us to map 15,923 TSSs: 14,360 in free-living bacteria, 4329 in symbiosis with soybean and 2766 in both conditions. Further, we provide proteomic evidence for 4090 proteins, among them 107 proteins corresponding to new genes and 178 proteins with N-termini different from the existing annotation (72 and 109 of them with TSS support, respectively). Guided by proteomics evidence, previously identified TSSs and TSSs experimentally validated here, we assign a score threshold to flag 14 % of the mapped TSSs as a class of lower confidence. However, this class of lower confidence contains valid TSSs of low-abundant transcripts. Moreover, we developed a de novo algorithm to identify promoter motifs upstream of mapped TSSs, which is publicly available, and found motifs mainly used in symbiosis (similar to RpoN-dependent promoters) or under both conditions (similar to RpoD-dependent promoters). Mapped TSSs and putative promoters, proteomic evidence and updated gene annotation were combined into an annotation file. CONCLUSIONS The genome-wide TSS and promoter maps along with the extended genome annotation of B. japonicum represent a valuable resource for future systems biology studies and for detailed analyses of individual non-coding transcripts and ORFs. Our data will also provide new insights into bacterial gene regulation during the agriculturally important symbiosis between rhizobia and legumes.
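The three TSS counts reported in the RESULTS are mutually consistent under inclusion-exclusion, which may be worth spelling out. The set labels F (TSSs detected in free-living bacteria) and S (TSSs detected in symbiosis) are introduced here for illustration only and do not appear in the paper:

```latex
\[
|F \cup S| \;=\; |F| + |S| - |F \cap S| \;=\; 14{,}360 + 4{,}329 - 2{,}766 \;=\; 15{,}923
\]
```

This matches the stated total of 15,923 mapped TSSs, with 2766 counted once rather than twice because they occur in both conditions.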
|
54
|
Zhang B, Han D, Korostelev Y, Yan Z, Shao N, Khrameeva E, Velichkovsky BM, Chen YPP, Gelfand MS, Khaitovich P. Changes in snoRNA and snRNA Abundance in the Human, Chimpanzee, Macaque, and Mouse Brain. Genome Biol Evol 2016; 8:840-50. [PMID: 26926764 PMCID: PMC4824147 DOI: 10.1093/gbe/evw038] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022] Open
Abstract
Small nuclear and nucleolar RNAs (snRNAs and snoRNAs) are known to be functionally and evolutionarily conserved elements of transcript processing machinery. Here, we investigated the expression evolution of snRNAs and snoRNAs by measuring their abundance in the frontal cortex of humans, chimpanzees, rhesus monkeys, and mice. Although snRNA expression is largely conserved, 44% of the 185 measured snoRNA and 40% of the 134 snoRNA families showed significant expression divergence among species. The snRNA and snoRNA expression divergence included drastic changes unique to humans: A 10-fold elevated expression of U1 snRNA and a 1,000-fold drop in expression of SNORA29. The decreased expression of SNORA29 might be due to two mutations that affect secondary structure stability. Using in situ hybridization, we further localized SNORA29 expression to nucleolar regions of neuronal cells. Our study presents the first observation of snoRNA abundance changes specific to the human lineage and suggests a possible mechanism underlying these changes.
|
55
|
Kalinina AS, Suvorikova AL, Spokoiny VG, Gelfand MS. Detection of homologous recombination in closely related strains. J Bioinform Comput Biol 2016; 14:1641001. [PMID: 26952964 DOI: 10.1142/s0219720016410018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Detection of recombination events in a bacterial genome is both important from the evolutionary point of view, and of practical interest. Indeed, homologous recombination (HR) plays a major role in the exchange of antigenic determinants between strains. There exist statistical methods to detect recently recombined segments in whole-genome sequences that use a high local density of substitutions as a signal of HR events with a source outside considered strains. However, it is difficult to detect the HR events within a set of strains, which represent whole species diversity, due to a low number of substitutions in recombined segments and high level of diversity of strains. Here, we analyzed HR in 20 Escherichia coli (E. coli) strains to define what fraction of segments with a high substitution rate were introduced in a genome by HR. For detection of HR, we used the segmentation, performed by the adaptive weights smoothing (AWS) algorithm. It detects sharp changes in the structure of observed data analyzing only qualitative structural information. We validated the approach on simulated data, applied it to the analysis of E. coli strains, and determined the recombination rates between phylogroups.
|
56
|
Flegontov P, Changmai P, Zidkova A, Logacheva MD, Altınışık NE, Flegontova O, Gelfand MS, Gerasimov ES, Khrameeva EE, Konovalova OP, Neretina T, Nikolsky YV, Starostin G, Stepanova VV, Travinsky IV, Tříska M, Tříska P, Tatarinova TV. Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry. Sci Rep 2016; 6:20768. [PMID: 26865217 PMCID: PMC4750364 DOI: 10.1038/srep20768] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 09/14/2015] [Accepted: 01/07/2016] [Indexed: 01/11/2023] Open
Abstract
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, identified mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that Nganasans, Kets, Selkups, and Yukaghirs form a cluster of populations most closely related to Paleo-Eskimos in Siberia (not considering indigenous populations of Chukotka and Kamchatka). Kets are closely related to modern Selkups and to some Bronze and Iron Age populations of the Altai region, with all these groups sharing a high degree of Mal'ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.
|
57
|
Khrameeva EE, Fudenberg G, Gelfand MS, Mirny LA. History of chromosome rearrangements reflects the spatial organization of yeast chromosomes. J Bioinform Comput Biol 2016; 14:1641002. [PMID: 27021249 DOI: 10.1142/s021972001641002x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Three-dimensional (3D) organization of genomes affects critical cellular processes such as transcription, replication, and deoxyribonucleic acid (DNA) repair. While previous studies have investigated the natural role that 3D organization plays in limiting the possible set of genomic rearrangements following DNA repair, the influence of specific organizational principles on this process, particularly over longer evolutionary time scales, remains relatively unexplored. In the budding yeast S. cerevisiae, chromosomes are organized into a Rabl-like configuration, with clustered centromeres and telomeres tethered to the nuclear periphery. Hi-C data for S. cerevisiae show that a consequence of this Rabl-like organization is that regions equally distant from centromeres are more frequently in contact with each other, between arms of both the same and different chromosomes. Here, we detect rearrangement events in Saccharomyces species using an automatic approach, and observe increased rearrangement frequency between regions with higher contact frequencies. Together, our results underscore how specific principles of 3D chromosomal organization can influence evolutionary events.
|
58
|
Ulianov SV, Khrameeva EE, Gavrilov AA, Flyamer IM, Kos P, Mikhaleva EA, Penin AA, Logacheva MD, Imakaev MV, Chertovich A, Gelfand MS, Shevelyov YY, Razin SV. Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Res 2015; 26:70-84. [PMID: 26518482 PMCID: PMC4691752 DOI: 10.1101/gr.196006.115] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Received: 06/17/2015] [Accepted: 10/26/2015] [Indexed: 01/06/2023]
Abstract
Recent advances enabled by the Hi-C technique have unraveled many principles of chromosomal folding that were subsequently linked to disease and gene regulation. In particular, Hi-C revealed that chromosomes of animals are organized into topologically associating domains (TADs), evolutionarily conserved compact chromatin domains that influence gene expression. Mechanisms that underlie partitioning of the genome into TADs remain poorly understood. To explore principles of TAD folding in Drosophila melanogaster, we performed Hi-C and poly(A)+ RNA-seq in four cell lines of various origins (S2, Kc167, DmBG3-c2, and OSC). Contrary to previous studies, we find that regions between TADs (i.e., the inter-TADs and TAD boundaries) in Drosophila are only weakly enriched with the insulator protein dCTCF, while another insulator protein, Su(Hw), is preferentially present within TADs. However, Drosophila inter-TADs harbor active chromatin and constitutively transcribed (housekeeping) genes. Accordingly, we find that binding of insulator proteins dCTCF and Su(Hw) predicts TAD boundaries much worse than active chromatin marks do. Interestingly, inter-TADs correspond to decompacted inter-bands of polytene chromosomes, whereas TADs mostly correspond to densely packed bands. Collectively, our results suggest that TADs are condensed chromatin domains depleted in active chromatin marks, separated by regions of active chromatin. We propose a mechanism of TAD self-assembly based on the ability of nucleosomes from inactive chromatin to aggregate, and the lack of this ability in acetylated nucleosomal arrays. Finally, we test this hypothesis by polymer simulations and find that TAD partitioning may be explained by different modes of inter-nucleosomal interactions for active and inactive chromatin.
|
59
|
Suvorova IA, Korostelev YD, Gelfand MS. GntR Family of Bacterial Transcription Factors and Their DNA Binding Motifs: Structure, Positioning and Co-Evolution. PLoS One 2015; 10:e0132618. [PMID: 26151451 PMCID: PMC4494728 DOI: 10.1371/journal.pone.0132618] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 03/31/2015] [Accepted: 06/16/2015] [Indexed: 12/03/2022] Open
Abstract
The GntR family of transcription factors (TFs) is a large group of proteins present in diverse bacteria and regulating various biological processes. Here we use the comparative genomics approach to reconstruct regulons and identify binding motifs of regulators from three subfamilies of the GntR family, FadR, HutC, and YtrA. Using these data, we attempt to predict DNA-protein contacts by analyzing correlations between binding motifs in DNA and amino acid sequences of TFs. We identify pairs of positions with high correlation between amino acids and nucleotides for FadR, HutC, and YtrA subfamilies and show that the most predicted DNA-protein interactions are quite similar in all subfamilies and conform well to the experimentally identified contacts formed by FadR from E. coli and AraR from B. subtilis. The most frequent predicted contacts in the analyzed subfamilies are Arg-G, Asn-A, and Asp-C. We also analyze the divergon structure and preferred site positions relative to regulated genes in the FadR and HutC subfamilies. A single site in a divergon usually regulates both operons and is approximately in the middle of the intergenic area. Double sites are either involved in the co-operative regulation of both operons, in which case they are in the center of the intergenic area, or each site in the pair independently regulates its own operon and tends to be near it. We also identify additional candidate TF-binding boxes near palindromic binding sites of TFs from the FadR, HutC, and YtrA subfamilies, which may play a role in the binding of additional TF subunits.
|
60
|
Kondrashov FA, Kondrashov AS, Gelfand MS. Dynasty Foundation: Russian science loses to politics. Nature 2015; 522:419. [PMID: 26108844 DOI: 10.1038/522419a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
|
61
|
Garushyants SK, Kazanov MD, Gelfand MS. Horizontal gene transfer and genome evolution in Methanosarcina. BMC Evol Biol 2015; 15:102. [PMID: 26044078 PMCID: PMC4455057 DOI: 10.1186/s12862-015-0393-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/28/2014] [Accepted: 05/29/2015] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Genomes of Methanosarcina spp. are among the largest archaeal genomes. One suggested reason for that is massive horizontal gene transfer (HGT) from bacteria. Genes of bacterial origin may be involved in the central metabolism and solute transport, in particular sugar synthesis, sulfur metabolism, phosphate metabolism, DNA repair, transport of small molecules, etc. Horizontally transferred (HT) genes are considered to play the key role in the ability of Methanosarcina spp. to inhabit diverse environments. At the moment, genomes of three Methanosarcina spp. have been sequenced, and while these genomes vary in length and number of protein-coding genes, they all have been shown to accumulate HT genes. However, previous estimates had been made when fewer archaeal genomes were known. Moreover, several Methanosarcinaceae genomes from other genera have been sequenced recently. Here, we revise the census of genes of bacterial origin in Methanosarcinaceae. RESULTS About 5% of Methanosarcina genes have been shown to be horizontally transferred from various bacterial groups to the last common ancestor either of Methanosarcinaceae, or Methanosarcina, or later in the evolution. Simulation of the composition of the NCBI protein non-redundant database for different years demonstrates that the estimates of the HGT rate have decreased drastically since 2002, the year of publication of the first Methanosarcina genome. The phylogenetic distribution of HT gene donors is non-uniform. Most HT genes were transferred from Firmicutes and Proteobacteria, while no HGT events from Actinobacteria to the common ancestor of Methanosarcinaceae were found. About 50% of HT genes are involved in metabolism. Horizontal transfer of transcription factors is not common, while 46% of horizontally transferred genes have demonstrated differential expression in a variety of conditions. HGT of complete operons is relatively infrequent and half of HT genes do not belong to operons. CONCLUSIONS While genes of bacterial origin are still more frequent in Methanosarcinaceae than in other Archaea, most HGT events described earlier as Methanosarcina-specific seem to have occurred before the divergence of Methanosarcinaceae. Genes horizontally transferred from bacteria to archaea tend to be transferred neither with their regulators nor in long operons. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/s12862-015-0393-2) contains supplementary material, which is available to authorized users.
|
62
|
Khrameeva EE, Ulyanov SV, Gavrilov AA, Shevelyov YY, Gelfand MS, Razin SV. 20 Active chromatin regions are sufficient to define borders of topologically associated domains in D. melanogaster interphase chromosomes. J Biomol Struct Dyn 2015. [DOI: 10.1080/07391102.2015.1032560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
63
|
Kurmangaliyev YZ, Favorov AV, Osman NM, Lehmann KV, Campo D, Salomon MP, Tower J, Gelfand MS, Nuzhdin SV. Natural variation of gene models in Drosophila melanogaster. BMC Genomics 2015; 16:198. [PMID: 25888292 PMCID: PMC4373058 DOI: 10.1186/s12864-015-1415-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/12/2014] [Accepted: 02/28/2015] [Indexed: 09/03/2023] Open
Abstract
BACKGROUND Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. RESULTS Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. CONCLUSIONS Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/s12864-015-1415-6) contains supplementary material, which is available to authorized users.
|
64
|
Leyn SA, Suvorova IA, Kholina TD, Sherstneva SS, Novichkov PS, Gelfand MS, Rodionov DA. Comparative genomics of transcriptional regulation of methionine metabolism in Proteobacteria. PLoS One 2014; 9:e113714. [PMID: 25411846 PMCID: PMC4239095 DOI: 10.1371/journal.pone.0113714] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 09/10/2014] [Accepted: 10/28/2014] [Indexed: 12/20/2022] Open
Abstract
Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ∼ 200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.
|
65
|
Gelfand MS, Cleveland KO. Successful treatment with doripenem of ventriculitis due to Achromobacter xylosoxidans. QJM 2014; 107:923-5. [PMID: 22411874 DOI: 10.1093/qjmed/hcs048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
|
66
|
Ravcheev DA, Khoroshkin MS, Laikova ON, Tsoy OV, Sernova NV, Petrova SA, Rakhmaninova AB, Novichkov PS, Gelfand MS, Rodionov DA. Comparative genomics and evolution of regulons of the LacI-family transcription factors. Front Microbiol 2014; 5:294. [PMID: 24966856 PMCID: PMC4052901 DOI: 10.3389/fmicb.2014.00294] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 04/06/2014] [Accepted: 05/28/2014] [Indexed: 12/31/2022] Open
Abstract
DNA-binding transcription factors (TFs) are essential components of transcriptional regulatory networks in bacteria. LacI-family TFs (LacI-TFs) are broadly distributed among certain lineages of bacteria. The majority of characterized LacI-TFs sense sugar effectors and regulate carbohydrate utilization genes. The comparative genomics approaches enable in silico identification of TF-binding sites and regulon reconstruction. To study the function and evolution of LacI-TFs, we performed genomics-based reconstruction and comparative analysis of their regulons. For over 1300 LacI-TFs from over 270 bacterial genomes, we predicted their cognate DNA-binding motifs and identified target genes. Using the genome context and metabolic subsystem analyses of reconstructed regulons, we tentatively assigned functional roles and predicted candidate effectors for 78 and 67% of the analyzed LacI-TFs, respectively. Nearly 90% of the studied LacI-TFs are local regulators of sugar utilization pathways, whereas the remaining 125 global regulators control large and diverse sets of metabolic genes. The global LacI-TFs include the previously known regulators CcpA in Firmicutes, FruR in Enterobacteria, and PurR in Gammaproteobacteria, as well as the three novel regulators—GluR, GapR, and PckR—that are predicted to control the central carbohydrate metabolism in three lineages of Alphaproteobacteria. Phylogenetic analysis of regulators combined with the reconstructed regulons provides a model of evolutionary diversification of the LacI protein family. The obtained genomic collection of in silico reconstructed LacI-TF regulons in bacteria is available in the RegPrecise database (http://regprecise.lbl.gov). It provides a framework for future structural and functional classification of the LacI protein family and identification of molecular determinants of the DNA and ligand specificity. 
The inferred regulons can be also used for functional gene annotation and reconstruction of sugar catabolic networks in diverse bacterial lineages.
|
67
|
Denisov SV, Bazykin GA, Sutormin R, Favorov AV, Mironov AA, Gelfand MS, Kondrashov AS. Weak negative and positive selection and the drift load at splice sites. Genome Biol Evol 2014; 6:1437-47. [PMID: 24966225 PMCID: PMC4079205 DOI: 10.1093/gbe/evu100] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Accepted: 05/05/2014] [Indexed: 11/30/2022] Open
Abstract
Splice sites (SSs) are short sequences that are crucial for proper mRNA splicing in eukaryotic cells, and therefore can be expected to be shaped by strong selection. Nevertheless, in mammals and in other intron-rich organisms, many of the SSs often involve nonconsensus (Nc), rather than consensus (Cn), nucleotides, and beyond the two critical nucleotides, the SSs are not perfectly conserved between species. Here, we compare the SS sequences between primates, and between Drosophila fruit flies, to reveal the pattern of selection acting at SSs. Cn-to-Nc substitutions are less frequent, and Nc-to-Cn substitutions are more frequent, than neutrally expected, indicating, respectively, negative and positive selection. This selection is relatively weak (1 < |4Nes| < 4), and has a similar efficiency in primates and in Drosophila. Within some nucleotide positions, the positive selection in favor of Nc-to-Cn substitutions is weaker than the negative selection maintaining already established Cn nucleotides; this difference is due to site-specific negative selection favoring current Nc nucleotides. In general, however, the strength of negative selection protecting the Cn alleles is similar in magnitude to the strength of positive selection favoring replacement of Nc alleles, as expected under the simple nearly neutral turnover. In summary, although a fraction of the Nc nucleotides within SSs is maintained by selection, the abundance of deleterious nucleotides in this class suggests a substantial genome-wide drift load.
|
68
|
Ian E, Malko DB, Sekurova ON, Bredholt H, Rückert C, Borisova ME, Albersmeier A, Kalinowski J, Gelfand MS, Zotchev SB. Genomics of sponge-associated Streptomyces spp. closely related to Streptomyces albus J1074: insights into marine adaptation and secondary metabolite biosynthesis potential. PLoS One 2014; 9:e96719. [PMID: 24819608 PMCID: PMC4018334 DOI: 10.1371/journal.pone.0096719] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 02/19/2014] [Accepted: 04/10/2014] [Indexed: 11/23/2022] Open
Abstract
A total of 74 actinomycete isolates were cultivated from two marine sponges, Geodia barretti and Phakellia ventilabrum, collected at the same spot at the bottom of the Trondheim fjord (Norway). Phylogenetic analyses of sponge-associated actinomycetes based on the 16S rRNA gene sequences demonstrated the presence of species belonging to the genera Streptomyces, Nocardiopsis, Rhodococcus, Pseudonocardia and Micromonospora. Most isolates required sea water for growth, suggesting that they are adapted to the marine environment. Phylogenetic analysis of Streptomyces spp. revealed two isolates that originated from different sponges and had 99.7% identity in their 16S rRNA gene sequences, indicating that they represent very closely related strains. Sequencing, annotation, and analyses of the genomes of these Streptomyces isolates demonstrated that they are sister organisms closely related to terrestrial Streptomyces albus J1074. Unlike S. albus J1074, the two sponge streptomycetes grew and differentiated faster on the medium containing sea water. Comparative genomics revealed several genes presumably responsible for partial marine adaptation of these isolates. Genome mining targeted to secondary metabolite biosynthesis gene clusters identified several clusters that are not present in S. albus J1074 and are likely to have been retained from a common ancestor or acquired from other actinomycetes. Certain genes and gene clusters were shown to be differentially acquired or lost, supporting the hypothesis of divergent evolution of the two Streptomyces species in different sponge hosts.
|
69
|
Rueda S, Fathima S, Knight CL, Yaqub M, Papageorghiou AT, Rahmatullah B, Foi A, Maggioni M, Pepe A, Tohka J, Stebbing RV, McManigle JE, Ciurte A, Bresson X, Cuadra MB, Sun C, Ponomarev GV, Gelfand MS, Kazanov MD, Wang CW, Chen HC, Peng CW, Hung CM, Noble JA. Evaluation and comparison of current fetal ultrasound image segmentation methods for biometric measurements: a grand challenge. IEEE Transactions on Medical Imaging 2014; 33:797-813. [PMID: 23934664 DOI: 10.1109/tmi.2013.2276943] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 05/23/2023]
Abstract
This paper presents the evaluation results of the methods submitted to Challenge US: Biometric Measurements from Fetal Ultrasound Images, a segmentation challenge held at the IEEE International Symposium on Biomedical Imaging 2012. The challenge was set to compare and evaluate current fetal ultrasound image segmentation methods. It consisted of automatically segmenting fetal anatomical structures to measure standard obstetric biometric parameters, from 2D fetal ultrasound images taken on fetuses at different gestational ages (21 weeks, 28 weeks, and 33 weeks) and with varying image quality to reflect data encountered in real clinical environments. Four independent sub-challenges were proposed, according to the objects of interest measured in clinical practice: abdomen, head, femur, and whole fetus. Five teams participated in the head sub-challenge and two teams in the femur sub-challenge, including one team who tackled both. Nobody attempted the abdomen and whole fetus sub-challenges. The challenge goals were two-fold and the participants were asked to submit the segmentation results as well as the measurements derived from the segmented objects. Extensive quantitative (region-based, distance-based, and Bland-Altman measurements) and qualitative evaluation was performed to compare the results from a representative selection of current methods submitted to the challenge. Several experts (three for the head sub-challenge and two for the femur sub-challenge), with different degrees of expertise, manually delineated the objects of interest to define the ground truth used within the evaluation framework. For the head sub-challenge, several groups produced results that could be potentially used in clinical settings, with comparable performance to manual delineations. The femur sub-challenge had inferior performance to the head sub-challenge due to the fact that it is a harder segmentation problem and that the techniques presented relied more on the femur's appearance.
|
70
|
Gogleva AA, Gelfand MS, Artamonova II. Comparative analysis of CRISPR cassettes from the human gut metagenomic contigs. BMC Genomics 2014; 15:202. [PMID: 24628983 PMCID: PMC4004331 DOI: 10.1186/1471-2164-15-202] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/23/2013] [Accepted: 03/04/2014] [Indexed: 08/30/2023] Open
Abstract
Background: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a prokaryotic adaptive defence system that provides resistance against alien replicons such as viruses and plasmids. Spacers in a CRISPR cassette confer immunity against viruses and plasmids containing regions complementary to the spacers and hence they retain a footprint of interactions between prokaryotes and their viruses in individual strains and ecosystems. The human gut is a rich habitat populated by numerous microorganisms, but a large fraction of these are unculturable and little is known about them in general and their CRISPR systems in particular.
Results: We used human gut metagenomic data from three open projects in order to characterize the composition and dynamics of CRISPR cassettes in the human-associated microbiota. Applying available CRISPR-identification algorithms and a previously designed filtering procedure to the assembled human gut metagenomic contigs, we found 388 CRISPR cassettes, 373 of which had repeats not observed previously in complete genomes or other datasets. Only 171 of 3,545 identified spacers were coupled with protospacers from the human gut metagenomic contigs. The number of matches to GenBank sequences was negligible, providing protospacers for 26 spacers. Reconstruction of CRISPR cassettes allowed us to track the dynamics of spacer content. In agreement with other published observations, we show that spacers shared by different cassettes (and hence likely older ones) tend to lie at the trailer ends, whereas spacers with matches in the metagenomes are distributed unevenly across cassettes, demonstrating a preference to form clusters closer to the active end of a CRISPR cassette, adjacent to the leader, and hence suggesting dynamical interactions between prokaryotes and viruses in the human gut. Remarkably, spacers match protospacers in the metagenome of the same individual with frequency comparable to a random control, but may match protospacers from metagenomes of other individuals.
Conclusions: The analysis of assembled contigs is complementary to the approach based on the analysis of original reads and hence provides additional data about composition and evolution of CRISPR cassettes, revealing the dynamics of CRISPR-phage interactions in metagenomes.
|
71
|
Rossbach O, Hung LH, Khrameeva E, Schreiner S, König J, Curk T, Zupan B, Ule J, Gelfand MS, Bindereif A. Crosslinking-immunoprecipitation (iCLIP) analysis reveals global regulatory roles of hnRNP L. RNA Biol 2014; 11:146-55. [PMID: 24526010 DOI: 10.4161/rna.27991] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Indexed: 12/22/2022] Open
Abstract
Heterogeneous nuclear ribonucleoprotein L (hnRNP L) is a multifunctional RNA-binding protein that is involved in many different processes, such as regulation of transcription, translation, and RNA stability. We have previously characterized hnRNP L as a global regulator of alternative splicing, binding to CA-repeat and CA-rich RNA elements. Interestingly, hnRNP L can both activate and repress splicing of alternative exons, but the precise mechanism of hnRNP L-mediated splicing regulation remained unclear. To analyze activities of hnRNP L on a genome-wide level, we performed individual-nucleotide resolution crosslinking-immunoprecipitation in combination with deep-sequencing (iCLIP-Seq). Sequence analysis of the iCLIP crosslink sites showed significant enrichment of C/A motifs, which perfectly agrees with the in vitro binding consensus obtained earlier by a SELEX approach, indicating that in vivo hnRNP L binding targets are mainly determined by the RNA-binding activity of the protein. Genome-wide mapping of hnRNP L binding revealed that the protein preferably binds to introns and 3' UTR. Additionally, position-dependent splicing regulation by hnRNP L was demonstrated: the protein represses splicing when bound to intronic regions upstream of alternative exons and, in contrast, activates splicing when bound to the downstream intron. These findings shed light on the longstanding question of differential hnRNP L-mediated splicing regulation. Finally, regarding 3' UTR binding, hnRNP L binding preferentially overlaps with predicted microRNA target sites, indicating global competition between hnRNP L and microRNA binding. Translational regulation by hnRNP L was validated for a subset of predicted target 3' UTRs.
|
72
|
Belushkin AA, Vinogradov DV, Gelfand MS, Osterman AL, Cieplak P, Kazanov MD. Sequence-derived structural features driving proteolytic processing. Proteomics 2013; 14:42-50. [PMID: 24227478 DOI: 10.1002/pmic.201300416] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 09/23/2013] [Revised: 10/22/2013] [Accepted: 10/28/2013] [Indexed: 12/11/2022]
Abstract
Proteolytic signaling, or regulated proteolysis, is an essential part of many important pathways such as Notch, Wnt, and Hedgehog. How the structure of the cleaved substrate regions influences the efficacy of proteolytic processing remains underexplored. Here, we analyzed the relative importance in proteolysis of various structural features derived from substrate sequences using a dataset of more than 5000 experimentally verified proteolytic events captured in CutDB. Accessibility to the solvent was recognized as an essential property of a proteolytically processed polypeptide chain. Proteolytic events were found nearly uniformly distributed among three types of secondary structure, although with some enrichment in loops. Cleavages in α-helices were found to be relatively abundant in regions apparently prone to unfolding, while cleavages in β-structures tended to be located at the periphery of β-sheets. Application of the same statistical procedures to proteolytic events divided into separate sets according to the catalytic classes of proteases proved consistency of the results and confirmed that the structural mechanisms of proteolysis are universal. The estimated prediction power of sequence-derived structural features, which turned out to be sufficiently high, presents a rationale for their use in bioinformatic prediction of proteolytic events.
|
73
|
Rösel-Hillgärtner TD, Hung LH, Khrameeva E, Le Querrec P, Gelfand MS, Bindereif A. A novel intra-U1 snRNP cross-regulation mechanism: alternative splicing switch links U1C and U1-70K expression. PLoS Genet 2013; 9:e1003856. [PMID: 24146627 PMCID: PMC3798272 DOI: 10.1371/journal.pgen.1003856] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/04/2013] [Accepted: 08/21/2013] [Indexed: 11/18/2022] Open
Abstract
The U1 small nuclear ribonucleoprotein (snRNP)-specific U1C protein participates in 5′ splice site recognition and regulation of pre-mRNA splicing. Based on an RNA-Seq analysis in HeLa cells after U1C knockdown, we found a conserved, intra-U1 snRNP cross-regulation that links U1C and U1-70K expression through alternative splicing and U1 snRNP assembly. To investigate the underlying regulatory mechanism, we combined mutational minigene analysis, in vivo splice-site blocking by antisense morpholinos, and in vitro binding experiments. Alternative splicing of U1-70K pre-mRNA creates the normal (exons 7–8) and a non-productive mRNA isoform, whose balance is determined by U1C protein levels. The non-productive isoform is generated through a U1C-dependent alternative 3′ splice site, which requires an adjacent cluster of regulatory 5′ splice sites and binding of intact U1 snRNPs. As a result of nonsense-mediated decay (NMD) of the non-productive isoform, U1-70K mRNA and protein levels are down-regulated, and U1C incorporation into the U1 snRNP is impaired. U1-70K/U1C-deficient particles are assembled, shifting the alternative splicing balance back towards productive U1-70K splicing, and restoring assembly of intact U1 snRNPs. Taken together, we established a novel feedback regulation that controls U1-70K/U1C homeostasis and ensures correct U1 snRNP assembly and function.
The accurate removal of intervening sequences (introns) from precursor messenger RNAs (pre-mRNAs) represents an essential step in the expression of most eukaryotic protein-coding genes. Alternative splicing can create from a single primary transcript various mature mRNAs with diverse, sometimes even antagonistic, biological functions. Many human diseases are based on alternative-splicing defects, and most interestingly, certain defects are caused by mutations in general splicing factors that participate in each splicing event. To address the question of how a general splicing factor can regulate alternative splicing events, here we investigated the regulatory role of the U1C protein, a specific component of the U1 small nuclear ribonucleoprotein (snRNP) and important in initial 5′ splice site recognition. Our RNA-Seq analysis demonstrated that U1C affects more than 300 cases of alternative splicing in the human system. One U1C target, U1-70K, appeared to be particularly interesting, because both protein products are components of the U1 snRNP and functionally depend on each other. Analyzing the mechanistic basis of this intra-U1 snRNP cross-regulation, we discovered a U1C-dependent alternative splicing switch in the U1-70K pre-mRNA that regulates U1-70K expression. In sum, this feedback loop controls and links U1C and U1-70K homeostasis to guarantee correct U1 snRNP assembly and function.
|
74
|
Saujet L, Pereira FC, Serrano M, Soutourina O, Monot M, Shelyakin PV, Gelfand MS, Dupuy B, Henriques AO, Martin-Verstraete I. Genome-wide analysis of cell type-specific gene transcription during spore formation in Clostridium difficile. PLoS Genet 2013; 9:e1003756. [PMID: 24098137 PMCID: PMC3789822 DOI: 10.1371/journal.pgen.1003756] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 01/30/2013] [Accepted: 07/12/2013] [Indexed: 01/05/2023] Open
Abstract
Clostridium difficile, a Gram-positive, anaerobic, spore-forming bacterium, is an emergent pathogen and the most common cause of nosocomial diarrhea. Although transmission of C. difficile is mediated by contamination of the gut by spores, the regulatory cascade controlling spore formation remains poorly characterized. During Bacillus subtilis sporulation, a cascade of four sigma factors, σ(F) and σ(G) in the forespore and σ(E) and σ(K) in the mother cell, governs compartment-specific gene expression. In this work, we combined genome-wide transcriptional analyses and promoter mapping to define the C. difficile σ(F), σ(E), σ(G) and σ(K) regulons. We identified about 225 genes under the control of these sigma factors: 25 in the σ(F) regulon, 97 σ(E)-dependent genes, 50 σ(G)-governed genes and 56 genes under σ(K) control. A significant fraction of genes in each regulon is of unknown function, but new candidates for spore coat proteins could be proposed as being synthesized under σ(E) or σ(K) control and detected in a previously published spore proteome. SpoIIID of C. difficile also plays a pivotal role in the mother cell line of expression, repressing the transcription of many members of the σ(E) regulon and activating sigK expression. Global analysis of developmental gene expression under the control of these sigma factors revealed deviations from the B. subtilis model regarding the communication between mother cell and forespore in C. difficile. We showed that the expression of the σ(E) regulon in the mother cell was not strictly under the control of σ(F) despite the fact that the forespore product SpoIIR was required for the processing of pro-σ(E). In addition, the σ(K) regulon was not controlled by σ(G) in C. difficile, in agreement with the lack of pro-σ(K) processing. This work is one key step to obtain new insights about the diversity and evolution of the sporulation process among Firmicutes.
|
75
|
Gorbunov KY, Laikova ON, Rodionov DA, Gelfand MS, Lyubetsky VA. Evolution of regulatory motifs of bacterial transcription factors. In Silico Biol 2013; 10:163-83. [PMID: 22430290 DOI: 10.3233/isb-2010-0425] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Unlike evolution of genes and proteins, evolution of regulatory systems is a relatively new area of research. In particular, little systematic study has been done on evolution of DNA binding motifs in transcription factor families. We suggest an algorithm that reconstructs the most parsimonious scenario for changes in DNA binding motifs along an evolutionary tree of transcription factor binding sites. The algorithm was validated on several artificial datasets and then applied to reconstruct the evolutionary history of the NrdR, MntR, LacI, FNR, Irr, Fur and Rrf2 transcription factor families. The algorithm seems to be sufficiently robust to be applicable in realistic situations. In most transcription factor families the changes in binding motifs are limited to several branches. Changes in consensus nucleotides proceed via an intermediate stage when the respective position is not conserved.
|