1
|
Predicting Selective RNA Processing and Stabilization Operons in Clostridium spp. Front Microbiol 2021; 12:673349. [PMID: 34177856 PMCID: PMC8219983 DOI: 10.3389/fmicb.2021.673349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2021] [Accepted: 04/28/2021] [Indexed: 11/29/2022]
Abstract
In selective RNA processing and stabilization (SRPS) operons, stem–loops (SLs) located at the 3′-UTR region of selected genes can control the stability of the corresponding transcripts and determine the stoichiometry of the operon. Here, for such operons, we developed a computational approach named SLOFE (stem–loop free energy) that identifies the SRPS operons and predicts their transcript- and protein-level stoichiometry at the whole-genome scale using only the genome sequence via the minimum free energy (ΔG) of specific SLs in the intergenic regions within operons. As validated by the experimental approach of differential RNA-Seq, SLOFE identifies genome-wide SRPS operons in Clostridium cellulolyticum with 80% accuracy and reveals that the SRPS mechanism contributes to diverse cellular activities. Moreover, in the identified SRPS operons, SLOFE predicts the transcript- and protein-level stoichiometry, including those encoding cellulosome complexes, ATP synthases, ABC transporter family proteins, and ribosomal proteins. Its accuracy exceeds those of existing in silico approaches in C. cellulolyticum, Clostridium acetobutylicum, Clostridium thermocellum, and Bacillus subtilis. The ability to identify genome-wide SRPS operons and predict their stoichiometry via DNA sequence in silico should facilitate studying the function and evolution of SRPS operons in bacteria.
|
2
|
The first determination and analysis of the complete mitochondrial genome of Ancistrus temmincki (Siluriformes: Loricariidae). Mitochondrial DNA B Resour 2021; 6:1583-1585. [PMID: 34027063 PMCID: PMC8110184 DOI: 10.1080/23802359.2020.1866446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
To better understand the evolution and relationships of fishes in the family Loricariidae, the complete mitochondrial genome of the loricariid fish Ancistrus temmincki was characterized for the first time in the present study. The whole mitogenome was 16,657 bp in size and consisted of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, a control region, and an origin of light-strand replication. The coding sequences had a total length of 11,473 bp (68.88% of the genome) and encoded 3,813 amino acids. The genome composition was highly A + T biased (56.29%) and exhibited a positive AT-skew (0.0661) and a negative GC-skew (–0.2740). All protein-coding genes started with ATG except CO1, which started with GTG, and ended with the standard TAN codons or a single T. The control region (D-loop), spanning 15,635 bp to 16,657 bp, was 1023 bp in size. Few studies to date have reported complete mitochondrial sequences in the genus Ancistrus. Phylogenetic analysis showed that A. temmincki was most closely related to Ancistrus cryptophthalmus within the genus. The complete mitochondrial genome sequence provides new insight into taxonomic classification and a more complex picture of species diversity within the family Loricariidae.
|
3
|
The complete mitochondrial genome of Nematobrycon palmeri (Characiformes:Nematobrycon) and phylogenetic studies of Characidaes. Mitochondrial DNA B Resour 2020; 5:3474-3475. [PMID: 33458208 PMCID: PMC7782840 DOI: 10.1080/23802359.2020.1825130] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/03/2022]
Abstract
The complete mitochondrial genome of the characiform fish Nematobrycon palmeri was characterized in the present study. The whole mitogenome was 17,340 bp in size; the coding sequences had a total length of 11,448 bp (66.02% of the genome) and encoded 3805 amino acids. The base composition of the genome was 30.92% A, 23.92% C, 14.88% G, and 30.28% T. All protein-coding genes (PCGs) started with ATG. CO1 and ATP8 ended with AGG and TAG, respectively; CO2, ATP6, and ND4 ended with a single T; and the other PCGs ended with TAA. The 12S and 16S ribosomal RNAs were 949 bp and 1675 bp long, respectively. The control region (D-loop), spanning 15,654 bp to 17,340 bp, was 1687 bp in size. The genome showed a negative GC-skew (-0.2329) and a positive AT-skew (0.0105). Phylogenetic analysis showed that N. palmeri was most closely related to Gephyrocharax atracaudatus. The complete mitochondrial genome sequence provides new insight into taxonomic classification and helps to draw a more complete picture of species diversity within the Characidae.
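The skew values quoted in this abstract can be checked against the reported base composition. The sketch below assumes the conventional definitions, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C); the abstract does not state which formula the authors used, and the helper name `skew` is illustrative only.

```python
def skew(x: float, y: float) -> float:
    """Return (x - y) / (x + y), the conventional strand-skew statistic (assumed)."""
    return (x - y) / (x + y)


# Base composition (%) reported for N. palmeri in the abstract above.
composition = {"A": 30.92, "C": 23.92, "G": 14.88, "T": 30.28}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])
at_content = composition["A"] + composition["T"]

print(f"AT-skew = {at_skew:.4f}")
print(f"GC-skew = {gc_skew:.4f}")
print(f"A+T content = {at_content:.2f}%")
```

Computed from the rounded percentages, the AT-skew matches the reported 0.0105 and the GC-skew agrees with the reported -0.2329 to within rounding; the authors presumably worked from raw base counts.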
|
4
|
Phylogenetic analysis of the complete mitochondrial genome of Jaydia carinatus (Kurtiformes; Apogonidae). Mitochondrial DNA B Resour 2020. [DOI: 10.1080/23802359.2020.1721360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
5
|
The complete mitochondrial genome of Pseudomugil furcatus (Atheriniformes: Pseudomugil) and phylogenetic studies of Atheriniformes. Mitochondrial DNA B Resour 2020. [DOI: 10.1080/23802359.2020.1749175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
6
|
The complete mitochondrial genome of Hemigrammus bleheri (Characiformes: Hemigrammus) and phylogenetic studies of Characiformes. Mitochondrial DNA B Resour 2019; 4:3834-3835. [PMID: 33366209 PMCID: PMC7707484 DOI: 10.1080/23802359.2019.1681309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2019] [Accepted: 10/13/2019] [Indexed: 10/26/2022]
Abstract
The complete mitochondrial genome of the characiform fish Hemigrammus bleheri was characterized in the present study. The whole mitogenome was 17,021 bp in size and consisted of 13 protein-coding genes (PCGs), 22 tRNA genes, 2 rRNA genes, a control region, and an origin of light-strand replication. The coding sequences had a total length of 11,415 bp (67.06% of the genome) and encoded 3805 amino acids. Similar to other Hemigrammus species, the base composition of H. bleheri was 29.30% A, 25.26% C, 16.36% G, and 29.08% T. All PCGs started with Met. ND1, ND3, ND4L, ND6, and CytB ended with the stop codon TAA; ND2, ATP8, and ND5 ended with TAG; CO2, ATP6, CO3, and ND4 ended with a single T; and CO1 ended with AGG. The 12S and 16S ribosomal RNAs were 924 bp and 1681 bp long, respectively. The control region (D-loop) was 1308 bp long, spanning 15,714 to 17,021 bp. The complete mitochondrial genome sequence provided here should help in further understanding the evolution of Characiformes and the conservation genetics of H. bleheri.
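The summary figures in this abstract follow from simple arithmetic on genome coordinates. The sketch below recomputes them, assuming 1-based inclusive coordinates (so a feature from `start` to `end` has length `end - start + 1`) and assuming the amino-acid count is the coding length divided by three; both are interpretations rather than statements from the abstract, and the function names are illustrative.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates (assumed convention)."""
    return end - start + 1


def percent(part: int, whole: int) -> float:
    """Share of `whole` taken up by `part`, in percent."""
    return 100.0 * part / whole


# Values reported for H. bleheri in the abstract above.
genome_bp = 17_021
coding_bp = 11_415
dloop_start, dloop_end = 15_714, 17_021

dloop_len = inclusive_length(dloop_start, dloop_end)
coding_pct = percent(coding_bp, genome_bp)
codons = coding_bp // 3  # assumes coding length is counted in whole codons

print(f"D-loop length: {dloop_len} bp")
print(f"Coding fraction: {coding_pct:.2f}%")
print(f"Codons: {codons}")
```

Under these assumptions the script recovers the reported 1308 bp control region, 67.06% coding proportion, and 3805 residues.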
|
7
|
The complete mitochondrial genome of Poecilia formosa (Poecilia, Cyprinodontidae) and phylogenetic studies of cyprinodontiformes. Mitochondrial DNA B Resour 2019; 4:3820-3821. [PMID: 33366203 PMCID: PMC7707524 DOI: 10.1080/23802359.2019.1681308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2019] [Accepted: 10/13/2019] [Indexed: 11/01/2022]
Abstract
We report the complete mitochondrial genome sequence of Poecilia formosa. The genome is 16636 bp in length and has a base composition of A (29.59%), G (14.61%), C (28.26%), and T (27.54%). Similar to other Poecilia species, it has a typical conserved structure comprising 13 protein-coding genes, 2 rRNA genes, 1 control region (D-loop), and 22 tRNA genes. The coding sequences have a total length of 11,533 bp (69.33% of the genome) and encode 3837 amino acids. All protein-coding genes start with Met. ND1, CO1, ATP8, ATP6, CO3, ND4L, ND5, ND6, and CytB end with the stop codon TAA; ND2 and ND3 end with TAG; and CO3 and ND4 end with a single T. The 12S ribosomal RNA is 948 bp long, spanning 70 bp to 1018 bp, and the 16S ribosomal RNA is 1674 bp long, spanning 1090 bp to 2764 bp. The control region is 879 bp long, spanning 15757 bp to 16636 bp. The complete mitochondrial genome sequence provided here should be useful for further understanding the evolution of ratite and conservation genetics of Poecilia formosa.
|
8
|
The complete mitochondrial genome of Gephyrocharax atracaudatus (Characiformes, Characidae) and phylogenetic studies of Characiformes. Mitochondrial DNA B Resour 2019. [DOI: 10.1080/23802359.2018.1532830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
9
|
The complete mitochondrial genome of Hepsetus odoe (Hepsetidae, characoidei) and phylogenetic studies of characoidei. Mitochondrial DNA B Resour 2019. [DOI: 10.1080/23802359.2018.1542994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
10
|
The complete mitochondrial genome of Chaetodon octofasciatus (Perciformes: Chaetodontidae) and phylogenetic studies of Percoidea. Mitochondrial DNA B Resour 2018; 3:531-532. [PMID: 33474230 PMCID: PMC7799963 DOI: 10.1080/23802359.2018.1467218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2018] [Accepted: 04/16/2018] [Indexed: 11/17/2022]
Abstract
The complete mitochondrial genome of Chaetodon octofasciatus was determined for the first time in this study. It is 16,485 bp in length and contains 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, a putative control region, and 1 origin of replication on the light strand. The overall base composition is C (28.2%), A (28.3%), T (27.4%), and G (16.1%). The 13 PCGs encode 3796 amino acids in total; 12 of them use the initiation codon ATG, whereas COI uses GTG. Most of them have TAA as the stop codon, whereas ND3 ends with TAG, and three PCGs (COII, ND4, and Cytb) end with an incomplete stop codon represented by a single T. A phylogenetic tree was constructed with the Neighbor Joining method to clarify relationships within Percoidea, which could be a useful basis for management of this species.
|
11
|
Identification and characterization of a minisatellite contained within a novel miniature inverted-repeat transposable element (MITE) of Porphyromonas gingivalis. Mob DNA 2015; 6:18. [PMID: 26448788 PMCID: PMC4596501 DOI: 10.1186/s13100-015-0049-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/11/2015] [Accepted: 09/23/2015] [Indexed: 12/26/2022]
Abstract
Background Repetitive regions of DNA and transposable elements have been found to constitute large percentages of eukaryotic and prokaryotic genomes. Such elements are known to be involved in transcriptional regulation, host-pathogen interactions and genome evolution. Results We identified a minisatellite contained within a miniature inverted-repeat transposable element (MITE) in Porphyromonas gingivalis. The P. gingivalis minisatellite and associated MITE, named ‘BrickBuilt’, comprises a tandemly repeating twenty-three nucleotide DNA sequence lacking spacer regions between repeats, and with flanking ‘leader’ and ‘tail’ subunits that include small inverted-repeat ends. Forms of the BrickBuilt MITE are found 19 times in the genome of P. gingivalis strain ATCC 33277, and also multiple times within the strains W83, TDC60, HG66 and JCVI SC001. BrickBuilt is always located intergenically, between 49 and 591 nucleotides from the nearest upstream and downstream coding sequences. Segments of BrickBuilt contain promoter elements with bidirectional transcription capabilities. Conclusions We performed a bioinformatic analysis of BrickBuilt utilizing existing whole genome sequencing, microarray and RNAseq data, as well as performing in vitro promoter probe assays to determine potential roles, mechanisms and regulation of the expression of these elements and their effect on surrounding loci. The multiplicity, localization and limited host range nature of MITEs and MITE-like elements in P. gingivalis suggest that these elements may play an important role in facilitating genome evolution as well as modulating the transcriptional regulatory system.
|
12
|
The mitochondrial and chloroplast genomes of the haptophyte Chrysochromulina tobin contain unique repeat structures and gene profiles. BMC Genomics 2014; 15:604. [PMID: 25034814 PMCID: PMC4226036 DOI: 10.1186/1471-2164-15-604] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 04/03/2014] [Accepted: 07/09/2014] [Indexed: 01/22/2023]
Abstract
BACKGROUND Haptophytes are widely and abundantly distributed in both marine and freshwater ecosystems. Few genomic analyses of representatives within this taxon have been reported, despite their early evolutionary origins and their prominent role in global carbon fixation. RESULTS The complete mitochondrial and chloroplast genome sequences of the haptophyte Chrysochromulina tobin (Prymnesiales) provide insight into the architecture and gene content of haptophyte organellar genomes. The mitochondrial genome (~34 kb) encodes 21 protein coding genes and contains a complex, 9 kb tandem repeat region. Similar to other haptophytes and rhodophytes, but not cryptophytes or stramenopiles, the mitochondrial genome has lost the nad7, nad9 and nad11 genes. The ~105 kb chloroplast genome encodes 112 protein coding genes, including ycf39 which has strong structural homology to NADP-binding nitrate transcriptional regulators; a divergent 'CheY-like' two-component response regulator (ycf55) and Tic/Toc (ycf60 and ycf80) membrane transporters. Notably, a zinc finger domain has been identified in the rpl36 ribosomal protein gene of all chloroplasts sequenced to date with the exception of haptophytes and cryptophytes--algae that have gained (via lateral gene transfer) an alternative rpl36 lacking the zinc finger motif. The two C. tobin chloroplast ribosomal RNA operon spacer regions differ in tRNA content. Additionally, each ribosomal operon contains multiple single nucleotide polymorphisms (SNPs)--a pattern observed in rhodophytes and cryptophytes, but few stramenopiles. Analysis of small (<200 bp) chloroplast encoded tandem and inverted repeats in C. tobin and 78 other algal chloroplast genomes show that repeat type, size and location are correlated with gene identity and taxonomic clade. CONCLUSION The Chrysochromulina tobin organellar genomes provide new insight into organellar function and evolution. 
These are the first organellar genomes to be determined for the prymnesiales, a taxon that is present in both oceanic and freshwater systems and represents major primary photosynthetic producers and contributors to global ecosystem stability.
|
13
|
Prophage-mediated dynamics of 'Candidatus Liberibacter asiaticus' populations, the destructive bacterial pathogens of citrus huanglongbing. PLoS One 2013; 8:e82248. [PMID: 24349235 PMCID: PMC3862640 DOI: 10.1371/journal.pone.0082248] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/18/2013] [Accepted: 10/22/2013] [Indexed: 01/21/2023]
Abstract
Prophages are highly dynamic components in the bacterial genome and play an important role in intraspecies variations. There are at least two prophages in the chromosomes of 'Candidatus Liberibacter asiaticus' (Las) Floridian isolates. Las is both unculturable and the most prevalent species of Liberibacter pathogens that cause huanglongbing (HLB), a worldwide destructive disease of citrus. In this study, seven new prophage variants resulting from two hyper-variable regions were identified by screening clone libraries of infected citrus, periwinkle and psyllids. Among them, Types A and B share highly conserved sequences and localize within the two prophages, FP1 and FP2, respectively. Although Types B and C were abundant in all three libraries, Type A was much more abundant in the libraries from the Las-infected psyllids than from the Las-infected plants, and Type D was only identified in libraries from the infected host plants but not from the infected psyllids. Sequence analysis of these variants revealed that the variations may result from recombination and rearrangement events. Conventional PCR results using type-specific molecular markers indicated that A, B, C and D are the four most abundant types in Las-infected citrus and periwinkle. However, only three types, A, B and C, are abundant in Las-infected psyllids. Typing results for Las-infected citrus field samples indicated that mixed populations of Las bacteria are present in Floridian isolates, but only the Type D population was correlated with the blotchy mottle symptom. Extended cloning and sequencing of the Type D region revealed a third prophage/phage in the Las genome, which may derive from the recombination of FP1 and FP2. Dramatic variations in these prophage regions were also found among the global Las isolates. These results are the first to demonstrate the prophage/phage-mediated dynamics of Las populations in plant and insect hosts, and their correlation with insect transmission and disease development.
|
14
|
RNASurface: fast and accurate detection of locally optimal potentially structured RNA segments. Bioinformatics 2013; 30:457-63. [PMID: 24292360 DOI: 10.1093/bioinformatics/btt701] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION During the past decade, new classes of non-coding RNAs (ncRNAs) and their unexpected functions were discovered. Stable secondary structure is the key feature of many non-coding RNAs. Given the huge amounts of genomic data, developing computational methods to survey genomes for structured RNAs remains a pressing problem, especially when homologous sequences are not available for comparative analysis. Existing programs scan genomes with a fixed window by efficiently constructing a matrix of RNA minimum free energies. The wide range of lengths of structured RNAs necessitates the use of many different window lengths, which substantially increases the output size and computational effort. RESULTS In this article, we present an algorithm, RNASurface, to efficiently scan genomes by constructing a matrix of significance of RNA secondary structures and to identify all locally optimal structured RNA segments up to a predefined size. RNASurface significantly improves precision of identification of known ncRNA in Bacillus subtilis. AVAILABILITY AND IMPLEMENTATION RNASurface C source code is available from http://bioinf.fbb.msu.ru/RNASurface/downloads.html.
|
15
|
Abstract
Various regulatory elements in messenger RNAs (mRNAs) that carry secondary structure play important roles in a wide range of expression processes. Numerous recent works have focused on the discovery of these functional elements that contain conserved mRNA structures. However, to date, regions with high structural stability have been largely overlooked. In this study, we defined high stability regions (HSRs) in the coding sequences (CDSs) in bacteria based on the normalized folding free energy. We found that CDSs had a high number of HSRs, and these HSRs showed high structural context robustness compared with random sequences, indicating a direct selective constraint imposed on HSRs. A reduced ribosome speed was detected near the start position of HSRs, implying that an HSR may act as an obstacle that drives translational pausing and thereby coordinates protein synthesis. Interestingly, we found that genes with high HSR density were enriched in the processes of translation, protein folding, and cell division. In addition, essential genes exhibited higher HSR density than nonessential genes. Overall, our study presents a previously unappreciated correlation between variation in the number of HSRs and cellular processes.
|
16
|
GTAG- and CGTC-tagged palindromic DNA repeats in prokaryotes. BMC Genomics 2013; 14:522. [PMID: 23902135 PMCID: PMC3733652 DOI: 10.1186/1471-2164-14-522] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/06/2013] [Accepted: 07/30/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND REPs (Repetitive Extragenic Palindromes) are small (20-40 bp) palindromic repeats found in high copy number in some prokaryotic genomes, hypothesized to play a role in DNA supercoiling, transcription termination, and mRNA stabilization. RESULTS We have monitored a large number of REP elements in prokaryotic genomes, and found that most can be sorted into two large DNA super-families, as they feature at one end unpaired motifs fitting either the GTAG or the CGTC consensus. Tagged REPs have been identified in >80 species in 8 different phyla. GTAG and CGTC repeats reside predominantly in microorganisms of the gamma and alpha division of Proteobacteria, respectively. However, the identification of members of both super-families in deeper branching phyla such as Cyanobacteria and Planctomycetes supports the notion that REPs are old components of the bacterial chromosome. On the basis of sequence content and overall structure, GTAG and CGTC repeats have been assigned to 24 and 4 families, respectively. Of these, some are species-specific, others reside in multiple species, and several organisms contain different REP types. In many families, most units are close to each other in opposite orientation, and may potentially fold into larger secondary structures. In different REP-rich genomes the repeats are predominantly located between unidirectionally and convergently transcribed ORFs. REPs are predominantly located downstream from coding regions, and many are plausibly transcribed and function as RNA elements. REPs located inside genes have been identified in several species. Many lie within replication and global genome repair genes. It has been hypothesized that GTAG REPs are miniature transposons mobilized by specific transposases known as RAYTs (REP associated tyrosine transposases). RAYT genes are flanked either by GTAG repeats or by long terminal inverted repeats (TIRs) unrelated to GTAG repeats. 
Moderately abundant families of TIRs have been identified in multiple species. CONCLUSIONS CGTC REPs apparently lack a dedicated transposase. Future work will clarify whether these elements may be mobilized by RAYTs or other transposases, and assess whether de novo formation of either GTAG or CGTC repeat types still occurs.
|
17
|
The protease-activated receptor 1 possesses a functional and cleavable signal peptide which is necessary for receptor expression. FEBS Lett 2012; 586:2351-9. [PMID: 22659187 DOI: 10.1016/j.febslet.2012.05.042] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/09/2012] [Revised: 05/11/2012] [Accepted: 05/16/2012] [Indexed: 01/20/2023]
Abstract
The protease-activated receptor 1 (PAR1) is activated by thrombin cleavage releasing the physiologically-relevant parstatin peptide (residues 1-41). However, the actual length of parstatin was unclear since the receptor may also possess a cleavable signal peptide (residues 1-21) according to prediction programs. Here, we show that this putative signal peptide is indeed functional and removed from the PAR1 resolving the question of parstatin length. Moreover, we show that the sequence encoding the signal peptide may surprisingly play a role in stabilization of the PAR1 mRNA, a function which would be novel for a G protein-coupled receptor.
|
18
|
Motif frequency and evolutionary search times in RNA populations. J Theor Biol 2011; 280:117-26. [PMID: 21419782 DOI: 10.1016/j.jtbi.2011.03.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/11/2010] [Revised: 01/26/2011] [Accepted: 03/10/2011] [Indexed: 02/07/2023]
Abstract
RNA molecules, through their dual identity as sequence and structure, are an appropriate experimental and theoretical model to study the genotype-phenotype map and evolutionary processes taking place in simple replicator populations. In this computational study, we relate properties of the sequence-structure map, in particular the abundance of a given secondary structure in a random pool, with the number of replicative events that an initially random population of sequences needs to find that structure through mutation and selection. For common structures, this search process turns out to be much faster than for rare structures. Furthermore, search and fixation processes are more efficient in a wider range of mutation rates for common structures, thus indicating that evolvability of RNA populations is not simply determined by abundance. We also find significant differences in the search and fixation processes for structures of same abundance, and relate them with the number of base pairs forming the structure. Moreover, the influence of the nucleotide content of the RNA sequences on the search process is studied. Our results advance in the understanding of the distribution and attainability of RNA secondary structures. They hint at the fact that, beyond sequence length and sequence-to-function redundancy, the mutation rate that permits localization and fixation of a given phenotype strongly depends on its relative abundance and global, in general non-uniform, distribution in sequence space.
|
19
|
A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes. BMC Genomics 2010; 11:430. [PMID: 20626840 PMCID: PMC2996958 DOI: 10.1186/1471-2164-11-430] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/05/2010] [Accepted: 07/13/2010] [Indexed: 01/07/2023]
Abstract
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches.
|
20
|
Abstract
The genome of Stenotrophomonas maltophilia is peppered with palindromic elements called SMAG (Stenotrophomonas maltophilia GTAG) because they carry at one terminus the tetranucleotide GTAG. The repeats are species-specific variants of the superfamily of repetitive extragenic palindromes (REPs), DNA sequences spread in the intergenic space in many prokaryotic genomes. The genomic organization and the functional features of SMAG elements are described herein. A total of 1650 SMAG elements were identified in the genome of the S. maltophilia K279a strain. The elements are 22-25 bp in size, and can be sorted into five distinct major subfamilies because they have different stem and loop sequences. One fifth of the SMAG family is comprised of single units, 2/5 of elements located at a close distance from each other and 2/5 of elements grouped in tandem arrays of variable lengths. Altogether, SMAGs and intermingled DNA occupy 13% of the intergenic space, and make up 1.4% of the chromosome. Hundreds of genes are immediately flanked by SMAGs, and the level of expression of many may be influenced by the folding of the repeats in the mRNA. Expression analyses suggested that SMAGs function as RNA control sequences, either stabilizing upstream transcripts or favoring their degradation.
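As an illustration of what a GTAG-tagged palindromic element looks like at the sequence level, the sketch below scans a DNA string for the pattern tag + stem + loop + reverse complement of the stem. It is a toy detector under assumed parameters (an 8-nt perfect stem, a 3 to 8 nt loop, the tag at the 5′ end), not the method used to catalogue SMAGs; `find_tagged_hairpins` and the test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tagged_hairpins(seq, tag="GTAG", stem=8, loop_range=(3, 8)):
    """Return (start, end, element) for each tag + stem + loop + revcomp(stem).

    Assumptions for illustration: perfect stems of fixed length and the tag
    immediately 5' of the stem. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        if not seq.startswith(tag, i):
            continue
        stem_start = i + len(tag)
        left = seq[stem_start:stem_start + stem]
        if len(left) < stem:
            break  # too close to the end for a full stem
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = stem_start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) == stem and right == reverse_complement(left):
                hits.append((i, right_start + stem, seq[i:right_start + stem]))
    return hits


# Hypothetical 24-nt element embedded in flanking sequence.
stem_seq = "GCCGGATG"
toy = "TTT" + "GTAG" + stem_seq + "AAAA" + reverse_complement(stem_seq) + "CC"
hits = find_tagged_hairpins(toy)
for start, end, element in hits:
    print(start, end, element, len(element))
```

The toy element is 24 nt long, which falls inside the 22-25 bp size range quoted for SMAGs; real elements tolerate mismatches and vary in stem and loop sequence, so a production scan would need a more permissive model.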
|
21
|
Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA. Bioinformatics 2009; 26:478-85. [PMID: 20031973 DOI: 10.1093/bioinformatics/btp703] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]
Abstract
MOTIVATION The phenomenon of strand symmetry, which may provide clues to genome evolution, exists in all prokaryotic and eukaryotic genomes studied. Several possible mechanisms for its origins have been proposed, including the absence of strand biases in mutation and selection, strand inversion, and selection of stem-loop structures. However, the relative contributions of these mechanisms to strand symmetry are not clear. In this article, we studied specifically the role of stem-loop potential of single-stranded DNA in strand symmetry. RESULTS We analyzed the complete genomes of 90 prokaryotes. We found that most oligonucleotides (pentanucleotides and higher) do not have a reverse complement in close proximity in the genomic sequences. Combined with further analysis, we conclude that the contribution of the widespread stem-loop potential of single-stranded genomic DNA to the formation and maintenance of strand symmetry would be very limited, at least for higher-order oligonucleotides. Therefore, other possible causes for strand symmetry must be examined in greater depth.
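The proximity test described in the RESULTS can be illustrated with a minimal Python sketch. It is not the authors' method: the function names, the `window` parameter, and the rule that the reverse complement must lie entirely within `window` bases downstream of the word are all assumptions made for illustration.

```python
# Illustrative sketch only; names and the window rule are assumptions,
# not taken from the cited article.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_with_nearby_revcomp(seq, k, window):
    """Fraction of k-mer positions whose reverse complement occurs
    entirely within `window` bases immediately downstream."""
    seq = seq.upper()
    n_words = len(seq) - k + 1
    if n_words <= 0:
        return 0.0
    hits = 0
    for i in range(n_words):
        word = seq[i:i + k]
        # Region in which a potential pairing partner is searched.
        downstream = seq[i + k:i + k + window]
        if reverse_complement(word) in downstream:
            hits += 1
    return hits / n_words
```

For pentanucleotides one would call `fraction_with_nearby_revcomp(genome, 5, window)` with a window chosen to match the loop sizes of interest; a low fraction would be in line with the limited stem-loop potential reported above.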
|
22
|
Inverse symmetry in complete genomes and whole-genome inverse duplication. PLoS One 2009; 4:e7553. [PMID: 19898631 PMCID: PMC2771390 DOI: 10.1371/journal.pone.0007553] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 03/20/2009] [Accepted: 07/22/2009] [Indexed: 12/18/2022] Open
Abstract
The cause of symmetry is usually subtle, and its study often leads to a deeper understanding of the bearer of the symmetry. To gain insight into the dynamics driving the growth and evolution of genomes, we conducted a comprehensive study of textual symmetries in 786 complete chromosomes. We focused on symmetry based on our belief that, in spite of their extreme diversity, genomes must share common dynamical principles and mechanisms that drive their growth and evolution, and that the most robust footprints of such dynamics are symmetry related. We found that while complement and reverse symmetries are essentially absent in genomic sequences, inverse (complement plus reverse) symmetry is prevalent in complex patterns in most chromosomes, a vast majority of which have near maximum global inverse symmetry. We also discovered relations that can quantitatively account for the long observed but unexplained phenomenon of k-mer skews in genomes. Our results suggest segmental and whole-genome inverse duplications are important mechanisms in genome growth and evolution, probably because they are efficient means by which the genome can exploit its double-stranded structure to enrich its code-inventory.
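One simple way to quantify the inverse symmetry discussed above is to compare the count of each k-mer with the count of its reverse complement. The Python sketch below assumes that "inverse" means reverse complement; the score it computes (0 for perfect symmetry, 1 for none) is a hypothetical measure chosen for illustration, not the statistic used in the article.

```python
# Illustrative sketch only; the score is an assumed measure, not the
# one defined in the cited article.
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def inverse_asymmetry(seq, k):
    """Normalized difference between k-mer counts and the counts of
    their reverse complements: 0.0 means every k-mer is as frequent
    as its reverse complement, 1.0 means no reverse complement of
    any observed k-mer occurs."""
    counts = kmer_counts(seq.upper(), k)
    # Include reverse complements that never occur (count 0).
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    total = sum(counts[w] + counts[reverse_complement(w)] for w in words)
    return diff / total if total else 0.0
```

Applied to a whole chromosome with increasing `k`, a score that stays close to 0 would correspond to the near-maximum global inverse symmetry described in the abstract.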
|
23
|
Using Genomic Data to Determine the Diversity and Distribution of Target Site Motifs Recognized by Class C-attC Group II Introns. J Mol Evol 2009; 68:539-49. [DOI: 10.1007/s00239-009-9228-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/22/2008] [Revised: 03/27/2009] [Accepted: 03/31/2009] [Indexed: 01/31/2023]
|
24
|
Abstract
Enterococcus faecalis/faecium repeats (EFARs) are miniature insertion sequences spread in the genome of Enterococcus faecalis and Enterococcus faecium. Unit-length repeats measure 165-170 bp and contain two modules (B and T) capable of folding independently into stem-loop sequences, connected by a short, unstructured module J. The E. faecalis elements feature only one type of B, J and T modules. In contrast, the E. faecium elements result from the assembly of different types of B, J and T modules, and may vary in length because they carry multiple B modules. Most EFARs are located close (0-20 bp) to ORF stop codons, and are thus cotranscribed with upstream flanking genes. In both E. faecalis and E. faecium cells, EFAR transcripts accumulate in a strand-dependent fashion. Data suggest that T modules function as bidirectional transcriptional terminators, which provide a 3'-end to gene transcripts spanning B modules, while blocking antisense transcripts coming in from the opposite direction.
|
25
|
RNAVLab: A virtual laboratory for studying RNA secondary structures based on grid computing technology. Parallel Computing 2008; 34:661-680. [PMID: 19885376 PMCID: PMC2714649 DOI: 10.1016/j.parco.2008.08.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/05/2007] [Revised: 06/06/2008] [Accepted: 08/21/2008] [Indexed: 05/28/2023]
Abstract
As ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation, their secondary structures have been the focus of many recent studies. Despite the computing power of supercomputers, computationally predicting secondary structures with thermodynamic methods is still not feasible when the RNA molecules have long nucleotide sequences and include complex motifs such as pseudoknots. This paper presents RNAVLab (RNA Virtual Laboratory), a virtual laboratory for studying RNA secondary structures including pseudoknots that allows scientists to address this challenge. Two important case studies show the versatility and functionalities of RNAVLab. The first study quantifies its capability to rebuild longer secondary structures from motifs found in systematically sampled nucleotide segments. The extensive sampling and predictions are made feasible in a short turnaround time because of the grid technology used. The second study shows how RNAVLab allows scientists to study the viral RNA genome replication mechanisms used by members of the virus family Nodaviridae.
|
26
|
Abstract
Mobile DNA elements play a major role in genome plasticity and other evolutionary processes, an insight gained primarily through the study of transposons and retrotransposons (generally approximately 1000 nt or longer). These elements spawn smaller parasitic versions (generally >100 nt) that propagate through proteins encoded by the full elements. Highly repeated sequences smaller than 100 nt have been described, but they are either nonmobile or their origins are not known. We have surveyed the genome of the multicellular cyanobacterium, Nostoc punctiforme, and its relatives for small dispersed repeat (SDR) sequences and have identified eight families ranging from 21 to 27 nucleotides. Three of the families (SDR4, SDR5, and SDR6), despite little sequence similarity, share a common predicted secondary structure, a conclusion supported by patterns of compensatory mutations. The SDR elements are found in a diverse set of contexts, often embedded within tandemly repeated heptameric sequences or within minitransposons. One element (SDR5) is found exclusively within instances of an octamer, HIP1, that is highly over-represented in the genomes of many cyanobacteria. Two elements (SDR1 and SDR4) often are found within copies of themselves, producing complex nested insertions. An analysis of SDR elements within cyanobacterial genomes indicates that they are essentially confined to a coherent subgroup. The evidence indicates that some of the SDR elements, probably working through RNA intermediates, have been mobile in recent evolutionary time, making them perhaps the smallest known mobile elements.
|
27
|
On the structural repertoire of pools of short, random RNA sequences. J Theor Biol 2008; 252:750-63. [PMID: 18374951 DOI: 10.1016/j.jtbi.2008.02.018] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 12/03/2007] [Revised: 01/14/2008] [Accepted: 02/13/2008] [Indexed: 01/21/2023]
Abstract
A detailed knowledge of the mapping between sequence and structure spaces in populations of RNA molecules is essential to better understand their present-day functional properties, to envisage a plausible early evolution of RNA in a prebiotic chemical environment, and to improve the design of in vitro evolution experiments, among others. Analysis of natural RNAs, as well as in vitro and computational studies, shows that certain RNA structural motifs are much more abundant than others, pointing to a complex relation between sequence and structure. Within this framework, we have investigated computationally the structural properties of a large pool (10^8 molecules) of single-stranded, 35 nt-long, random RNA sequences. The secondary structures obtained are ranked and classified into structure families. The number of structures in main families is analytically calculated and compared with the numerical results. This permits a quantification of the fraction of structure space covered by a large pool of sequences. We further show that the number of structural motifs and their frequency are highly unbalanced with respect to the nucleotide composition: simple structures such as stem-loops and hairpins arise from sequences depleted in G, while more complex structures require an enrichment of G. In general, we observe a strong correlation between subfamilies (characterized by a fixed number of paired nucleotides) and nucleotide composition. Our results are compared to the structural repertoire obtained in a second pool where isolated base pairs are prohibited.
|
28
|
Systematic identification of stem-loop containing sequence families in bacterial genomes. BMC Genomics 2008; 9:20. [PMID: 18201379 PMCID: PMC2267715 DOI: 10.1186/1471-2164-9-20] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/14/2007] [Accepted: 01/17/2008] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Analysis of non-coding sequences in several bacterial genomes led to the identification of families of repeated sequences able to fold as secondary structures. These sequences have often been claimed to be transcribed and fulfill a functional role. A previous systematic analysis of a representative set of 40 bacterial genomes produced a large collection of sequences potentially able to fold as stem-loop structures (SLS). Computational analysis of these sequences was carried out by searching for families of repetitive nucleic acid elements sharing a common secondary structure. RESULTS The initial clustering procedure identified clusters of similar sequences in 29 genomes, corresponding to about 1% of the whole population. Sequences selected in this way have a substantially higher aptitude to fold into a stable secondary structure than the initial set. Removal of redundancies and regrouping of the selected sequences resulted in a final set of 92 families, defined by HMM analysis. Twenty-five of them include all well-known SLS-containing repeats and others reported in the literature but not analyzed in detail. The remaining 67 families have not been previously described. Two thirds of the families share a common predicted secondary structure and are located within intergenic regions. CONCLUSION Systematic analysis of 40 bacterial genomes revealed a large number of repeated sequence families, including known and novel ones. Their predicted structure and genomic location suggest that, even in compact bacterial genomes, a relatively large fraction of the genome consists of non-protein-coding sequences, possibly functioning at the RNA level.
|
29
|
Abstract
The structural organization of Enterococcus faecalis repeats (EFAR), palindromic DNA sequences identified in the genome of the Enterococcus faecalis V583 strain by in silico analyses, is described. EFAR are a novel type of miniature insertion sequences, which vary in size from 42 to 650 bp. Length heterogeneity results from the variable assembly of 16 different sequence types. Most elements measure 170 bp, and can fold into peculiar L-shaped structures resulting from the folding of two independent stem-loop structures (SLSs). Homologous chromosomal regions lacking or containing EFAR sequences were identified by PCR among 20 E. faecalis clinical isolates of different genotypes. Sequencing of a representative set of 'empty' sites revealed that 24-37 bp-long sequences, unrelated to each other but all able to fold into SLSs, functioned as targets for the integration of EFAR. In the process, most of the SLS had been deleted, but part of the targeted stems had been retained at EFAR termini.
|