1
Dimonaco NJ, Clare A, Kenobi K, Aubrey W, Creevey CJ. StORF-Reporter: finding genes between genes. Nucleic Acids Res 2023; 51:11504-11517. [PMID: 37897345 PMCID: PMC10682499 DOI: 10.1093/nar/gkad814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 09/04/2023] [Accepted: 09/27/2023] [Indexed: 10/30/2023] Open
Abstract
Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, it is routine for genes using alternative start codons to be misreported or completely omitted. Therefore, we present StORF-Reporter, a tool that takes an annotated genome and returns regions that may contain missing CDS genes from unannotated regions. StORF-Reporter consists of two parts. The first begins with the extraction of unannotated regions from an annotated genome. Next, Stop-ORFs (StORFs) are identified in these unannotated regions. StORFs are open reading frames that are delimited by stop codons and thus can capture those genes most often missing in genome annotations. We show this methodology recovers genes missing from canonical genome annotations. We inspect the results of the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes of 247 genera from the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
Affiliation(s)
- Nicholas J Dimonaco
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, Wales, UK
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
  - Department of Medicine, McMaster University, Hamilton, ON, Canada
  - Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, ON, Canada
  - School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
- Amanda Clare
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Kim Kenobi
  - Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, Wales, UK
- Wayne Aubrey
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Wales, UK
- Christopher J Creevey
  - School of Biological Sciences, Queen’s University Belfast, Belfast BT7 1NN, Northern Ireland, UK
2
Dabbaghie F, Srikakulam SK, Marschall T, Kalinina OV. PanPA: generation and alignment of panproteome graphs. Bioinformatics Advances 2023; 3:vbad167. [PMID: 38145107 PMCID: PMC10748787 DOI: 10.1093/bioadv/vbad167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/26/2023]
Abstract
Motivation: Compared to eukaryotes, prokaryote genomes are more diverse through different mechanisms, including a higher mutation rate and horizontal gene transfer. Therefore, using a linear representative reference can cause a reference bias. Graph-based pangenome methods have been developed to tackle this problem. However, comparisons in DNA space are still challenging due to this high diversity. In contrast, amino acid sequences have higher similarity due to evolutionary constraints, whereby a single amino acid may be encoded by several synonymous codons. Coding regions cover the majority of the genome in prokaryotes. Thus, panproteomes present an attractive alternative leveraging the higher sequence similarity while not losing much of the genome in non-coding regions.
Results: We present PanPA, a method that takes a set of multiple sequence alignments of protein sequences, indexes them, and builds a graph for each multiple sequence alignment. In the querying step, it can align DNA or amino acid sequences back to these graphs. We first showcase that PanPA generates correct alignments on a panproteome from 1350 Escherichia coli. To demonstrate that panproteomes allow comparisons at longer phylogenetic distances, we compare DNA and protein alignments from 1073 Salmonella enterica assemblies against the E. coli reference genome, pangenome, and panproteome using BWA, GraphAligner, and PanPA, respectively, with PanPA aligning around 22% more sequences. We also aligned a DNA short-read whole genome sequencing (WGS) sample from S. enterica against the E. coli reference with BWA and the panproteome with PanPA, where PanPA was able to find alignments for 68% of the reads compared to 5% with BWA.
Availability and implementation: PanPA is available at https://github.com/fawaz-dabbaghieh/PanPA.
Affiliation(s)
- Fawaz Dabbaghie
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research (HZI), Saarbrücken, Germany
- Sanjay K Srikakulam
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research (HZI), Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland University, 66123 Saarbrücken, Germany
  - Interdisciplinary Graduate School of Natural Product Research, Saarland University, 66123 Saarbrücken, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
- Olga V Kalinina
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research (HZI), Saarbrücken, Germany
  - Drug Bioinformatics, Medical Faculty, Saarland University, 66421 Homburg, Germany
  - Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
3
Puente-Sánchez F, Hoetzinger M, Buck M, Bertilsson S. Exploring environmental intra-species diversity through non-redundant pangenome assemblies. Mol Ecol Resour 2023; 23:1724-1736. [PMID: 37382302 DOI: 10.1111/1755-0998.13826] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/07/2022] [Revised: 05/24/2023] [Accepted: 06/15/2023] [Indexed: 06/30/2023]
Abstract
At the genome level, microorganisms are highly adaptable both in terms of allele and gene composition. Such heritable traits emerge in response to different environmental niches and can have a profound influence on microbial community dynamics. As a consequence, any individual genome or population will contain merely a fraction of the total genetic diversity of any operationally defined "species", whose ecological potential can thus only be fully understood by studying all of their genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core regions (present in all the genomes from a species, and responsible for housekeeping and species-level niche adaptation, among others) and accessory regions (present only in some, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm producing pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time, and its results are complete, non-redundant, preserve gene ordering and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands and allowing their prevalence in different populations to be tracked. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus ubiquitous in freshwater ecosystems, characterized by their streamlined genomes and their ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.
Affiliation(s)
- Fernando Puente-Sánchez
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Matthias Hoetzinger
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Moritz Buck
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Stefan Bertilsson
  - Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
4
Cardenas-Alvarez MX, Restrepo-Montoya D, Bergholz TM. Genome-Wide Association Study of Listeria monocytogenes Isolates Causing Three Different Clinical Outcomes. Microorganisms 2022; 10:1934. [PMID: 36296210 PMCID: PMC9610272 DOI: 10.3390/microorganisms10101934] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/11/2022] [Revised: 09/16/2022] [Accepted: 09/24/2022] [Indexed: 12/05/2022] Open
Abstract
Heterogeneity in virulence potential of L. monocytogenes subgroups has been associated with genetic elements that could provide advantages in certain environments to invade, multiply, and survive within a host. The presence of gene mutations has been found to be related to attenuated phenotypes, while the presence of groups of genes, such as pathogenicity islands (PI), has been associated with hypervirulent or stress-resistant clones. We evaluated 232 whole genome sequences from invasive listeriosis cases in humans and ruminants from the US and Europe to identify genomic elements associated with strains causing three clinical outcomes: central nervous system (CNS) infections, maternal-neonatal (MN) infections, and systemic infections (SI). Phylogenetic relationships and virulence-associated genes were evaluated, and a gene-based and a single nucleotide polymorphism (SNP)-based genome-wide association study (GWAS) were conducted to identify loci associated with the different clinical outcomes. The orthologous results indicated that genes of phage phiX174, transfer RNAs, and type I restriction-modification (RM) system genes, along with SNPs in loci involved in environmental adaptation such as rpoB and a phosphotransferase system (PTS), were associated with one or more clinical outcomes. Detection of phenotype-specific candidate loci represents an approach that could narrow the group of genetic elements to be evaluated in future studies.
Affiliation(s)
- Teresa M. Bergholz
  - Department of Food Science and Human Nutrition, Michigan State University, East Lansing, MI 48824, USA
5
Exploring Bacterial Attributes That Underpin Symbiont Life in the Monogastric Gut. Appl Environ Microbiol 2022; 88:e0112822. [PMID: 36036591 PMCID: PMC9499014 DOI: 10.1128/aem.01128-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
The large bowel of monogastric animals, such as that of humans, is home to a microbial community (microbiota) composed of a diversity of mostly bacterial species. Interrelationships between the microbiota as an entity and the host are complex and lifelong and are characteristic of a symbiosis. The relationships may be disrupted in association with disease, resulting in dysbiosis. Modifications to the microbiota to correct dysbiosis require knowledge of the fundamental mechanisms by which symbionts inhabit the gut. This review aims to summarize aspects of niche fitness of bacterial species that inhabit the monogastric gut, especially of humans, and to indicate the research path by which progress can be made in exploring bacterial attributes that underpin symbiont life in the gut.
6
Han Y, Li C, Yan Y, Lin M, Ke X, Zhang Y, Zhan Y. Post-transcriptional control of bacterial nitrogen metabolism by regulatory noncoding RNAs. World J Microbiol Biotechnol 2022; 38:126. [PMID: 35666348 PMCID: PMC9170634 DOI: 10.1007/s11274-022-03287-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 04/12/2022] [Indexed: 12/04/2022]
Abstract
Nitrogen metabolism is the most basic process of material and energy metabolism in living organisms, and processes involving the uptake and use of different nitrogen sources are usually tightly regulated at the transcriptional and post-transcriptional levels. Bacterial regulatory noncoding RNAs are novel post-transcriptional regulators that repress or activate the expression of target genes through complementary pairing with target mRNAs; therefore, these noncoding RNAs play an important regulatory role in many physiological processes, such as bacterial substance metabolism and stress response. In recent years, a study found that noncoding RNAs play a vital role in the post-transcriptional regulation of nitrogen metabolism, which is currently a hot topic in the study of bacterial nitrogen metabolism regulation. In this review, we present an overview of recent advances that increase our understanding of the regulatory roles of bacterial noncoding RNAs and describe in detail how noncoding RNAs regulate biological nitrogen fixation and nitrogen metabolic engineering. Furthermore, our goal is to lay a theoretical foundation for better understanding the molecular mechanisms in bacteria that are involved in environmental adaptations and metabolically engineered genetic modifications.
Affiliation(s)
- Yueyue Han
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Chao Li
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Yongliang Yan
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Min Lin
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiubin Ke
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Yunhua Zhang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
  - School of Resources and Environment, Anhui Agricultural University, Hefei, China
- Yuhua Zhan
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
7
Nepal R, Houtak G, Shaghayegh G, Bouras G, Shearwin K, Psaltis AJ, Wormald PJ, Vreugde S. Prophages encoding human immune evasion cluster genes are enriched in Staphylococcus aureus isolated from chronic rhinosinusitis patients with nasal polyps. Microb Genom 2021; 7:000726. [PMID: 34907894 PMCID: PMC8767322 DOI: 10.1099/mgen.0.000726] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/17/2021] [Accepted: 10/21/2021] [Indexed: 12/11/2022] Open
Abstract
Prophages affect bacterial fitness on multiple levels. These include bacterial infectivity, toxin secretion, virulence regulation, surface modification, immune stimulation and evasion, and microbiome competition. Lysogenic conversion arms bacteria with novel accessory functions, thereby increasing bacterial fitness, host adaptation and persistence, and antibiotic resistance. These properties allow the bacteria to occupy a niche long term and can contribute to chronic infections and inflammation such as chronic rhinosinusitis (CRS). In this study, we aimed to identify and characterize prophages present in Staphylococcus aureus from patients suffering from CRS in relation to CRS disease phenotype and severity. Prophage regions were identified using PHASTER. Various in silico tools, such as ResFinder and VF Analyzer, were used to detect antibiotic resistance genes and virulence genes, respectively. Progressive MAUVE and maximum likelihood were used for multiple sequence alignment and phylogenetics of prophages, respectively. Disease severity of CRS patients was measured using computed tomography Lund-Mackay scores. Fifty-eight S. aureus clinical isolates (CIs) were obtained from 28 CRS patients without nasal polyp (CRSsNP) and 30 CRS patients with nasal polyp (CRSwNP). All CIs carried at least one prophage (average=3.6) and prophages contributed up to 7.7% of the bacterial genome. Phage integrase genes were found in 55/58 (~95%) S. aureus strains and 97/211 (~46%) prophages. Prophages belonging to the Sa3int integrase group (phiNM3, JS01, phiN315) (39/97, 40%) and Sa2int (phi2958PVL) (14/97, 14%) were the most prevalent prophages and harboured multiple virulence genes such as sak, scn, chp, lukE/D, sea. Intact prophages were more frequently identified in CRSwNP than in CRSsNP (P=0.0021). Intact prophages belonging to the Sa3int group were more frequent in CRSwNP than in CRSsNP (P=0.0008) and intact phiNM3 were exclusively found in CRSwNP patients (P=0.007).
Our results expand the knowledge of prophages in S. aureus isolated from CRS patients and their possible role in disease development. These findings provide a platform for future investigations into potential tripartite associations between bacteria-prophage-human immune system, S. aureus evolution and CRS disease pathophysiology.
Affiliation(s)
- Roshan Nepal
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Ghais Houtak
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Gohar Shaghayegh
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Keith Shearwin
  - School of Biological Sciences, Faculty of Sciences, The University of Adelaide, Adelaide, Australia
- Alkis James Psaltis
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Peter-John Wormald
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
- Sarah Vreugde
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
  - The Department of Surgery – Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, South Australia, Australia
8
Park HJ, Gokhale CS, Bertels F. How sequence populations persist inside bacterial genomes. Genetics 2021; 217:6151697. [PMID: 33724360 PMCID: PMC8049555 DOI: 10.1093/genetics/iyab027] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/21/2020] [Accepted: 02/04/2021] [Indexed: 01/04/2023] Open
Abstract
Compared to their eukaryotic counterparts, bacterial genomes are small and contain extremely tightly packed genes. Repetitive sequences are rare but not completely absent. One of the most common repeat families is REPINs. REPINs can replicate in the host genome and form populations that persist for millions of years. Here, we model the interactions of these intragenomic sequence populations with the bacterial host. We first confirm well-established results: in the presence and absence of horizontal gene transfer (hgt), sequence populations either expand until they drive the host to extinction or are purged from the genome. We then show that a sequence population can be stably maintained when each individual sequence provides a benefit that decreases with increasing sequence population size. Maintaining a sequence population of stable size also requires the replication of the sequence population to be costly to the host; otherwise, the sequence population size will increase indefinitely. Surprisingly, in regimes with high hgt rates, the benefit conferred by the sequence population does not have to exceed the damage it causes to its host. Our analyses provide a plausible scenario for the persistence of sequence populations in bacterial genomes. We also hypothesize a limited biologically relevant parameter range for the provided benefit, which can be tested in future experiments.
Affiliation(s)
- Hye Jin Park
  - Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
  - Asia Pacific Center for Theoretical Physics, Pohang, 37673, Korea
  - Department of Physics, POSTECH, Pohang, 37673, Korea
- Chaitanya S Gokhale
  - Research Group for Theoretical Models of Eco-evolutionary Dynamics, Department of Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
- Frederic Bertels
  - Research Group for Microbial Molecular Evolution, Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, 24306, Germany
9
Selveshwari S, Lele K, Dey S. Genomic signatures of UV resistance evolution in Escherichia coli depend on the growth phase during exposure. J Evol Biol 2021; 34:953-967. [DOI: 10.1111/jeb.13764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/28/2020] [Revised: 12/27/2020] [Accepted: 01/13/2021] [Indexed: 11/28/2022]
Affiliation(s)
- S Selveshwari
  - Population Biology Laboratory, Biology Division, Indian Institute of Science Education and Research, Pune, Maharashtra, India
- Kasturi Lele
  - Population Biology Laboratory, Biology Division, Indian Institute of Science Education and Research, Pune, Maharashtra, India
- Sutirth Dey
  - Population Biology Laboratory, Biology Division, Indian Institute of Science Education and Research, Pune, Maharashtra, India
10
Krogh TJ, Franke A, Møller-Jensen J, Kaleta C. Elucidating the Influence of Chromosomal Architecture on Transcriptional Regulation in Prokaryotes - Observing Strong Local Effects of Nucleoid Structure on Gene Regulation. Front Microbiol 2020; 11:2002. [PMID: 32983020 PMCID: PMC7491251 DOI: 10.3389/fmicb.2020.02002] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/15/2020] [Accepted: 07/29/2020] [Indexed: 11/13/2022] Open
Abstract
Both intrinsic and extrinsic mechanisms regulating bacterial expression have been elucidated and described; however, such studies have mainly focused on local effects on the two-dimensional structure of the prokaryote genome, while long-range as well as spatial interactions influencing gene expression are still only poorly understood. In this paper, we investigate the association between co-expression and distance between genes, using RNA-seq data at multiple growth phases in order to illuminate whether such conserved patterns are an indication of a gene regulatory mechanism relevant for prokaryotic cell proliferation, adaptation, and evolution. We observe recurrent sinusoidal patterns in the correlation of pairwise expression as a function of genomic distance and rule out that these are caused by transcription-induced supercoiling gradients, gene clustering in operons, or association with regulatory transcription factors (TFs). By comparing spatial proximity for pairs of genomic bins with their correlation of pairwise expression, we further observe high co-expression proportional to spatial proximity. Based on these observations, we propose that the observed patterns are related to nucleoid structure as a product of transcriptional spilling, where genes actively influence transcription of spatially proximal genes through increases within shared local pools of RNA polymerases (RNAP), and actively spilling transcription onto neighboring genes.
Affiliation(s)
- Thøger Jensen Krogh
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Andre Franke
  - Institute of Clinical Molecular Biology (IKMB), Christian-Albrechts-University Kiel, Kiel, Germany
- Jakob Møller-Jensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Christoph Kaleta
  - Institute of Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
11
Optimal Growth Temperature and Intergenic Distances in Bacteria, Archaea, and Plastids of Rhodophytic Branch. BioMed Research International 2020; 2020:3465380. [PMID: 32025518 PMCID: PMC6991167 DOI: 10.1155/2020/3465380] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/07/2019] [Revised: 10/19/2019] [Accepted: 12/23/2019] [Indexed: 01/07/2023]
Abstract
The lengths of intergenic regions between neighboring genes that are convergent, divergent, or unidirectional were calculated for plastids of the rhodophytic branch and complete archaeal and bacterial genomes. Statistically significant linear relationships between any pair of the medians of these three length types have been revealed in each genomic group. Exponential relationships between the optimal growth temperature and each of the three medians have been revealed as well. The leading coefficients of the regression equations relating all pairs of the medians as well as temperature and any of the medians have the same sign and order of magnitude. The results obtained for plastids, archaea, and bacteria are also similar at the qualitative level. For instance, the medians are always low at high temperatures. At low temperatures, the medians tend to statistically significant greater values and scattering. The original model was used to test our hypothesis that the intergenic distances are optimized in particular to decrease the competition of RNA polymerases within the locus that results in transcribing shortened RNAs. Overall, this points to an effect of temperature for both remote and close genomes.
12
Carvalho Barbosa C, Calhoun SH, Wieden HJ. Non-coding RNAs: what are we missing? Biochem Cell Biol 2020; 98:23-30. [DOI: 10.1139/bcb-2019-0037] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022] Open
Abstract
Over the past two decades, the importance of small non-coding RNAs (sncRNAs) as regulatory molecules has become apparent in all three domains of life (archaea, bacteria, eukaryotes). In fact, sncRNAs play an important role in the control of gene expression at both the transcriptional and the post-transcriptional level, with crucial roles in fine-tuning cell responses during internal and external stress. Multiple pathways for sncRNA biogenesis and diverse mechanisms of regulation have been reported, and although biogenesis and mechanisms of sncRNAs in prokaryotes and eukaryotes are different, remarkable similarities exist. Here, we briefly review and compare the major sncRNA classes that act post-transcriptionally, and focus on recent discoveries regarding the ribosome as a target of regulation and the conservation of these mechanisms between prokaryotes and eukaryotes.
Affiliation(s)
- Cristina Carvalho Barbosa
  - Alberta RNA Research and Training Institute, Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
- Sydnee H. Calhoun
  - Alberta RNA Research and Training Institute, Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
- Hans-Joachim Wieden
  - Alberta RNA Research and Training Institute, Department of Chemistry and Biochemistry, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
13
Sun Q, Jiao F, Lin G, Yu J, Tang M. The nonlinear dynamics and fluctuations of mRNA levels in cell cycle coupled transcription. PLoS Comput Biol 2019; 15:e1007017. [PMID: 31034470 PMCID: PMC6508750 DOI: 10.1371/journal.pcbi.1007017] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/21/2018] [Revised: 05/09/2019] [Accepted: 04/10/2019] [Indexed: 12/02/2022] Open
Abstract
Gene transcription is a noisy process, and the cell division cycle is an important source of gene transcription noise. In this work, we develop a mathematical approach by coupling transcription kinetics with cell division cycles to delineate how they are combined to regulate transcription output and noise. In view of gene dosage, a cell cycle is divided into an early stage S1 and a late stage S2. The analytical forms for the mean and the noise of mRNA numbers are given in each stage. The analysis based on these formulas predicts precisely the fold change r* of mRNA numbers from S1 to S2 measured in a mouse embryonic stem cell line. When transcription follows similar kinetics in both stages, r* buffers against DNA dosage variation and r* ∈ (1, 2). Numerical simulations suggest that increasing cell cycle durations up-regulates transcription with less noise, whereas rapid stage transitions induce highly noisy transcription. A minimization of the transcription noise is observed when transcription homeostasis is attained by varying a single kinetic rate. When the transcription level scales with cellular volume, either by reducing the transcription burst frequency or by increasing the burst size in S2, the noise shows only a minor variation over a wide range of cell cycle stage durations. The reduction level in the burst frequency is nearly a constant, whereas the increase in the burst size is conceivably sensitive, when responding to a large random variation of the cell cycle durations and the gene duplication time.
Gene transcription in single cells is inherently a stochastic process, resulting in a large variability in the number of transcripts and contributing to phenotypic heterogeneity in the cell population. The cell division cycle has global effects on transcriptional outputs, and is thought to be an additional source of transcription noise. In this work, we develop a hybrid model to delineate the combined contribution of transcription activities and cell divisions to the variability of transcript counts. By working with the analytical forms of the mean and the noise of mRNA numbers, we show that if the transcription kinetic rates do not change considerably, then the average mRNA level increases about 1- to 2-fold from earlier to later cell cycle stages. When transcription homeostasis is attained by varying a single kinetic rate between the two cell cycle stages, we find no significant changes in the transcription noise, and the homeostasis nearly minimizes the noise. In our further study of transcript concentration homeostasis, in which the transcription level scales with the cellular volume, we find only minor variations of the noise if the homeostasis is maintained either by reducing the transcription burst frequency or by increasing the burst size in the late cell cycle phase, in the face of a large variation in cell cycle stage duration. The reduction in the burst frequency is relatively robust, while the increase in the burst size is conceivably sensitive, to the large random variation of the cell cycle durations and the gene duplication time.
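The quantities named in this abstract (mean, noise, and the S1-to-S2 fold change r*) can be illustrated with a small sketch. It assumes that noise is measured as the squared coefficient of variation (variance divided by the squared mean), a common convention that the abstract does not spell out. The function names and the per-cell counts are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def mean_and_noise(counts):
    """Return (mean, noise), with noise = variance / mean**2 (assumed definition)."""
    m = fmean(counts)
    return m, pvariance(counts) / (m * m)


def fold_change(counts_s1, counts_s2):
    """Ratio of the mean mRNA count in the late stage S2 to the early stage S1."""
    return fmean(counts_s2) / fmean(counts_s1)


# Toy per-cell mRNA counts (made-up numbers, not data from the paper).
s1 = [8, 10, 12, 10]
s2 = [13, 15, 17, 15]

m1, noise1 = mean_and_noise(s1)
r_star = fold_change(s1, s2)
```

With these toy counts the fold change falls inside the interval (1, 2) that the abstract reports for similar kinetics in both stages, but that is a property of the chosen numbers, not a demonstration of the model.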
Affiliation(s)
- Qiwen Sun
- Center for Applied Mathematics, Guangzhou University, Guangzhou, 510006, China
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
| | - Feng Jiao
- Center for Applied Mathematics, Guangzhou University, Guangzhou, 510006, China
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
| | - Genghong Lin
- Center for Applied Mathematics, Guangzhou University, Guangzhou, 510006, China
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
| | - Jianshe Yu
- Center for Applied Mathematics, Guangzhou University, Guangzhou, 510006, China
| | - Moxun Tang
- Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
| |
|
14
|
Georgescu CH, Manson AL, Griggs AD, Desjardins CA, Pironti A, Wapinski I, Abeel T, Haas BJ, Earl AM. SynerClust: a highly scalable, synteny-aware orthologue clustering tool. Microb Genom 2018; 4. [PMID: 30418868 PMCID: PMC6321874 DOI: 10.1099/mgen.0.000231] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/18/2022] Open
Abstract
Accurate orthologue identification is a vital component of bacterial comparative genomic studies, but many popular sequence-similarity-based approaches do not scale well to the large numbers of genomes that are now generated routinely. Furthermore, most approaches do not take gene synteny into account, which is useful information for disentangling paralogues. Here, we present SynerClust, a user-friendly synteny-aware tool based on Synergy that can process thousands of genomes. SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. SynerClust’s run-time is optimized by selecting cluster representatives at each node in the phylogeny, thus avoiding the need for exhaustive pairwise similarity searches. In benchmarking against Roary, Hieranoid2, PanX and Reciprocal Best Hit, SynerClust was able to more completely identify sets of core genes for datasets that included diverse strains, while using substantially less memory, and with scalability comparable to the fastest tools. Due to its scalability, ease of installation and use, and suitability for a variety of computing environments, orthogroup clustering using SynerClust will enable many large-scale prokaryotic comparative genomics efforts.
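Reciprocal Best Hit, one of the baseline methods named in this abstract, is simple enough to sketch. The code below is not SynerClust's algorithm and uses no synteny information. It only illustrates the baseline, with a hypothetical score layout (nested dictionaries of similarity scores) and made-up gene identifiers.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return gene pairs (a, b) where each gene is the other's top-scoring hit.

    scores_ab[a][b] is the similarity score of gene a (genome A) against
    gene b (genome B); scores_ba holds the reverse search. Both layouts are
    assumptions made for this illustration.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores: a2 resembles b1 more than b2, so the a2/b2 pair is not reciprocal.
scores_ab = {"a1": {"b1": 950.0, "b2": 400.0}, "a2": {"b1": 500.0, "b2": 450.0}}
scores_ba = {"b1": {"a1": 940.0, "a2": 510.0}, "b2": {"a2": 455.0, "a1": 390.0}}

pairs = reciprocal_best_hits(scores_ab, scores_ba)
```

In the toy data, a2 and b2 are left unpaired because a2 scores higher against b1. This is the kind of ambiguity that the abstract says synteny information helps to disentangle.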
Affiliation(s)
- Thomas Abeel
- Broad Institute, Cambridge, MA, USA; Delft University of Technology, Delft, The Netherlands
| |
|
15
|
Diop A, Raoult D, Fournier PE. Paradoxical evolution of rickettsial genomes. Ticks Tick Borne Dis 2018; 10:462-469. [PMID: 30448253 DOI: 10.1016/j.ttbdis.2018.11.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/19/2018] [Revised: 08/08/2018] [Accepted: 11/09/2018] [Indexed: 01/08/2023]
Abstract
Rickettsia species are strictly intracellular bacteria that evolved approximately 150 million years ago from a presumably free-living common ancestor from the order Rickettsiales that followed a transition to an obligate intracellular lifestyle. Rickettsiae are best known as human pathogens vectored by various arthropods causing a range of mild to severe human diseases. As part of their obligate intracellular lifestyle, rickettsial genomes have undergone a convergent evolution that includes a strong genomic reduction resulting from progressive gene degradation, genomic rearrangements as well as a paradoxical expansion of various genetic elements, notably small RNAs and short palindromic elements whose role remains unknown. This reductive evolutionary process is not unique to members of the Rickettsia genus but is common to several human pathogenic bacteria. Gene loss, gene duplication, DNA repeat duplication and horizontal gene transfer all have shaped rickettsial genome evolution. Gene loss mostly involved amino-acid, ATP, LPS and cell wall component biosynthesis and transcriptional regulators, but with a high preservation of toxin-antitoxin (TA) modules, recombination and DNA repair proteins. Surprisingly, the most virulent Rickettsia species were shown to have the most drastically reduced and degraded genomes compared to closely related species of milder pathogenesis. In contrast, the less pathogenic species harbored the greatest number of mobile genetic elements. Thus, this distinct evolutionary process observed in Rickettsia species may be correlated with the differences in virulence and pathogenicity observed in these obligate intracellular bacteria. However, future investigations are needed to provide novel insights into the evolution of genome sizes and content, which will require a better understanding of the balance between proliferation and elimination of genetic material in these intracellular bacteria.
Affiliation(s)
- Awa Diop
- UMR VITROME, Aix-Marseille University, IRD, Service de Santé des Armées, Assistance Publique-Hôpitaux de Marseille, Institut Hospitalo-Universitaire Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005, Marseille, France
| | - Didier Raoult
- UMR MEPHI, Aix-Marseille University, IRD, Assistance Publique-Hôpitaux de Marseille, Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
| | - Pierre-Edouard Fournier
- UMR VITROME, Aix-Marseille University, IRD, Service de Santé des Armées, Assistance Publique-Hôpitaux de Marseille, Institut Hospitalo-Universitaire Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005, Marseille, France
| |
|
16
|
Krogh TJ, Møller-Jensen J, Kaleta C. Impact of Chromosomal Architecture on the Function and Evolution of Bacterial Genomes. Front Microbiol 2018; 9:2019. [PMID: 30210483 PMCID: PMC6119826 DOI: 10.3389/fmicb.2018.02019] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/23/2018] [Accepted: 08/09/2018] [Indexed: 12/14/2022] Open
Abstract
The bacterial nucleoid is highly condensed and forms compartment-like structures within the cell. Much attention has been devoted to investigating the dynamic topology and organization of the nucleoid. In contrast, the specific nucleoid organization and the relationship between nucleoid structure and function are often neglected with regard to their importance for adaptation to changing environments and horizontal gene acquisition. In this review, we focus on the structure-function relationship in the bacterial nucleoid. We provide an overview of the fundamental properties that shape the chromosome as a structured yet dynamic macromolecule. These fundamental properties are then considered in the context of the living cell, with focus on how the informational flow affects the nucleoid structure, which in turn impacts on the genetic output. Subsequently, the dynamic living nucleoid will be discussed in the context of evolution. We will address how the acquisition of foreign DNA impacts nucleoid structure, and conversely, how nucleoid structure constrains the successful and sustainable chromosomal integration of novel DNA. Finally, we will discuss current challenges and directions of research in understanding the role of chromosomal architecture in bacterial survival and adaptation.
Affiliation(s)
- Thøger J Krogh
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
| | - Jakob Møller-Jensen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
| | - Christoph Kaleta
- Institute of Experimental Medicine, Christian-Albrechts-University Kiel, Kiel, Germany
| |
|
17
|
McInerney JO, McNally A, O’Connell MJ. Reply to ‘The population genetics of pangenomes’. Nat Microbiol 2017; 2:1575. [DOI: 10.1038/s41564-017-0068-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022]
|
18
|
Halter W, Montenbruck JM, Tuza ZA, Allgöwer F. A resource dependent protein synthesis model for evaluating synthetic circuits. J Theor Biol 2017; 420:267-278. [PMID: 28286216 DOI: 10.1016/j.jtbi.2017.03.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/12/2016] [Revised: 02/06/2017] [Accepted: 03/07/2017] [Indexed: 11/26/2022]
Abstract
Reliable in silico design of synthetic gene networks necessitates novel approaches to model the process of protein synthesis under the influence of limited resources. We present such a novel protein synthesis model, which originates from the Ribosome Flow Model and, among other things, describes the movement of RNA polymerase and ribosomes on DNA and mRNA templates, respectively. By analyzing the convergence properties of this model based upon geometric considerations, we present additional insights into the dynamic mechanisms of the process of protein synthesis. Further, we demonstrate how this model can be used to evaluate the performance of synthetic gene circuits under different loading scenarios.
Affiliation(s)
- Wolfgang Halter
- Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, Stuttgart, Germany.
| | - Jan Maximilian Montenbruck
- Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, Stuttgart, Germany
| | - Zoltan A Tuza
- Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, Stuttgart, Germany
| | - Frank Allgöwer
- Institute for Systems Theory and Automatic Control, University of Stuttgart, Pfaffenwaldring 9, Stuttgart, Germany
| |
|
19
|
Błażej P, Mackiewicz D, Grabińska M, Wnętrzak M, Mackiewicz P. Optimization of amino acid replacement costs by mutational pressure in bacterial genomes. Sci Rep 2017; 7:1061. [PMID: 28432324 PMCID: PMC5430830 DOI: 10.1038/s41598-017-01130-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/13/2016] [Accepted: 03/27/2017] [Indexed: 12/17/2022] Open
Abstract
Mutation is considered a spontaneous and random process, which is an important component of evolution because it generates genetic variation. On the other hand, mutations are deleterious, leading to non-functional genes and energetically costly repairs. Therefore, one can expect that the mutational pressure is optimized to simultaneously generate genetic diversity and preserve genetic information. To check if empirical mutational pressures are optimized in these ways, we compared matrices of nucleotide mutation rates derived from bacterial genomes with their best possible alternatives that minimized or maximized costs of amino acid replacements associated with differences in their physicochemical properties (e.g. hydropathy and polarity). It should be noted that the studied empirical nucleotide substitution matrices and the costs of amino acid replacements are independent because these matrices were derived from sites free of selection on amino acid properties and the amino acid costs assumed only amino acid physicochemical properties without any information about mutation at the nucleotide level. The results obtained indicate that the empirical mutational matrices show a tendency to minimize costs of amino acid replacements. This implies that bacterial mutational pressures can evolve to decrease the consequences of amino acid substitutions. However, the optimization is not complete, which enables generation of some genetic variability.
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Dorota Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Małgorzata Grabińska
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Małgorzata Wnętrzak
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland
| | - Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, ul. Joliot-Curie 14a, 50-383, Wrocław, Poland.
| |
|
20
|
Zhu L, Zhong J, Jia X, Liu G, Kang Y, Dong M, Zhang X, Li Q, Yue L, Li C, Fu J, Xiao J, Yan J, Zhang B, Lei M, Chen S, Lv L, Zhu B, Huang H, Chen F. Precision methylome characterization of Mycobacterium tuberculosis complex (MTBC) using PacBio single-molecule real-time (SMRT) technology. Nucleic Acids Res 2016; 44:730-43. [PMID: 26704977 PMCID: PMC4737169 DOI: 10.1093/nar/gkv1498] [Citation(s) in RCA: 74] [Impact Index Per Article: 8.2] [Received: 07/31/2015] [Revised: 12/10/2015] [Accepted: 12/11/2015] [Indexed: 01/08/2023] Open
Abstract
Tuberculosis (TB) remains one of the most common infectious diseases caused by Mycobacterium tuberculosis complex (MTBC). To panoramically analyze MTBC's genomic methylation, we completed the genomes of 12 MTBC strains (Mycobacterium bovis; M. bovis BCG; M. microti; M. africanum; M. tuberculosis H37Rv; H37Ra; and 6 M. tuberculosis clinical isolates) belonging to different lineages and characterized their methylomes using single-molecule real-time (SMRT) technology. We identified three m6A sequence motifs and their corresponding methyltransferase (MTase) genes, including the reported mamA, hsdM and a newly discovered mamB. We also experimentally verified the methylated motifs and functions of HsdM and MamB. Our analysis indicated that the MTase activities varied among the 12 strains due to mutations/deletions. Furthermore, through measuring 'the methylated-motif-site ratio' and 'the methylated-read ratio', we explored the methylation status of each modified site and sequence-read to obtain the 'precision methylome' of the MTBC strains, which enabled intricate analysis of MTase activity at whole-genome scale. Most unmodified sites overlapped with transcription-factor binding-regions, which might protect these sites from methylation. Overall, our findings show enormous potential for the SMRT platform to investigate the precise character of the methylome, and significantly enhance our understanding of the function of DNA MTase.
Affiliation(s)
- Lingxiang Zhu
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Research Institute for Family Planning, Beijing 100081, China
| | - Jun Zhong
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Xinmiao Jia
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guan Liu
- National Clinical Laboratory on Tuberculosis, Beijing Key Laboratory on Drug-resistant Tuberculosis Research, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Institute, Beijing 101149, China
| | - Yu Kang
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Mengxing Dong
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiuli Zhang
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qian Li
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Liya Yue
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Cuidan Li
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jing Fu
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jingfa Xiao
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Jiangwei Yan
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Bing Zhang
- Core Genomic Facility, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Meng Lei
- Core Genomic Facility, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Suting Chen
- National Clinical Laboratory on Tuberculosis, Beijing Key Laboratory on Drug-resistant Tuberculosis Research, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Institute, Beijing 101149, China
| | - Lingna Lv
- National Clinical Laboratory on Tuberculosis, Beijing Key Laboratory on Drug-resistant Tuberculosis Research, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Institute, Beijing 101149, China
| | - Baoli Zhu
- CAS Key Laboratory of Pathogenic Microbiology & Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Hairong Huang
- National Clinical Laboratory on Tuberculosis, Beijing Key Laboratory on Drug-resistant Tuberculosis Research, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Institute, Beijing 101149, China
| | - Fei Chen
- CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Collaborative Innovation Center for Genetics and Development, China
| |
|
21
|
Peabody MA, Van Rossum T, Lo R, Brinkman FSL. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinformatics 2015; 16:363. [PMID: 26537885 PMCID: PMC4634789 DOI: 10.1186/s12859-015-0788-5] [Citation(s) in RCA: 96] [Impact Index Per Article: 9.6] [Received: 06/26/2015] [Accepted: 10/20/2015] [Indexed: 01/14/2023] Open
Abstract
Background: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions.
Results: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed.
Conclusions: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.
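The sensitivity and precision mentioned in this abstract can be illustrated for read-level taxonomic classification. The sketch below assumes one common pair of definitions: sensitivity is the share of all reads that are correctly classified, and precision is the share of classified reads that are correct. The paper's exact definitions may differ, and the function name and toy labels are hypothetical.

```python
def sensitivity_precision(true_labels, predicted_labels):
    """Return (sensitivity, precision) for per-read taxon assignments.

    A predicted label of None means the classifier left the read unassigned.
    The definitions used here are assumptions, stated in the text above.
    """
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, pred in pairs if pred is not None and pred == truth)
    wrong = sum(1 for truth, pred in pairs if pred is not None and pred != truth)
    sensitivity = correct / len(pairs) if pairs else 0.0
    precision = correct / (correct + wrong) if (correct + wrong) else 0.0
    return sensitivity, precision


# Toy example: four reads from one species; one misassigned, one unassigned.
truth = ["Escherichia coli"] * 4
calls = ["Escherichia coli", "Escherichia coli", "Salmonella enterica", None]

sens, prec = sensitivity_precision(truth, calls)
```

A method that declines to classify uncertain reads loses sensitivity but not precision under these definitions, which is one way the "forces predictions or not" feature noted in the abstract can shift the two measures in different directions.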
Affiliation(s)
- Michael A Peabody
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
| | - Thea Van Rossum
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
| | - Raymond Lo
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
| | - Fiona S L Brinkman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
| |
|
22
|
Błażej P, Miasojedow B, Grabińska M, Mackiewicz P. Optimization of Mutation Pressure in Relation to Properties of Protein-Coding Sequences in Bacterial Genomes. PLoS One 2015; 10:e0130411. [PMID: 26121655 PMCID: PMC4488281 DOI: 10.1371/journal.pone.0130411] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 10/21/2014] [Accepted: 05/19/2015] [Indexed: 12/22/2022] Open
Abstract
Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to stationarity. The artificial matrices were optimized on real protein-coding sequences using an Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in the genomes under study with respect to differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints.
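The transitions/transversions ratio mentioned in this abstract can be read off a nucleotide substitution matrix. The sketch below takes a simplifying assumption: it sums off-diagonal entries without weighting them by stationary nucleotide frequencies, which the authors' analysis may well do. The matrix layout, the helper names, and the toy rates are hypothetical.

```python
PURINES = {"A", "G"}
BASES = "ACGT"


def is_transition(src, dst):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (src in PURINES) == (dst in PURINES)


def ts_tv_ratio(matrix):
    """Unweighted ratio of summed transition to summed transversion entries."""
    ts = tv = 0.0
    for src in BASES:
        for dst in BASES:
            if src == dst:
                continue
            if is_transition(src, dst):
                ts += matrix[src][dst]
            else:
                tv += matrix[src][dst]
    if tv == 0:
        raise ValueError("matrix has no transversions")
    return ts / tv


def toy_matrix(ts_rate, tv_rate):
    """Build a row-stochastic 4x4 matrix with uniform transition/transversion rates."""
    matrix = {}
    for src in BASES:
        row = {dst: (ts_rate if is_transition(src, dst) else tv_rate)
               for dst in BASES if dst != src}
        row[src] = 1.0 - sum(row.values())  # probability of no change
        matrix[src] = row
    return matrix


ratio = ts_tv_ratio(toy_matrix(0.04, 0.01))
```

Each nucleotide has one possible transition and two possible transversions, so a per-substitution rate four times higher for transitions yields an overall ratio of about 2 in this toy matrix.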
Affiliation(s)
- Paweł Błażej
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| | - Błażej Miasojedow
- Section of Mathematical Statistics, The Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Małgorzata Grabińska
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| | - Paweł Mackiewicz
- Department of Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
| |
|
23
|
Abstract
The concept of the minimal cell has fascinated scientists for a long time, from both fundamental and applied points of view. This broad concept encompasses extreme reductions of genomes, the last universal common ancestor (LUCA), the creation of semiartificial cells, and the design of protocells and chassis cells. Here we review these different areas of research and identify common and complementary aspects of each one. We focus on systems biology, a discipline that is greatly facilitating the classical top-down and bottom-up approaches toward minimal cells. In addition, we also review the so-called middle-out approach and its contributions to the field with mathematical and computational models. Owing to the advances in genomics technologies, much of the work in this area has been centered on minimal genomes, or rather minimal gene sets, required to sustain life. Nevertheless, a fundamental expansion has been taking place in the last few years wherein the minimal gene set is viewed as a backbone of a more complex system. Complementing genomics, progress is being made in understanding the system-wide properties at the levels of the transcriptome, proteome, and metabolome. Network modeling approaches are enabling the integration of these different omics data sets toward an understanding of the complex molecular pathways connecting genotype to phenotype. We review key concepts central to the mapping and modeling of this complexity, which is at the heart of research on minimal cells. Finally, we discuss the distinction between minimizing the number of cellular components and minimizing cellular complexity, toward an improved understanding and utilization of minimal and simpler cells.
|
24
|
Transcriptome dynamics-based operon prediction in prokaryotes. BMC Bioinformatics 2014; 15:145. [PMID: 24884724 PMCID: PMC4235196 DOI: 10.1186/1471-2105-15-145] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/26/2013] [Accepted: 04/22/2014] [Indexed: 11/21/2022] Open
Abstract
Background: Inferring operon maps is crucial to understanding the regulatory networks of prokaryotic genomes. Recently, RNA-seq based transcriptome studies revealed that in many bacterial species the operon structures vary with the change of environmental conditions. Therefore, new computational solutions that use both static and dynamic data are necessary to create condition-specific operon predictions.
Results: In this work, we propose a novel classification method that integrates RNA-seq based transcriptome profiles with genomic sequence features to accurately identify the operons that are expressed under a measured condition. The classifiers are trained on a small set of confirmed operons and then used to classify the remaining gene pairs of the organism studied. Finally, by linking consecutive gene pairs classified as operons, our computational approach produces condition-dependent operon maps. We evaluated our approach on various RNA-seq expression profiles of the bacteria Haemophilus somni, Porphyromonas gingivalis, Escherichia coli and Salmonella enterica. Our results demonstrate that, using features depending on both transcriptome dynamics and genome sequence characteristics, we can identify operon pairs with high accuracy. Moreover, the combination of DNA sequence and expression data results in more accurate predictions than each one alone.
Conclusion: We present a computational strategy for the comprehensive analysis of condition-dependent operon maps in prokaryotes. Our method can be used to generate condition-specific operon maps of many bacterial organisms for which high-resolution transcriptome data are available.
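The final step described in this abstract, linking consecutive gene pairs classified as operons into an operon map, can be sketched as a single pass over the ordered genes. The classifier itself is not reproduced. The boolean labels are taken as given, the function name is hypothetical, and the sketch assumes that only runs of two or more genes count as an operon.

```python
def link_pairs_into_operons(genes, pair_is_operon):
    """Merge adjacent gene pairs labelled as co-transcribed into operons.

    genes: gene IDs in genomic order on one strand.
    pair_is_operon[i]: True if genes[i] and genes[i + 1] were classified as
    belonging to the same operon (labels assumed to come from a classifier).
    Returns a list of operons, each a list of two or more gene IDs.
    """
    if len(pair_is_operon) != max(len(genes) - 1, 0):
        raise ValueError("need exactly one label per adjacent gene pair")
    operons = []
    current = []
    for i, gene in enumerate(genes):
        current.append(gene)
        is_last = i == len(genes) - 1
        if is_last or not pair_is_operon[i]:
            # The run ends here; keep it only if it spans several genes.
            if len(current) > 1:
                operons.append(current)
            current = []
    return operons


# Toy input: the g3/g4 pair is not co-transcribed, so two operons result.
genes = ["g1", "g2", "g3", "g4", "g5"]
labels = [True, True, False, True]
operon_map = link_pairs_into_operons(genes, labels)
```

Running the same function on labels derived from different RNA-seq conditions would give one map per condition, which is the sense in which the abstract calls the maps condition-dependent.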
|
25
|
Hilker R, Stadermann KB, Doppmeier D, Kalinowski J, Stoye J, Straube J, Winnebald J, Goesmann A. ReadXplorer--visualization and analysis of mapped sequences. Bioinformatics 2014; 30:2247-54. [PMID: 24790157 PMCID: PMC4217279 DOI: 10.1093/bioinformatics/btu205] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 12/22/2022] Open
Abstract
Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data.
Results: ReadXplorer is software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion-insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets.
Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual.
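The RPKM value mentioned in this abstract follows a standard formula: reads mapped to a feature, per kilobase of feature length, per million mapped reads. The sketch below applies that general definition and makes no claim about how ReadXplorer implements it. The function name and example numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Toy example: 500 reads on a 2,000 bp gene in a library of 10 million reads.
value = rpkm(500, 2_000, 10_000_000)
```

Dividing by both length and library size is what makes values comparable across genes of different lengths and across datasets of different sequencing depth.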
Affiliation(s)
- Rolf Hilker
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Kai Bernd Stadermann
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Daniel Doppmeier
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Jörn Kalinowski
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Jens Stoye
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Jasmin Straube
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Jörn Winnebald
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
| | - Alexander Goesmann
- Institute of Medical Microbiology, Justus-Liebig-University, 35392 Giessen, Germany, Faculty of Biology, Institute for Bioinformatics, Center for Biotechnology, Computational Genomics, Center for Biotechnology, Technology Platform Genomics, Center for Biotechnology, Genome Informatics, Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany and Bioinformatics and Systems Biology, Faculty of Biology and Chemistry, Justus-Liebig-University, 35392 Giessen, Germany
|
26
|
Ottesen AR, Gonzalez A, Bell R, Arce C, Rideout S, Allard M, Evans P, Strain E, Musser S, Knight R, Brown E, Pettengill JB. Co-enriching microflora associated with culture based methods to detect Salmonella from tomato phyllosphere. PLoS One 2013; 8:e73079. [PMID: 24039862 PMCID: PMC3767688 DOI: 10.1371/journal.pone.0073079] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 03/05/2013] [Accepted: 07/16/2013] [Indexed: 11/19/2022]
Abstract
The ability to detect a specific organism from a complex environment is vitally important to many fields of public health, including food safety. For example, tomatoes have been implicated numerous times as vehicles of foodborne outbreaks due to strains of Salmonella, but few studies have ever recovered Salmonella from a tomato phyllosphere environment. Precision of culturing techniques that target agents associated with outbreaks depends on numerous factors. One important factor to better understand is which species co-enrich during enrichment procedures and how microbial dynamics may impede or enhance detection of target pathogens. We used a shotgun sequence approach to describe taxa associated with samples pre-enrichment and throughout the enrichment steps of the Bacteriological Analytical Manual's (BAM) protocol for detection of Salmonella from environmental tomato samples. Recent work has shown that during efforts to enrich Salmonella (Proteobacteria) from tomato field samples, Firmicute genera are also co-enriched and at least one co-enriching Firmicute genus (Paenibacillus sp.) can inhibit and even kill strains of Salmonella. Here we provide a baseline description of microflora that co-culture during detection efforts and the utility of a bioinformatic approach to detect specific taxa from metagenomic sequence data. We observed that uncultured samples clustered together with distinct taxonomic profiles relative to the three cultured treatments (Universal Pre-enrichment broth (UPB), Tetrathionate (TT), and Rappaport-Vassiliadis (RV)). There was little consistency among samples exposed to the same culture media, suggesting significant microbial differences in starting matrices or stochasticity associated with enrichment processes. Interestingly, Paenibacillus sp. (Salmonella inhibitor) was significantly enriched from uncultured to cultured (UPB) samples. Also of interest was the sequence-based identification of a number of sequences as Salmonella despite indication by all media that samples were culture negative for Salmonella. Our results substantiate the nascent utility of metagenomic methods to improve both biological and bioinformatic pathogen detection methods.
Affiliation(s)
- Andrea R. Ottesen
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Antonio Gonzalez
- Biofrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
| | - Rebecca Bell
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Caroline Arce
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Steven Rideout
- Virginia Tech, Virginia Agricultural Experiment Station, Painter, Virginia, United States of America
| | - Marc Allard
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Peter Evans
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Errol Strain
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Steven Musser
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - Rob Knight
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United States of America
- Howard Hughes Medical Institute, Boulder, Colorado, United States of America
| | - Eric Brown
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
| | - James B. Pettengill
- Molecular Methods and Subtyping Branch, Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, Maryland, United States of America
|
27
|
Abstract
Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and of the kind of relevant information present in them. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
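Two of the sequence patterns named in this abstract, (G+C) content and trinucleotide usage, can be computed for any sequence whether or not it contains an ORF. The sketch below is a minimal illustration; the function names, the use of overlapping 3-mers and the skipping of windows with ambiguous bases are assumptions made here, not details from the study.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous


def trinucleotide_usage(seq):
    """Relative frequency of each overlapping 3-mer (assumed definition).

    Windows containing anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= valid
    )
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


if __name__ == "__main__":
    print(gc_content("ATGC"))
    print(trinucleotide_usage("ATGATG"))
```

The resulting frequency vectors could then be compared between metagenomes, for example by clustering, which is the kind of analysis the abstract describes.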
|
28
|
Tsoy OV, Pyatnitskiy MA, Kazanov MD, Gelfand MS. Evolution of transcriptional regulation in closely related bacteria. BMC Evol Biol 2012; 12:200. [PMID: 23039862 PMCID: PMC3735044 DOI: 10.1186/1471-2148-12-200] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/05/2012] [Accepted: 09/26/2012] [Indexed: 01/15/2023]
Abstract
BACKGROUND The exponential growth of the number of fully sequenced genomes at varying taxonomic closeness allows one to characterize transcriptional regulation using comparative-genomics analysis instead of time-consuming experimental methods. A transcriptional regulatory unit consists of a transcription factor, its binding site and a regulated gene. These units constitute a graph which contains so-called "network motifs", subgraphs of a given structure. Here we consider genomes of closely related Enterobacteriales and estimate the fraction of conserved network motifs and sites as well as positions under selection in various types of non-coding regions. RESULTS Using a newly developed technique, we found that the highest fraction of positions under selection, approximately 50%, was observed in synvergon spacers (between consecutive genes from the same strand), followed by ~45% in divergon spacers (common 5'-regions), and ~10% in convergon spacers (common 3'-regions). The fraction of selected positions in functional regions was higher, 60% in transcription factor-binding sites and ~45% in terminators and promoters. Small, but significant differences were observed between Escherichia coli and Salmonella enterica. This fraction is similar to the one observed in eukaryotes. The conservation of binding sites demonstrated some differences between types of regulatory units. In E. coli strains, the interactions of the type "local transcriptional factor gene" turned out to be more conserved in feed-forward loops (FFLs) compared to non-motif interactions. The coherent FFLs tend to be less conserved than the incoherent FFLs. A natural explanation is that the former imply functional redundancy. CONCLUSIONS A naïve hypothesis that FFLs would be highly conserved turned out to be not entirely true: their conservation depends on their status in the transcriptional network and also on their usage.
The fraction of positions under selection in intergenic regions of bacterial genomes is roughly similar to that of eukaryotes. Known regulatory sites explain 20±5% of selected positions.
Affiliation(s)
- Olga V Tsoy
- Institute for Information Transmission Problems, RAS, Bolshoi Karetny per. 19, Moscow 127994, Russia
|
29
|
A global analysis of adaptive evolution of operons in cyanobacteria. Antonie van Leeuwenhoek 2012; 103:331-46. [PMID: 22987250 DOI: 10.1007/s10482-012-9813-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/31/2012] [Accepted: 09/06/2012] [Indexed: 01/04/2023]
Abstract
Operons are an important feature of prokaryotic genomes. Evolution of operons is hypothesized to be adaptive and has contributed significantly towards coordinated optimization of functions. Two conflicting theories, based on (i) in situ formation to achieve co-regulation and (ii) horizontal gene transfer of functionally linked gene clusters, are generally considered to explain why and how operons have evolved. Furthermore, effects of operon evolution on genomic traits such as intergenic spacing, operon size and co-regulation are relatively less explored. Based on the conservation level in a set of diverse prokaryotes, we categorize the operonic gene pair associations and in turn the operons as ancient and recently formed. This allowed us to perform a detailed analysis of operonic structure in cyanobacteria, a morphologically and physiologically diverse group of photoautotrophs. Clustering based on operon conservation showed significant similarity with the 16S rRNA-based phylogeny, which groups the cyanobacterial strains into three clades. Clade C, dominated by strains that are believed to have undergone genome reduction, shows a larger fraction of operonic genes that are tightly packed in larger sized operons. Ancient operons are in general larger, more tightly packed, better optimized for co-regulation and part of key cellular processes. A sub-clade within Clade B, which includes Synechocystis sp. PCC 6803, shows a reverse trend in intergenic spacing. Our results suggest that while in situ formation and vertical descent may be a dominant mechanism of operon evolution in cyanobacteria, optimization of intergenic spacing and co-regulation are part of an ongoing process in the life-cycle of operons.
|
30
|
Lashin SA, Matushkin YG, Suslov VV, Kolchanov NA. Evolutionary trends in the prokaryotic community and prokaryotic community-phage systems. Russ J Genet 2011. [DOI: 10.1134/s1022795411110123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
31
|
Operon Prediction Based On an Iterative Self-learning Algorithm. Prog Biochem Biophys 2011. [DOI: 10.3724/sp.j.1206.2010.00686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
32
|
Rangannan V, Bansal M. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes. BMC Res Notes 2011; 4:257. [PMID: 21781326 PMCID: PMC3160392 DOI: 10.1186/1756-0500-4-257] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 01/27/2011] [Accepted: 07/22/2011] [Indexed: 12/19/2022]
Abstract
Background As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative-stability-based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and that of the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion PromBase is a database which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics studies and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible to academic and non-academic users via the World Wide Web at http://nucleix.mbu.iisc.ernet.in/prombase/.
Affiliation(s)
- Vetriselvi Rangannan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560 012, India.
|
33
|
Kristensen DM, Wolf YI, Mushegian AR, Koonin EV. Computational methods for Gene Orthology inference. Brief Bioinform 2011; 12:379-91. [PMID: 21690100 DOI: 10.1093/bib/bbr030] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Indexed: 12/14/2022]
Abstract
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
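One common heuristic of the "closest homologous pairs" kind mentioned in this abstract is the reciprocal (bidirectional) best hit. The sketch below illustrates that idea on a table of similarity scores; the function name, the score dictionary layout and the tie handling (first maximum wins) are assumptions for illustration, and the review itself covers many more refined methods.

```python
def reciprocal_best_hits(scores):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit.

    scores: dict mapping (gene_in_genome_A, gene_in_genome_B) to a
        similarity score, higher meaning more similar (for example a
        bit score from a sequence search).
    """
    best_for_a = {}  # gene in A -> (best gene in B, score)
    best_for_b = {}  # gene in B -> (best gene in A, score)
    for (a, b), score in scores.items():
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)

    # Keep only pairs whose best hits point at each other.
    return sorted(
        (a, b)
        for a, (b, _) in best_for_a.items()
        if best_for_b[b][0] == a
    )


if __name__ == "__main__":
    toy_scores = {
        ("a1", "b1"): 100,
        ("a1", "b2"): 50,
        ("a2", "b1"): 80,
        ("a2", "b2"): 60,
    }
    print(reciprocal_best_hits(toy_scores))
```

In the toy input, a2 prefers b1 but b1 prefers a1, so only the a1/b1 pair is reported. Strict reciprocity of this kind tends to trade sensitivity for specificity when paralogs are present.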
Affiliation(s)
- David M Kristensen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
34
|
Gerlach W, Stoye J. Taxonomic classification of metagenomic shotgun sequences with CARMA3. Nucleic Acids Res 2011; 39:e91. [PMID: 21586583 PMCID: PMC3152360 DOI: 10.1093/nar/gkr225] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/13/2022]
Abstract
The vast majority of microbes are unculturable and thus cannot be sequenced by means of traditional methods. High-throughput sequencing techniques like 454 or Solexa-Illumina make it possible to explore those microbes by studying whole natural microbial communities and analysing their biological diversity as well as the underlying metabolic pathways. Over the past few years, different methods have been developed for the taxonomic and functional characterization of metagenomic shotgun sequences. However, the taxonomic classification of metagenomic sequences from novel species without a close homologue in the biological sequence databases poses a challenge due to the high number of wrong taxonomic predictions at lower taxonomic ranks. Here we present CARMA3, a new method for the taxonomic classification of assembled and unassembled metagenomic sequences that has been adapted to work with both BLAST and HMMER3 homology searches. We show that our method makes fewer wrong taxonomic predictions (at the same sensitivity) than other BLAST-based methods. CARMA3 is freely accessible via the web application WebCARMA from http://webcarma.cebitec.uni-bielefeld.de.
Affiliation(s)
- Wolfgang Gerlach
- Genome Informatics Group, Faculty of Technology and Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
|
35
|
Lazarevic V, Beaume M, Corvaglia A, Hernandez D, Schrenzel J, François P. Epidemiology and virulence insights from MRSA and MSSA genome analysis. Future Microbiol 2011; 6:513-32. [DOI: 10.2217/fmb.11.38] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Staphylococcus aureus is a major human pathogen responsible for a wide diversity of infections ranging from localized to life-threatening diseases. Since 1961 and the emergence of methicillin-resistant S. aureus (MRSA), this bacterium has shown a particular capacity to survive and adapt to drastic environmental changes, and since the beginning of the 1990s it has spread worldwide. Until recently, S. aureus was considered the prototype of a nosocomial pathogen, but it has now been recognized as an agent responsible for outbreaks in the community. Several recent reports suggest that the epidemiology of MRSA is changing. Understanding of pathogenicity, virulence and emergence of epidemic clones within MRSA populations is not clearly defined, despite several attempts to identify common molecular features between strains that share similar epidemiological and/or virulence behavior. These studies included: pattern profiling of bacterial adhesins, analysis of clonal complex groups, molecular genotyping and enterotoxin content analysis. To date, all approaches have failed to find a correlation between molecular determinants and clinical outcomes. We hypothesize that the capacity of the bacterium to become more invasive or virulent is determined by genetics. The utilization of massively parallel methods of analysis is therefore ideal to study the contribution of genetics. Therefore, this article focuses on the entire genome including coding sequences as well as noncoding sequences. This high-resolution approach allows the monitoring of micro- and macroevolution of MRSA and the identification of specific genomic markers of evolution of invasive or highly virulent phenotypes.
Affiliation(s)
- Vladimir Lazarevic
- Genomic Research Laboratory, Geneva University Hospitals, CH-1211 Geneva 14, Switzerland
| | - Marie Beaume
- Genomic Research Laboratory, Geneva University Hospitals, CH-1211 Geneva 14, Switzerland
| | - Anna Corvaglia
- Department of Microbiology & Molecular Medicine, University Medical Centre, University of Geneva, 1211 Geneva 4, Switzerland
| | - David Hernandez
- Genomic Research Laboratory, Geneva University Hospitals, CH-1211 Geneva 14, Switzerland
| | - Jacques Schrenzel
- Genomic Research Laboratory, Geneva University Hospitals, CH-1211 Geneva 14, Switzerland
|
36
|
Luo H, Tang J, Friedman R, Hughes AL. Ongoing purifying selection on intergenic spacers in group A streptococcus. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2011; 11:343-8. [PMID: 21115137 PMCID: PMC3411356 DOI: 10.1016/j.meegid.2010.11.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/29/2010] [Revised: 11/05/2010] [Accepted: 11/08/2010] [Indexed: 11/15/2022]
Abstract
Bacterial intergenic spacers are non-coding genomic regions enriched with cis-regulatory elements for gene expression. A population genetics approach was used to investigate the evolutionary force shaping the genetic diversity of intergenic spacers among 13 genomes of group A streptococcus (GAS). Analysis of 590 genes and their linked 5' intergenic spacers showed reduced nucleotide diversity in spacers compared to synonymous nucleotide diversity in protein-coding regions, suggestive of past purifying selection on spacers. Certain spacers showed elevated nucleotide diversity indicative of past homologous recombination with divergent genotypes. In addition, analysis of the difference between mean nucleotide difference and number of segregating sites showed evidence of an excess of rare variants both at nonsynonymous sites in genes and at sites in spacers, which is evidence that there are numerous slightly deleterious variants in GAS populations with potential effects on both protein sequences and gene expression.
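The two summary statistics this abstract compares, mean pairwise nucleotide difference and the number of segregating sites, can be computed directly from an alignment. The sketch below is a minimal version that assumes gap-free aligned sequences of equal length; the function names are hypothetical and the study's own estimators (for example separate synonymous and nonsynonymous diversity) are not reproduced here.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site (pi) for aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(aligned_seqs, 2))
    total_diff = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in pairs
    )
    return total_diff / len(pairs) / length


def segregating_sites(aligned_seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*aligned_seqs))


if __name__ == "__main__":
    alignment = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(alignment))
    print(segregating_sites(alignment))
```

When many segregating sites are carried by only one or a few sequences, the mean pairwise difference is low relative to the site count, which is the signature of an excess of rare variants that the abstract reports.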
Affiliation(s)
- Haiwei Luo
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| | - Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
| | - Robert Friedman
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| | - Austin L. Hughes
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
|
37
|
Locey KJ, White EP. Simple structural differences between coding and noncoding DNA. PLoS One 2011; 6:e14651. [PMID: 21304908 PMCID: PMC3033402 DOI: 10.1371/journal.pone.0014651] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/12/2010] [Accepted: 01/06/2011] [Indexed: 11/17/2022]
Abstract
Background The study of large-scale genome structure has revealed patterns suggesting the influence of evolutionary constraints on genome evolution. However, the results of these studies can be difficult to interpret due to the conceptual complexity of the analyses. This makes it difficult to understand how observed statistical patterns relate to the physical distribution of genomic elements. We use a simpler and more intuitive approach to evaluate patterns of genome structure. Methodology/Principal Findings We used randomization tests based on Morisita's Index of aggregation to examine average differences in the distribution of purines and pyrimidines among coding and noncoding regions of 261 chromosomes from 223 microbial genomes representing 21 phylum level groups. Purines and pyrimidines were aggregated in the noncoding DNA of 86% of genomes, but were only aggregated in the coding regions of 52% of genomes. Coding and noncoding DNA differed in aggregation in 94% of genomes. Noncoding regions were more aggregated than coding regions in 91% of these genomes. Genome length appears to limit aggregation, but chromosome length does not. Chromosomes from the same species are similarly aggregated despite substantial differences in length. Aggregation differed among taxonomic groups, revealing support for a previously reported pattern relating genome structure to environmental conditions. Conclusions/Significance Our approach revealed several patterns of genome structure among different types of DNA, different chromosomes of the same genome, and among different taxonomic groups. Similarity in aggregation among chromosomes of varying length from the same genome suggests that individual chromosome structure has not evolved independently of the general constraints on genome structure as a whole. These patterns were detected using simple and readily interpretable methods commonly used in other areas of biology.
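Morisita's Index of aggregation, on which the randomization tests in this abstract are based, has a simple closed form: for n sampling units with counts x_i and total N, I = n * sum(x_i * (x_i - 1)) / (N * (N - 1)). Values near 1 indicate a random distribution, values above 1 indicate aggregation and values below 1 indicate a more even spread. The sketch below applies it to purine counts in fixed windows; the windowing scheme, window size and function names are assumptions for illustration, and the randomization test itself is not shown.

```python
def morisita_index(counts):
    """Morisita's index of dispersion for a list of per-unit counts."""
    n = len(counts)
    total = sum(counts)
    if n < 2 or total < 2:
        raise ValueError("need at least two units and two counted items")
    return n * sum(x * (x - 1) for x in counts) / (total * (total - 1))


def purine_counts(seq, window):
    """Count purines (A or G) in consecutive non-overlapping windows.

    A trailing partial window is ignored (an assumption made here).
    """
    seq = seq.upper()
    return [
        sum(base in "AG" for base in seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    counts = purine_counts("AAAACCCC", 4)
    print(counts, morisita_index(counts))
```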
Affiliation(s)
- Kenneth J Locey
- Department of Biology, Utah State University, Logan, Utah, USA.
|
38
|
Brinza L, Calevro F, Duport G, Gaget K, Gautier C, Charles H. Structure and dynamics of the operon map of Buchnera aphidicola sp. strain APS. BMC Genomics 2010; 11:666. [PMID: 21108805 PMCID: PMC3091783 DOI: 10.1186/1471-2164-11-666] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/15/2010] [Accepted: 11/25/2010] [Indexed: 01/28/2023]
Abstract
BACKGROUND Gene expression regulation is still poorly documented in bacteria with highly reduced genomes. Understanding the evolution and mechanisms underlying the regulation of gene transcription in Buchnera aphidicola, the primary endosymbiont of aphids, is expected both to enhance our understanding of this nutritionally based association and to provide an intriguing case study of the evolution of gene expression regulation in a reduced bacterial genome. RESULTS A Bayesian predictor was defined to infer the B. aphidicola transcription units, which were further validated using transcriptomic data and RT-PCR experiments. The characteristics of the predicted B. aphidicola transcription units (TUs) were analyzed in order to evaluate the impact of operon map organization on the regulation of gene transcription. On average, B. aphidicola TUs contain more genes than those of E. coli. The global layout of the B. aphidicola operon map was mainly shaped by the major reduction and rearrangement events that occurred at the early stage of the symbiosis. Our analysis suggests that this operon map may evolve further only by small reorganizations around the boundaries of B. aphidicola TUs, through promoter and/or terminator sequence modifications and/or by pseudogenization events. We also found that the need for specific transcription regulation exerts some pressure on gene conservation, but not on gene assembly in the operon map in Buchnera. Our analysis of TU spacing indicated that selection pressure is maintained on the length of the intergenic regions between divergent adjacent gene pairs. CONCLUSIONS B. aphidicola can seemingly only evolve towards a more polycistronic operon map. This implies that gene transcription regulation is probably subject to weak selection pressure in Buchnera conserving operons composed of genes with unrelated functions.
Affiliation(s)
- Lilia Brinza
- INSA-Lyon, UMR203 BF2I, INRA, Biologie Fonctionnelle Insectes et Interactions, Bât. Louis Pasteur 20 ave. Albert Einstein, F-69621 Villeurbanne, France
- Federica Calevro
- INSA-Lyon, UMR203 BF2I, INRA, Biologie Fonctionnelle Insectes et Interactions, Bât. Louis Pasteur 20 ave. Albert Einstein, F-69621 Villeurbanne, France
- Université de Lyon, INRIA Bamboo, F-69621 France
- Gabrielle Duport
- INSA-Lyon, UMR203 BF2I, INRA, Biologie Fonctionnelle Insectes et Interactions, Bât. Louis Pasteur 20 ave. Albert Einstein, F-69621 Villeurbanne, France
- Karen Gaget
- INSA-Lyon, UMR203 BF2I, INRA, Biologie Fonctionnelle Insectes et Interactions, Bât. Louis Pasteur 20 ave. Albert Einstein, F-69621 Villeurbanne, France
- Christian Gautier
- Université de Lyon, Univ Lyon 1, CNRS UMR5557 Ecologie Microbienne, INRA, F-69622 Villeurbanne, France
- Université de Lyon, INRIA Bamboo, F-69621 France
- Hubert Charles
- INSA-Lyon, UMR203 BF2I, INRA, Biologie Fonctionnelle Insectes et Interactions, Bât. Louis Pasteur 20 ave. Albert Einstein, F-69621 Villeurbanne, France
- Université de Lyon, INRIA Bamboo, F-69621 France
|
39
|
Weng FC, Su CH, Hsu MT, Wang TY, Tsai HK, Wang D. Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency. BMC Bioinformatics 2010; 11:565. [PMID: 21083935 PMCID: PMC3098102 DOI: 10.1186/1471-2105-11-565] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/24/2009] [Accepted: 11/18/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However, at present, taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data due to the stringency of BLAST cut-offs. In an attempt to provide a comprehensive overview of metagenomic data, we re-analyzed the discarded metagenomes by using less stringent cut-offs. Additionally, we introduced a new criterion, namely, the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of our approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity. We also compared the consistency between our taxonomic binning and those reported in the original studies. RESULTS Among the discarded data, we found that 23.7 ± 3.9% of singletons and 14.1 ± 1.0% of contigs were assigned to taxa. The recovery rates for singletons were higher than those for contigs. The Pearson correlation coefficient revealed a high degree of similarity (0.94 ± 0.03 at the phylum rank and 0.80 ± 0.11 at the family rank) between the proposed taxonomic binning approach and those reported in original studies. In addition, an evaluation using simulated data demonstrated the reliability of the proposed approach. CONCLUSIONS Our findings suggest that taking account of conserved neighboring gene adjacency improves taxonomic assignment when analyzing metagenomes using Sanger sequencing. In other words, utilizing the conserved gene order as a criterion will reduce the amount of data discarded when analyzing metagenomes.
Affiliation(s)
- Francis C Weng
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
|
40
|
Dutta A, Paul S, Dutta C. GC-rich intra-operonic spacers in prokaryotes: Possible relation to gene order conservation. FEBS Lett 2010; 584:4633-8. [DOI: 10.1016/j.febslet.2010.10.037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2010] [Revised: 10/12/2010] [Accepted: 10/15/2010] [Indexed: 11/28/2022]
|
41
|
Rangannan V, Bansal M. High-quality annotation of promoter regions for 913 bacterial genomes. Bioinformatics 2010; 26:3043-50. [PMID: 20956245 DOI: 10.1093/bioinformatics/btq577] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023]
Abstract
MOTIVATION The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. RESULTS Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and precision of 56% is achieved over the 913 microbial genome dataset. AVAILABILITY The binary executable for 'PromPredict' algorithm (implemented in PERL and supported on Linux and MS Windows) and the predicted promoter data for all 913 microbial genomes are available at http://nucleix.mbu.iisc.ernet.in/prombase/.
|
42
|
Abstract
Advances in next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as the sequencing error rates increase, or as reads get shorter. We have developed a novel gene prediction method, FragGeneScan, which combines sequencing error models and codon usages in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to Glimmer and MetaGene for complete genomes. For short reads, however, FragGeneScan consistently outperformed MetaGene (accuracy improved ∼62% for reads of 400 bases with 1% sequencing errors, and ∼18% for short reads of 100 bases that are error free). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), and many novel genes with no homologs in current protein sequence databases.
Affiliation(s)
- Mina Rho
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
|
43
|
Merhej V, Raoult D. Rickettsial evolution in the light of comparative genomics. Biol Rev Camb Philos Soc 2010; 86:379-405. [DOI: 10.1111/j.1469-185x.2010.00151.x] [Citation(s) in RCA: 183] [Impact Index Per Article: 12.2] [Indexed: 11/28/2022]
|
44
|
Abstract
Using an oligonucleotide microarray, we searched for previously unrecognized transcription units in intergenic regions in the genome of Bacillus subtilis, with an emphasis on identifying small genes activated during spore formation. Nineteen transcription units were identified, 11 of which were shown to depend on one or more sporulation-regulatory proteins for their expression. A high proportion of the transcription units contained small, functional open reading frames (ORFs). One such newly identified ORF is a member of a family of six structurally similar genes that are transcribed under the control of sporulation transcription factor σ(E) or σ(K). A multiple mutant lacking all six genes was found to sporulate with slightly higher efficiency than the wild type, suggesting that under standard laboratory conditions the expression of these genes imposes a small cost on the production of heat-resistant spores. Finally, three of the transcription units specified small, noncoding RNAs; one of these was under the control of the sporulation transcription factor σ(E), and another was under the control of the motility sigma factor σ(D).
|
45
|
Pallejà A, García-Vallvé S, Romeu A. Adaptation of the short intergenic spacers between co-directional genes to the Shine-Dalgarno motif among prokaryote genomes. BMC Genomics 2009; 10:537. [PMID: 19922619 PMCID: PMC2784483 DOI: 10.1186/1471-2164-10-537] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/23/2009] [Accepted: 11/18/2009] [Indexed: 11/30/2022] Open
Abstract
Background In prokaryote genomes most of the co-directional genes are in close proximity. Even the coding sequence or the stop codon of a gene can overlap with the Shine-Dalgarno (SD) sequence of the downstream co-directional gene. In this paper we analyze how the presence of SD may influence the stop codon usage or the spacing lengths between co-directional genes. Results The SD sequences for 530 prokaryote genomes have been predicted using computer calculations of the base-pairing free energy between translation initiation regions and the 16S rRNA 3' tail. Genomes with a large number of genes with the SD sequence concentrate this regulatory motif from 4 to 11 bps before the start codon. However, not all genes seem to have the SD sequence. Genes separated from 1 to 4 bps from a co-directional upstream gene show a high SD presence, though this regulatory signal is located towards the 3' end of the coding sequence of the upstream gene. Genes separated from 9 to 15 bps show the highest SD presence as they accommodate the SD sequence within an intergenic region. However, genes separated from around 5 to 8 bps have a lower percentage of SD presence and when the SD is present, the stop codon usage of the upstream gene changes to accommodate the overlap between the SD sequence and the stop codon. Conclusion The SD presence makes the intergenic lengths from 5 to 8 bps less frequent and causes an adaptation of the stop codon usage. Our results introduce new elements to the discussion of which factors affect the intergenic lengths, which cannot be totally explained by the pressure to compact the prokaryote genomes.
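To make the notion of SD-to-start-codon spacing concrete, the following sketch measures how many bases separate an SD-like motif from the start codon. It is a deliberately simplified stand-in: the abstract describes predicting SD sequences from base-pairing free energy against the 16S rRNA 3' tail, whereas this example uses an exact match to the core motif `GGAGG`. The motif, the 20-base search window, and the function name are all assumptions chosen for illustration.

```python
def sd_spacing(upstream, motif="GGAGG", max_window=20):
    """Return the number of bases between an SD-like motif and the start codon.

    `upstream` is the DNA immediately 5' of the start codon, written
    5'->3', so its last base is adjacent to the start codon.
    Only the last `max_window` bases are searched, and the motif
    occurrence closest to the start codon is used.
    Returns None when no exact match is found.

    Note: exact motif matching is a simplification, not the
    free-energy method described in the abstract.
    """
    region = upstream.upper()[-max_window:]
    idx = region.rfind(motif)
    if idx == -1:
        return None
    return len(region) - (idx + len(motif))


if __name__ == "__main__":
    # Hypothetical upstream sequence with an SD-like motif.
    print(sd_spacing("TTTAAGGAGGTTTTTTT"))
    # No motif in the searched window.
    print(sd_spacing("TTTTTTTTTTTTTTTTT"))
```

A spacing computed this way can then be compared with the 4 to 11 bp range reported in the abstract. A free-energy scan would also detect partial matches that this exact-match version misses.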
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalonia, Spain.
|
46
|
Pallejà A, Reverter T, Garcia-Vallvé S, Romeu A. PairWise Neighbours database: overlaps and spacers among prokaryote genomes. BMC Genomics 2009; 10:281. [PMID: 19555467 PMCID: PMC2716372 DOI: 10.1186/1471-2164-10-281] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/14/2009] [Accepted: 06/25/2009] [Indexed: 05/25/2023] Open
Abstract
Background Although prokaryotes live in a variety of habitats and differ in metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. Overlaps tend to be short because longer overlaps carry a greater risk of deleterious mutations. The spacers between genes also tend to be short because of the tendency to reduce noncoding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL . This database provides information about the overlaps and their conservation across species. Furthermore, it allows broad analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion There are experiments and bioinformatic analyses that rely on correct annotation of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. The PairWise Neighbours database permits reliability analysis of the overlapping structures and the study of SD presence and location among adjacent genes, which may help to check the annotation of initiation sites.
Affiliation(s)
- Albert Pallejà
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalunya, Spain.
|
47
|
Koonin EV. Evolution of genome architecture. Int J Biochem Cell Biol 2009; 41:298-306. [PMID: 18929678 PMCID: PMC3272702 DOI: 10.1016/j.biocel.2008.09.015] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.1] [Received: 07/11/2008] [Revised: 09/16/2008] [Accepted: 09/16/2008] [Indexed: 11/26/2022]
Abstract
Charles Darwin believed that all traits of organisms have been honed to near perfection by natural selection. The empirical basis underlying Darwin's conclusions consisted of numerous observations made by him and other naturalists on the exquisite adaptations of animals and plants to their natural habitats and on the impressive results of artificial selection. Darwin fully appreciated the importance of heredity but was unaware of the nature and, in fact, the very existence of genomes. A century and a half after the publication of the "Origin", we have the opportunity to draw conclusions from the comparisons of hundreds of genome sequences from all walks of life. These comparisons suggest that the dominant mode of genome evolution is quite different from that of the phenotypic evolution. The genomes of vertebrates, those purported paragons of biological perfection, turned out to be veritable junkyards of selfish genetic elements where only a small fraction of the genetic material is dedicated to encoding biologically relevant information. In sharp contrast, genomes of microbes and viruses are incomparably more compact, with most of the genetic material assigned to distinct biological functions. However, even in these genomes, the specific genome organization (gene order) is poorly conserved. The results of comparative genomics lead to the conclusion that the genome architecture is not a straightforward result of continuous adaptation but rather is determined by the balance between the selection pressure, that is itself dependent on the effective population size and mutation rate, the level of recombination, and the activity of selfish elements. Although genes and, in many cases, multigene regions of genomes possess elaborate architectures that ensure regulation of expression, these arrangements are evolutionarily volatile and typically change substantially even on short evolutionary scales when gene sequences diverge minimally. 
Thus, the observed genome architectures are, mostly, products of neutral processes or epiphenomena of more general selective processes, such as selection for genome streamlining in successful lineages with large populations. Selection for specific gene arrangements (elements of genome architecture) seems only to modulate the results of these processes.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
48
|
Guo X, Silva JC. Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva. BMC Genomics 2008; 9:582. [PMID: 19055776 PMCID: PMC2612703 DOI: 10.1186/1471-2164-9-582] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/04/2008] [Accepted: 12/03/2008] [Indexed: 01/24/2023] Open
Abstract
Background Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species. Results Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to known regulatory motifs in other species. These results suggest that these two motifs are likely to represent transcription factor binding sites in Theileria. Conclusion Theileria genomes are highly compact, with selection seemingly favoring short introns and intergenic regions. Three over-represented sequence motifs were independently identified in intergenic regions of both Theileria species, and the evidence suggests that at least two of them play a role in transcriptional control in T. parva. These are prime candidates for experimental validation of transcription factor binding sites in this single-celled eukaryotic parasite. Sequences similar to two of these Theileria motifs are conserved in Plasmodium, hinting at the possibility of common regulatory machinery across the phylum Apicomplexa.
Affiliation(s)
- Xiang Guo
- The Institute for Genomic Research/J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
49
|
|
50
|
Pallejà A, Harrington ED, Bork P. Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 2008; 9:335. [PMID: 18627618 PMCID: PMC2478687 DOI: 10.1186/1471-2164-9-335] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 03/10/2008] [Accepted: 07/15/2008] [Indexed: 11/20/2022] Open
Abstract
Background Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. Results We analysed the genes that overlap by 60 bps or more among 338 fully-sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs it was considered unlikely to be functional and therefore the result of an error either in sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bps, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at 3'-end of a gene or point mutation at the stop codon (68), iv) Redundant gene predictions (4), v) 5' & 3'-end extension which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless we found some convergent long overlaps (54) that might be true overlaps, although an important part of convergent overlaps could be classified as group iii) (124). 
Conclusion Among the 968 overlaps larger than 60 bps which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
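The orientation classes and the 60 bp threshold discussed in this abstract can be sketched as follows. The `Gene` record, the coordinate convention (1-based, inclusive, `start < end` on either strand), and the `flag_long_overlap` rule are hypothetical illustrations. The abstract mentions a "simple rule" for flagging suspect predictions but does not state it, so the rule below is only an assumption based on the reported finding that no genuine long overlaps were seen in co-directional or divergent pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # 1-based, inclusive; start < end regardless of strand
    end: int
    strand: str  # "+" or "-"


def overlap_length(a, b):
    """Number of shared bases between two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a, b):
    """Classify a gene pair as co-directional, convergent, or divergent.

    The pair is ordered by start coordinate; genes with identical
    starts or fully nested genes are not treated specially here.
    """
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == right.strand:
        return "co-directional"
    # Left gene on "+" and right gene on "-": the 3' ends face each other.
    if left.strand == "+":
        return "convergent"
    # Left gene on "-" and right gene on "+": the 5' ends face each other.
    return "divergent"


def flag_long_overlap(a, b, threshold=60):
    """Hypothetical flagging rule: long non-convergent overlaps are suspect."""
    length = overlap_length(a, b)
    kind = orientation(a, b)
    suspect = length >= threshold and kind != "convergent"
    return length, kind, suspect


if __name__ == "__main__":
    print(flag_long_overlap(Gene(100, 400, "+"), Gene(330, 700, "+")))
    print(flag_long_overlap(Gene(100, 400, "+"), Gene(330, 700, "-")))
```

A flagged pair is only a candidate for review. The abstract's approach of comparing each gene's length with that of its orthologs would still be needed to decide which gene is mispredicted.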
Affiliation(s)
- Albert Pallejà
- Biochemistry and Biotechnology Department, Rovira i Virgili University, C/Marcel.lí Domingo s/n, 43007 Tarragona, Catalunya, Spain.
|