Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Karlin S, Campbell AM, Mrázek J. Comparative DNA analysis across diverse genomes. Annu Rev Genet 1999;32:185-225. [PMID: 9928479 DOI: 10.1146/annurev.genet.32.1.185] [Citation(s) in RCA: 238] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]

Cited by Other Article(s)

Niu XN, Wei ZQ, Zou HF, Xie GG, Wu F, Li KJ, Jiang W, Tang JL, He YQ. Complete sequence and detailed analysis of the first indigenous plasmid from Xanthomonas oryzae pv. oryzicola. BMC Microbiol 2015;15:233. [PMID: 26498126 PMCID: PMC4619425 DOI: 10.1186/s12866-015-0562-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/19/2015] [Accepted: 10/08/2015] [Indexed: 01/24/2023] Open

Abstract

BACKGROUND

Bacterial plasmids have a major impact on metabolic function and adaptation of their hosts. An indigenous plasmid was identified in a Chinese isolate (GX01) of the invasive phytopathogen Xanthomonas oryzae pv. oryzicola (Xoc), the causal agent of rice bacterial leaf streak (BLS). To elucidate the biological functions of the plasmid, we have sequenced and comprehensively annotated the plasmid.

METHODS

The plasmid DNA was extracted from Xoc strain GX01 by alkaline lysis and digested with restriction enzymes. The cloned and subcloned DNA fragments in pUC19 were sequenced by Sanger sequencing. Sequences were assembled by using Sequencher software. Gaps were closed by primer walking and sequencing, and multi-PCRs were conducted through the whole plasmid sequence for verification. BLAST, phylogenetic analysis and dinucleotide calculation were performed for gene annotation and DNA structure analysis. Transformation, transconjugation and stress tolerance tests were carried out for plasmid function assays.

RESULTS

The indigenous plasmid from Xoc strain GX01, designated pXOCgx01, is 53,206-bp long and has been annotated to possess 64 open reading frames (ORFs), including genes encoding type IV secretion system, heavy metal exporter, plasmid stability factors, and DNA mobile factors, i.e., the Tn3-like transposon. Bioinformatics analysis showed that pXOCgx01 has a mosaic structure containing different genome contexts with distinct genomic heterogeneities. Phylogenetic analysis indicated that the closest relative of pXOCgx01 is pXAC64 from Xanthomonas axonopodis pv. citri str. 306. It was estimated that there are four copies of pXOCgx01 per cell of Xoc GX01 by PCR assay and the calculation of whole genome shotgun sequencing data. We demonstrate that pXOCgx01 is a self-transmissible plasmid and can replicate in some Xanthomonas spp. strains, but not in Escherichia coli DH5α. It could significantly enhance the tolerance of Xanthomonas oryzae pv. oryzae PXO99A to the stresses of heavy metal ions. The plasmid survey indicated that nine out of 257 Xoc Chinese isolates contain plasmids.

CONCLUSIONS

pXOCgx01 is the first indigenous plasmid reported from Xanthomonas oryzae pv. oryzicola, and the first completely sequenced plasmid from the species Xanthomonas oryzae. It is a self-transmissible plasmid with a mosaic structure, containing genes for macromolecule secretion, heavy metal export, and mobile DNA elements, notably the Tn3-like transposon, which may provide transposition functions for mobile insertion cassettes and play a major role in the spread of pathogenicity determinants. The results will help elucidate the biological significance of this cryptic plasmid and the adaptive evolution of Xoc.

Affiliation(s)

Xiang-Na Niu State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Zhi-Qiong Wei State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Hai-Fan Zou State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Gui-Gang Xie State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Feng Wu State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Kang-Jia Li State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Wei Jiang State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Ji-Liang Tang State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.
Yong-Qiang He State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, The Key Laboratory of Ministry of Education for Microbial and Plant Genetic Engineering, and College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China.

Spring-Pearson SM, Stone JK, Doyle A, Allender CJ, Okinaka RT, Mayo M, Broomall SM, Hill JM, Karavis MA, Hubbard KS, Insalaco JM, McNew LA, Rosenzweig CN, Gibbons HS, Currie BJ, Wagner DM, Keim P, Tuanyok A. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates. PLoS One 2015;10:e0140274. [PMID: 26484663 PMCID: PMC4613141 DOI: 10.1371/journal.pone.0140274] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 04/08/2015] [Accepted: 09/23/2015] [Indexed: 11/19/2022] Open

Affiliation(s)

Senanu M. Spring-Pearson Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Joshua K. Stone Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Adina Doyle Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Christopher J. Allender Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Richard T. Okinaka Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Mark Mayo Menzies School of Health Research and Infectious Disease Department, Royal Darwin Hospital, Darwin, Northern Territory, Australia
Stacey M. Broomall BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Jessica M. Hill BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Mark A. Karavis BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Kyle S. Hubbard BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Joseph M. Insalaco BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Lauren A. McNew BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
C. Nicole Rosenzweig BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Henry S. Gibbons BioSciences Division, Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, United States of America
Bart J. Currie Menzies School of Health Research and Infectious Disease Department, Royal Darwin Hospital, Darwin, Northern Territory, Australia
David M. Wagner Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Paul Keim Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America
Apichai Tuanyok Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, United States of America; Department of Infectious Diseases and Pathology, University of Florida, Gainesville, FL, United States of America

Genome Diversity of Spore-Forming Firmicutes. Microbiol Spectr 2015;1. [PMID: 26184964 DOI: 10.1128/microbiolspectrum.tbs-0015-2012] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Indexed: 02/01/2023] Open

Natural selection causes adaptive genetic resistance in wild emmer wheat against powdery mildew at "Evolution Canyon" microsite, Mt. Carmel, Israel. PLoS One 2015;10:e0122344. [PMID: 25856164 PMCID: PMC4391946 DOI: 10.1371/journal.pone.0122344] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 12/22/2014] [Accepted: 02/13/2015] [Indexed: 12/05/2022] Open

Abstract

Background

“Evolution Canyon” (ECI) at Lower Nahal Oren, Mount Carmel, Israel, is an optimal natural microscale model for unraveling evolution in action highlighting the basic evolutionary processes of adaptation and speciation. A major model organism in ECI is wild emmer, Triticum dicoccoides, the progenitor of cultivated wheat, which displays dramatic interslope adaptive and speciational divergence on the tropical-xeric “African” slope (AS) and the temperate-mesic “European” slope (ES), separated on average by 250 m.

Methods

We examined 278 simple sequence repeats (SSRs) and the phenotypic diversity of resistance to powdery mildew between the opposite slopes. Furthermore, 18 phenotypes from the AS and 20 phenotypes from the ES were inoculated with both Bgt E09 and a mixture of powdery mildew races.

Results

In the genetic diversity experiment, very little intra-slope polymorphism was identified in the accessions from either the AS or the ES. By contrast, 148 pairs of SSR primers (53.23%) amplified polymorphic products between the phenotypes of AS and ES. There are some differences between the two wild emmer wheat genomes and the inter-slope SSR polymorphic products between genome A and B. Interestingly, all wild emmer types growing on the south-facing slope (SFS=AS) were susceptible to a composite of Blumeria graminis, while those growing on the north-facing slope (NFS=ES) were highly resistant to Blumeria graminis at both seedling and adult stages.

Conclusion/Significance

Remarkable inter-slope evolutionary divergent processes occur in wild emmer wheat, T. dicoccoides, at ECI, despite the short average distance of 250 meters. Wild emmer on the AS, a dry and hot slope, did not develop resistance to powdery mildew, whereas wild emmer on the ES, a cool and humid slope, did develop resistance since the disease stress was strong there. This is a remarkable demonstration, in a host-pathogen interaction, of how resistance develops when stress causes an adaptive result at a micro-scale distance.

Labonté JM, Swan BK, Poulos B, Luo H, Koren S, Hallam SJ, Sullivan MB, Woyke T, Wommack KE, Stepanauskas R. Single-cell genomics-based analysis of virus-host interactions in marine surface bacterioplankton. ISME JOURNAL 2015;9:2386-99. [PMID: 25848873 PMCID: PMC4611503 DOI: 10.1038/ismej.2015.48] [Citation(s) in RCA: 152] [Impact Index Per Article: 15.2] [Received: 11/10/2014] [Revised: 01/27/2015] [Accepted: 02/26/2015] [Indexed: 02/01/2023]

Abstract

Viral infections dynamically alter the composition and metabolic potential of marine microbial communities and the evolutionary trajectories of host populations, with resulting feedback on biogeochemical cycles. It is quite possible that all microbial populations in the ocean are impacted by viral infections. Our knowledge of virus–host relationships, however, has been limited to a minute fraction of cultivated host groups. Here, we utilized single-cell sequencing to obtain genomic blueprints of viruses inside or attached to individual bacterial and archaeal cells captured in their native environment, circumventing the need for host and virus cultivation. A combination of comparative genomics, metagenomic fragment recruitment, sequence anomalies and irregularities in sequence coverage depth and genome recovery was utilized to detect viruses and to decipher modes of virus–host interactions. Members of all three tailed phage families were identified in 20 out of 58 phylogenetically and geographically diverse single amplified genomes (SAGs) of marine bacteria and archaea. At least four phage–host interactions had the characteristics of late lytic infections, all of which were found in metabolically active cells. One virus had genetic potential for lysogeny. Our findings include the first known viruses of Thaumarchaeota, Marinimicrobia, Verrucomicrobia and Gammaproteobacteria clusters SAR86 and SAR92. Viruses were also found in SAGs of Alphaproteobacteria and Bacteroidetes. A high fragment recruitment of viral metagenomic reads confirmed that most of the SAG-associated viruses are abundant in the ocean. Our study demonstrates that single-cell genomics, in conjunction with sequence-based computational tools, enables in situ, cultivation-independent insights into host–virus interactions in complex microbial communities.

Analysis of dinucleotide signatures in HIV-1 subtype B genomes. J Genet 2014;92:403-12. [PMID: 24371162 DOI: 10.1007/s12041-013-0281-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]

Iwasaki Y, Abe T, Okada N, Wada K, Wada Y, Ikemura T. Evolutionary changes in vertebrate genome signatures with special focus on coelacanth. DNA Res 2014;21:459-67. [PMID: 24800745 PMCID: PMC4195492 DOI: 10.1093/dnares/dsu012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open

Furuta Y, Namba-Fukuyo H, Shibata TF, Nishiyama T, Shigenobu S, Suzuki Y, Sugano S, Hasebe M, Kobayashi I. Methylome diversification through changes in DNA methyltransferase sequence specificity. PLoS Genet 2014;10:e1004272. [PMID: 24722038 PMCID: PMC3983042 DOI: 10.1371/journal.pgen.1004272] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 07/06/2013] [Accepted: 02/13/2014] [Indexed: 12/20/2022] Open

A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data. BIOMED RESEARCH INTERNATIONAL 2014;2014:765648. [PMID: 24804244 PMCID: PMC3996302 DOI: 10.1155/2014/765648] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/24/2013] [Accepted: 02/14/2014] [Indexed: 11/17/2022]

Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes. BIOMED RESEARCH INTERNATIONAL 2014;2014:985706. [PMID: 24741568 PMCID: PMC3967822 DOI: 10.1155/2014/985706] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/01/2013] [Accepted: 02/04/2014] [Indexed: 11/24/2022]

Emerging evidence for functional peptides encoded by short open reading frames. Nat Rev Genet 2014;15:193-204. [PMID: 24514441 DOI: 10.1038/nrg3520] [Citation(s) in RCA: 402] [Impact Index Per Article: 36.5] [Indexed: 12/19/2022]

Srivastava SK, Huang X, Brar HK, Fakhoury AM, Bluhm BH, Bhattacharyya MK. The genome sequence of the fungal pathogen Fusarium virguliforme that causes sudden death syndrome in soybean. PLoS One 2014;9:e81832. [PMID: 24454689 PMCID: PMC3891557 DOI: 10.1371/journal.pone.0081832] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 10/23/2012] [Accepted: 10/28/2013] [Indexed: 02/02/2023] Open

Abstract

BACKGROUND

Fusarium virguliforme causes sudden death syndrome (SDS) of soybean, a disease of serious concern throughout most of the soybean producing regions of the world. Despite the global importance, little is known about the pathogenesis mechanisms of F. virguliforme. Thus, we applied Next-Generation DNA Sequencing to reveal the draft F. virguliforme genome sequence and identified putative pathogenicity genes to facilitate discovering the mechanisms used by the pathogen to cause this disease.

METHODOLOGY/PRINCIPAL FINDINGS

We have generated the draft genome sequence of F. virguliforme by conducting whole-genome shotgun sequencing on a 454 GS-FLX Titanium sequencer. Initially, single-end reads of a 400-bp shotgun library were assembled using the PCAP program. Paired-end sequences from 3 and 20 Kb DNA fragments and approximately 100 Kb inserts of 1,400 BAC clones were used to generate the assembled genome. The assembled genome sequence was 51 Mb. The N50 scaffold number was 11, with an N50 scaffold length of 1,263 Kb. The AUGUSTUS gene prediction program predicted 14,845 putative genes, which were annotated with the Pfam and GO databases. Gene distributions were uniform in all but one of the major scaffolds. Phylogenetic analyses revealed that F. virguliforme was closely related to the pea pathogen, Nectria haematococca. Of the 14,845 F. virguliforme genes, 11,043 were conserved among five Fusarium species (F. virguliforme, F. graminearum, F. verticillioides, F. oxysporum and N. haematococca), and 1,332 were F. virguliforme-specific genes, which may include pathogenicity genes. Additionally, searches for candidate F. virguliforme pathogenicity genes using gene sequences of the pathogen-host interaction database identified 358 genes.

CONCLUSIONS

The F. virguliforme genome sequence and putative pathogenicity genes presented here will facilitate identification of pathogenicity mechanisms involved in SDS development. Together, these resources will expedite our efforts towards discovering pathogenicity mechanisms in F. virguliforme. This will ultimately lead to improvement of SDS resistance in soybean.
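The N50 figures quoted in this abstract can be illustrated with a short sketch. It assumes the usual definition: scaffolds are sorted from longest to shortest, and the N50 length is the length of the scaffold at which the running total first reaches half of the assembly size, while the N50 number is how many scaffolds that takes. The function name and the toy lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return (n50_length, n50_count) for a list of scaffold lengths.

    Assumed definition: walk the scaffolds from longest to shortest and stop
    at the first one where the running total reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total = 30, half = 15; 10 + 8 = 18 reaches it at the 2nd scaffold.
print(n50([10, 8, 6, 4, 2]))  # (8, 2)
```

Read this way, an N50 scaffold number of 11 would mean that the 11 longest scaffolds together cover at least half of the assembled sequence.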

Satapathy SS, Powdel BR, Dutta M, Buragohain AK, Ray SK. Constraint on di-nucleotides by codon usage bias in bacterial genomes. Gene 2013;536:18-28. [PMID: 24333347 DOI: 10.1016/j.gene.2013.11.098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/04/2013] [Revised: 11/18/2013] [Accepted: 11/25/2013] [Indexed: 10/25/2022]

Abstract

It has been reported earlier that the relative di-nucleotide frequency (RDF) is similar in different parts of a genome but variable among different genomes. RDF is therefore termed a genome signature in bacteria. It is not known whether the constancy in RDF is governed by genome-wide mutational bias or by selection. Here we performed a comparative analysis of RDF between the inter-genic and the coding sequences in seventeen bacterial genomes for which gene expression data were available. The constraint on di-nucleotides was found to be higher in the coding sequences than in the inter-genic regions, and within a genome the constraint at the 2nd codon position was greater than that at the 3rd position. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than in the whole genomes as well as in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated by keeping the codon usage bias (CUB) according to genome G+C composition and the sequence of amino acids unaltered. In the simulated coding sequences, the constraint observed was significantly low and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to the stronger selection on CUB in these genes in comparison to the LEG within a genome. Further, we performed comparative analyses of the RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. The analysis revealed that selection on CUB is an important attribute for the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study presents major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
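As an illustration of the quantity this abstract discusses, the following minimal Python sketch computes a relative dinucleotide frequency under the common odds-ratio definition, rho(xy) = f(xy) / (f(x) f(y)). The abstract does not spell out its formula, so this definition, the function name and the single-strand counting are assumptions made here and not the authors' method. The input is assumed to contain only A, C, G and T.

```python
from collections import Counter


def relative_dinucleotide_frequency(seq):
    """Return {dinucleotide: rho} with rho = f(xy) / (f(x) * f(y)).

    Assumption: frequencies are counted on the given strand only, using
    overlapping dinucleotides; no strand symmetrization is applied.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for pair, count in di.items():
        x, y = pair
        f_xy = count / (n - 1)        # observed dinucleotide frequency
        expected = (mono[x] / n) * (mono[y] / n)  # expected if bases were independent
        rho[pair] = f_xy / expected
    return rho


# Toy example: values above 1 mean the pair is over-represented,
# values below 1 mean it is under-represented.
print(relative_dinucleotide_frequency("AACC"))
```

Comparing such values between coding and inter-genic regions, or between codon positions, is one way to picture the kind of constraint the abstract describes.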

Selection on GGU and CGU codons in the high expression genes in bacteria. J Mol Evol 2013;78:13-23. [PMID: 24271854 DOI: 10.1007/s00239-013-9596-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/07/2013] [Accepted: 11/11/2013] [Indexed: 12/22/2022]

Iwasaki Y, Abe T, Wada K, Wada Y, Ikemura T. A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM). Microorganisms 2013;1:137-157. [PMID: 27694768 PMCID: PMC5029494 DOI: 10.3390/microorganisms1010137] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/26/2013] [Revised: 11/05/2013] [Accepted: 11/08/2013] [Indexed: 11/24/2022] Open

Sharma R, Ahlawat S, Maitra A, Roy M, Mandakmale S, Tantia MS. Polymorphism of BMP4 gene in Indian goat breeds differing in prolificacy. Gene 2013;532:140-5. [PMID: 24013084 DOI: 10.1016/j.gene.2013.08.086] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/26/2013] [Revised: 07/22/2013] [Accepted: 08/26/2013] [Indexed: 10/26/2022]

Abstract

Bone morphogenetic proteins (BMPs) are members of the TGF-β (transforming growth factor-beta) superfamily, of which BMP4 is the most important due to its crucial role in follicular growth and differentiation, cumulus expansion and ovulation. Reproduction is a crucial trait in goat breeding, and based on the important role of the BMP4 gene in reproduction it was considered a possible candidate gene for the prolificacy of goats. The objective of the present study was to detect polymorphism in intronic, exonic and 3' untranslated regions of the BMP4 gene in Indian goats. Nine different goat breeds (Barbari, Beetal, Black Bengal, Malabari, Jakhrana (Twinning>40%), Osmanabadi, Sangamneri (Twinning 20-30%), Sirohi and Ganjam (Twinning<10%)) differing in prolificacy and geographic distribution were employed for polymorphism scanning. Cattle sequence (AC_000167.1) was used to design primers for the amplification of a targeted region, followed by direct DNA sequencing to identify the genetic variations. Single nucleotide polymorphisms (SNPs) were not detected in exon 3, the intronic region or the 3' flanking region. A SNP (G1534A) was identified in exon 2. It was a non-synonymous mutation resulting in an arginine to lysine change in the corresponding protein sequence. The G to A transition at the 1534 locus revealed two genotypes, GG and GA, in the nine investigated goat breeds. The GG genotype was predominant, with a genotype frequency of 0.98. The GA genotype was present in the Black Bengal and Jakhrana breeds, with a genotype frequency of 0.02. A microsatellite was identified in the 3' flanking region, only 20 nucleotides downstream from the termination site of the coding region, as a short sequence with more than nineteen continuous and repeated CA dinucleotides. Since the gene is highly evolutionarily conserved, identification of a non-synonymous SNP (G1534A) in the coding region gains further importance. To our knowledge, this is the first report of a mutation in the coding region of the caprine BMP4 gene. Whether the reproduction trait of goats is associated with the BMP4 polymorphism needs to be further defined by association studies in more populations.

Skewes AD, Welch RD. A Markovian analysis of bacterial genome sequence constraints. PeerJ 2013;1:e127. [PMID: 24010012 PMCID: PMC3757466 DOI: 10.7717/peerj.127] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/27/2013] [Accepted: 07/18/2013] [Indexed: 11/20/2022] Open

Abstract

The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate codon must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their transition probability matrices of third-order shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate the convergent evolution of some bacteria.
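To make the idea of a k-th order Markov model of a nucleotide sequence concrete, here is a minimal sketch that estimates transition probabilities by counting which base follows each context of length k. It is only an illustration under simple assumptions (maximum-likelihood counts on one strand, no smoothing, no statistical test); the function name is hypothetical, and the abstract does not describe the authors' actual procedure at this level of detail.

```python
from collections import Counter, defaultdict


def transition_probabilities(seq, order):
    """Estimate P(next base | previous `order` bases) from raw counts.

    Returns {context: {base: probability}}. Contexts that never occur in
    `seq` are simply absent from the result.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    seq = seq.upper()
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        next_base = seq[i + order]
        counts[context][next_base] += 1
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {base: c / total for base, c in followers.items()}
    return probs


# Toy example with a second-order model: the context "AC" is followed
# by G twice and by T once in this sequence.
model = transition_probabilities("ACGACGACT", order=2)
print(model["AC"])
```

A third-order model of the kind mentioned in the abstract would use order=3, giving up to 64 contexts with four possible next bases each.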

Iwasaki Y, Abe T, Wada Y, Wada K, Ikemura T. Novel bioinformatics strategies for prediction of directional sequence changes in influenza virus genomes and for surveillance of potentially hazardous strains. BMC Infect Dis 2013;13:386. [PMID: 23964903 PMCID: PMC3765179 DOI: 10.1186/1471-2334-13-386] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/10/2013] [Accepted: 08/05/2013] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

With the remarkable increase of microbial and viral sequence data obtained from high-throughput DNA sequencers, novel tools are needed for comprehensive analysis of the big sequence data. We have developed the "Batch-Learning Self-Organizing Map (BLSOM)", which can characterize very many, even millions of, genomic sequences on one plane. Influenza virus is a zoonotic virus and shows clear host tropism. Important issues for bioinformatics studies of influenza viruses are prediction of genomic sequence changes in the near future and surveillance of potentially hazardous strains.

METHODS

To characterize sequence changes in influenza virus genomes after invasion into humans from other animal hosts, we applied BLSOMs to analyses of mono-, di-, tri-, and tetranucleotide compositions in all genome sequences of influenza A and B viruses and found clear host-dependent clustering (self-organization) of the sequences.

RESULTS

Viruses isolated from humans and birds differed in mononucleotide composition from each other. In addition, host-dependent oligonucleotide compositions that could not be explained with the host-dependent mononucleotide composition were revealed by oligonucleotide BLSOMs. Retrospective time-dependent directional changes of mono- and oligonucleotide compositions, which were visualized for human strains on BLSOMs, could provide predictive information about sequence changes in newly invaded viruses from other animal hosts (e.g. the swine-derived pandemic H1N1/09).

CONCLUSIONS

Based on the host-dependent oligonucleotide composition, we proposed a strategy for prediction of directional changes of virus sequences and for surveillance of potentially hazardous strains when introduced into human populations from non-human sources. Millions of genomic sequences from infectious microbes and viruses have become available because of their medical and social importance, and BLSOM can characterize the big data and support efficient knowledge discovery.


Iwasaki Y, Wada K, Wada Y, Abe T, Ikemura T. Notable clustering of transcription-factor-binding motifs in human pericentric regions and its biological significance. Chromosome Res 2013;21:461-74. [PMID: 23896648 PMCID: PMC3761090 DOI: 10.1007/s10577-013-9371-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2013] [Revised: 06/14/2013] [Accepted: 06/14/2013] [Indexed: 11/29/2022]

Salmonella utilizes D-glucosaminate via a mannose family phosphotransferase system permease and associated enzymes. J Bacteriol 2013;195:4057-66. [PMID: 23836865 DOI: 10.1128/jb.00290-13] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open

Alsop EB, Raymond J. Resolving prokaryotic taxonomy without rRNA: longer oligonucleotide word lengths improve genome and metagenome taxonomic classification. PLoS One 2013;8:e67337. [PMID: 23840870 PMCID: PMC3698125 DOI: 10.1371/journal.pone.0067337] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2013] [Accepted: 05/16/2013] [Indexed: 11/19/2022] Open

Abstract

Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomics samples as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
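As a rough illustration of what an oligonucleotide signature is, the sketch below computes the relative frequency of every DNA word of length k in a sequence and compares two signatures with a simple distance. The function names (`kmer_signature`, `manhattan`) and the choice of Manhattan distance are assumptions made for this example; the abstract does not state which normalization or classifier the study used.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k):
    """Relative frequencies of all 4**k DNA words of length k.

    Words are returned in lexicographic order over the alphabet ACGT.
    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def manhattan(a, b):
    """Illustrative distance between two signatures (an assumption, not the paper's metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    tetra = kmer_signature("ACGTACGTTTGACCA", 4)   # 256-dimensional vector
    hepta = kmer_signature("ACGTACGTTTGACCA", 7)   # 16384-dimensional vector
    print(len(tetra), len(hepta))
```

The vector length grows as 4**k (256 for tetranucleotides, 16,384 for heptanucleotides), which is one way to see the trade-off between resolving power and computational cost that the abstract describes.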


Genome implosion elicits host-confinement in Alcaligenaceae: evidence from the comparative genomics of Tetrathiobacter kashmirensis, a pathogen in the making. PLoS One 2013;8:e64856. [PMID: 23741407 PMCID: PMC3669393 DOI: 10.1371/journal.pone.0064856] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2013] [Accepted: 04/19/2013] [Indexed: 11/24/2022] Open

Abstract

This study elucidates the genomic basis of the evolution of pathogens alongside free-living organisms within the family Alcaligenaceae of Betaproteobacteria. Towards that end, the complete genome sequence of the sulfur-chemolithoautotroph Tetrathiobacter kashmirensis WT001^T was determined and compared with the soil isolate Achromobacter xylosoxidans A8 and the two pathogens Bordetella bronchiseptica RB50 and Taylorella equigenitalis MCE9. All analyses comprehensively indicated that the RB50 and MCE9 genomes were almost the subsets of A8 and WT001^T, respectively. In the immediate evolutionary past Achromobacter and Bordetella shared a common ancestor, which was distinct from the other contemporary stock that gave rise to Tetrathiobacter and Taylorella. The Achromobacter-Bordetella precursor, after diverging from the family ancestor, evolved through extensive genome inflation, subsequent to which the two genera separated via differential gene losses and acquisitions. Tetrathiobacter, meanwhile, retained the core characteristics of the family ancestor, and Taylorella underwent massive genome degeneration to reach an evolutionary dead-end. Interestingly, the WT001^T genome, despite its conserved architecture, had only 85% coding density, besides which 578 out of its 4452 protein-coding sequences were found to be pseudogenized. Translational impairment of several DNA repair-recombination genes in the first place seemed to have ushered the rampant and indiscriminate frame-shift mutations across the WT001^T genome. Presumably, this strain has just come out of a recent evolutionary bottleneck, representing a unique transition state where genome self-degeneration has started comprehensively but selective host-confinement has not yet set in. In the light of this evolutionary link, host-adaptation of Taylorella clearly appears to be the aftereffect of genome implosion in another member of the same bottleneck. 
Remarkably again, potent virulence factors were found widespread in Alcaligenaceae, corroborating which hemolytic and mammalian cell-adhering abilities were discovered in WT001^T. So, while WT001^T relatives/derivatives in nature could be going the Taylorella way, the lineage as such was well-prepared for imminent host-confinement.


Transfer RNA gene numbers may not be completely responsible for the codon usage bias in asparagine, isoleucine, phenylalanine, and tyrosine in the high expression genes in bacteria. J Mol Evol 2012;75:34-42. [PMID: 23053196 DOI: 10.1007/s00239-012-9524-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2012] [Accepted: 09/24/2012] [Indexed: 10/27/2022]

Akhter S, Aziz RK, Edwards RA. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res 2012;40:e126. [PMID: 22584627 PMCID: PMC3439882 DOI: 10.1093/nar/gks406] [Citation(s) in RCA: 350] [Impact Index Per Article: 26.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Dass JFP, Sudandiradoss C. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 2012;503:92-100. [PMID: 22480817 DOI: 10.1016/j.gene.2012.03.057] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2011] [Revised: 03/14/2012] [Accepted: 03/17/2012] [Indexed: 11/16/2022]

Nyeo SL, Yu JP. Length distributions of simple tandem repeats in genomes. J Biol Syst 2011. [DOI: 10.1142/s0218339007002246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Uehara H, Iwasaki Y, Wada C, Ikemura T, Abe T. A novel bioinformatics strategy for searching industrially useful genome resources from metagenomic sequence libraries. Genes Genet Syst 2011;86:53-66. [PMID: 21498923 DOI: 10.1266/ggs.86.53] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Porceddu A, Camiolo S. Spatial analyses of mono, di and trinucleotide trends in plant genes. PLoS One 2011;6:e22855. [PMID: 21829660 PMCID: PMC3148226 DOI: 10.1371/journal.pone.0022855] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2011] [Accepted: 06/30/2011] [Indexed: 11/24/2022] Open

Abstract

Genomic DNA sequences display compositional heterogeneity on many scales. In this paper we analyzed tendencies and anomalies in the occurrence of mono, di and trinucleotides in structural regions of plant genes. Representation of these trends as a function of position along genic sequences highlighted compositional features peculiar to either monocots or eudicots that were remarkably uniform within these two evolutionary clades. The most evident of these features appeared in the form of a gradient of base content along the direction of transcription. The robustness of such a representation was validated in sequence sub-datasets generated considering structural and compositional features such as total length of cds, overall GC content and genic orientation in the genome. Piecewise regression analyses indicated that the gradients could be conveniently approximated to a two segmented model where a first region featuring a steep slope is followed by a second segment fitting a milder variation. In general, monocot species showed steeper segments than eudicots. The guanine gradient was the most distinctive feature between the two evolutionary clades, being moderately increasing in eudicots and firmly decreasing in monocots. Single gene investigation revealed that a high proportion of genes show compositional trends compatible with a segmented model suggesting that these features are essential attributes of gene organization. Dinucleotide and trinucleotide biases were referred to expectation based on a random union of the component elements. The average bias at dinucleotide level identified a significant underrepresentation of some dinucleotides and the overrepresentation of others. The bias at trinucleotide level was on average low. Finally, the analysis of bryophyte coding sequences showed mononucleotide, dinucleotide and trinucleotide compositional trends resembling those of higher plants. This finding suggested that the emergence of compositional bias is an ancient event in evolution which was already present at the time of land conquest by green plants.
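The idea of referring dinucleotide bias to "a random union of the component elements" can be sketched as an observed/expected ratio, where the expected dinucleotide frequency is the product of the two mononucleotide frequencies. This is the commonly used odds-ratio form and is only an assumption about what the study computed; the function name `dinucleotide_odds_ratios` is hypothetical, and strand symmetrization and positional binning are omitted.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Observed/expected ratio for each dinucleotide XY.

    expected(XY) = f(X) * f(Y), i.e. a random union of the two bases.
    Ratios below 1 suggest underrepresentation, above 1 overrepresentation.
    Dinucleotides whose expected frequency is zero are left out.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    mono = Counter(c for c in seq if c in alphabet)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if set(seq[i:i + 2]) <= alphabet
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    if n_mono == 0 or n_di == 0:
        return ratios
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            if expected > 0:
                ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios


if __name__ == "__main__":
    for word, rho in sorted(dinucleotide_odds_ratios("ATGCGCGATATTAACG").items()):
        print(word, round(rho, 2))
```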


Epps J, Ying H, Huttley GA. Statistical methods for detecting periodic fragments in DNA sequence data. Biol Direct 2011;6:21. [PMID: 21527008 PMCID: PMC3111405 DOI: 10.1186/1745-6150-6-21] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2010] [Accepted: 04/28/2011] [Indexed: 11/10/2022] Open

Abstract

Background

Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed.

Results

We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS).
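A minimal sketch of testing for an a priori specified period: the sequence is turned into a 0/1 indicator of the dinucleotides {AA, TT, TA} and the spectral power is evaluated at a single period. This is a plain single-frequency discrete Fourier evaluation written for illustration; it is not the paper's IPDFT or Hybrid measure, it includes no blockwise bootstrap, and the function names are hypothetical.

```python
import cmath


def dinucleotide_indicator(seq, motifs=("AA", "TT", "TA")):
    """1.0 where a dinucleotide from `motifs` starts, else 0.0."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] in motifs else 0.0 for i in range(len(seq) - 1)]


def power_at_period(signal, period):
    """Mean-centred Fourier power of `signal` at the given period (in bases)."""
    n = len(signal)
    if n == 0:
        return 0.0
    mean = sum(signal) / n
    total = sum(
        (value - mean) * cmath.exp(-2j * cmath.pi * i / period)
        for i, value in enumerate(signal)
    )
    return abs(total) ** 2 / n


if __name__ == "__main__":
    # Synthetic sequence with an AA dinucleotide every 10 bases.
    seq = ("AA" + "CGCGCGCG") * 20
    x = dinucleotide_indicator(seq)
    print(power_at_period(x, 10), power_at_period(x, 7))
```

On the synthetic input the power at period 10 is far larger than at period 7; deciding whether such a difference is significant on real data would need a resampling step such as the blockwise bootstrap the abstract mentions.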

Conclusions

For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed a striking proportion of the yeast and mouse genomes are spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers.

Reviewers

This article was reviewed by Prof Tomas Radivoyevitch, Dr Vsevolod Makeev (nominated by Dr Mikhail Gelfand), and Dr Rob D Knight.


Iwasaki Y, Abe T, Wada K, Itoh M, Ikemura T. Prediction of directional changes of influenza A virus genome sequences with emphasis on pandemic H1N1/09 as a model case. DNA Res 2011;18:125-36. [PMID: 21444341 PMCID: PMC3077041 DOI: 10.1093/dnares/dsr005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

Fang X, Du Y, Zhang C, Shi X, Chen D, Sun J, Jin Q, Lan X, Chen H. Polymorphism in a microsatellite of the acrp30 gene and its association with growth traits in goats. Biochem Genet 2011;49:533-9. [PMID: 21369822 DOI: 10.1007/s10528-011-9428-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2010] [Accepted: 12/06/2010] [Indexed: 11/30/2022]

Visualization of sequence and structural features of genomes and chromosome fragments. Application to CpG islands, Alu sequences and whole genomes. Gene X 2011;473:76-81. [DOI: 10.1016/j.gene.2010.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2010] [Revised: 11/24/2010] [Accepted: 11/24/2010] [Indexed: 11/20/2022] Open

Garcia SP, Pinho AJ, Rodrigues JMOS, Bastos CAC, Ferreira PJSG. Minimal absent words in prokaryotic and eukaryotic genomes. PLoS One 2011;6:e16065. [PMID: 21386877 PMCID: PMC3031530 DOI: 10.1371/journal.pone.0016065] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Accepted: 12/04/2010] [Indexed: 11/21/2022] Open

Delaye L, González-Domenech CM, Garcillán-Barcia MP, Peretó J, de la Cruz F, Moya A. Blueprint for a minimal photoautotrophic cell: conserved and variable genes in Synechococcus elongatus PCC 7942. BMC Genomics 2011;12:25. [PMID: 21226929 PMCID: PMC3025956 DOI: 10.1186/1471-2164-12-25] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2010] [Accepted: 01/12/2011] [Indexed: 02/07/2023] Open

Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 2011;12:32-42. [PMID: 21102527 PMCID: PMC3074964 DOI: 10.1038/nrg2899] [Citation(s) in RCA: 1059] [Impact Index Per Article: 75.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME JOURNAL 2010;5:918-28. [PMID: 21160538 DOI: 10.1038/ismej.2010.180] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

Abstract

Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.


Zhang Z, Yu J. Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct 2010;5:63. [PMID: 21059261 PMCID: PMC2989939 DOI: 10.1186/1745-6150-5-63] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2010] [Accepted: 11/08/2010] [Indexed: 12/03/2022] Open

Liang H, Barakat A, Schlarbaum SE, Mandoli DF, Carlson JE. Comparison of gene order of GIGANTEA loci in yellow-poplar, monocots, and eudicots. Genome 2010;53:533-44. [PMID: 20616875 DOI: 10.1139/g10-031] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]

Cuff WR, Duvvuri VRSK, Liang B, Duvvuri B, Wu GE, Wu J, Tsang RSW. A novel interpretation of structural dot plots of genomes derived from the analysis of two strains of Neisseria meningitidis. GENOMICS PROTEOMICS & BIOINFORMATICS 2010;8:159-69. [PMID: 20970744 PMCID: PMC5054114 DOI: 10.1016/s1672-0229(10)60018-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Mitrofanov SI, Panchin AY, Spirin SA, Alexeevski AV, Panchin YV. Exclusive sequences of different genomes. J Bioinform Comput Biol 2010;8:519-34. [PMID: 20556860 DOI: 10.1142/s0219720010004719] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2009] [Revised: 12/31/2009] [Accepted: 01/16/2010] [Indexed: 11/18/2022]

Du H, Hu H, Meng Y, Zheng W, Ling F, Wang J, Zhang X, Nie Q, Wang X. The correlation coefficient of GC content of the genome-wide genes is positively correlated with animal evolutionary relationships. FEBS Lett 2010;584:3990-3994. [PMID: 20691688 DOI: 10.1016/j.febslet.2010.08.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2010] [Revised: 07/29/2010] [Accepted: 08/02/2010] [Indexed: 11/16/2022]

Tse H, Cai JJ, Tsoi HW, Lam EP, Yuen KY. Natural selection retains overrepresented out-of-frame stop codons against frameshift peptides in prokaryotes. BMC Genomics 2010;11:491. [PMID: 20828396 PMCID: PMC2996987 DOI: 10.1186/1471-2164-11-491] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2010] [Accepted: 09/09/2010] [Indexed: 12/03/2022] Open

Abstract

Background

Out-of-frame stop codons (OSCs) occur naturally in coding sequences of all organisms, providing a mechanism of early termination of translation in incorrect reading frame so that the metabolic cost associated with frameshift events can be reduced. Given such a functional significance, we expect statistically overrepresented OSCs in coding sequences as a result of a widespread selection. Accordingly, we examined available prokaryotic genomes to look for evidence of this selection.

Results

The complete genome sequences of 990 prokaryotes were obtained from NCBI GenBank. We found that low G+C content coding sequences contain significantly more OSCs and G+C content at specific codon positions were the principal determinants of OSC usage bias in the different reading frames. To investigate if there is overrepresentation of OSCs, we modeled the trinucleotide and hexanucleotide biases of the coding sequences using Markov models, and calculated the expected OSC frequencies for each organism using a Monte Carlo approach. More than 93% of 342 phylogenetically representative prokaryotic genomes contain excess OSCs. Interestingly the degree of OSC overrepresentation correlates positively with G+C content, which may represent a compensatory mechanism for the negative correlation of OSC frequency with G+C content. We extended the analysis using additional compositional bias models and showed that lower-order bias like codon usage and dipeptide bias could not explain the OSC overrepresentation. The degree of OSC overrepresentation was found to correlate negatively with the optimal growth temperature of the organism after correcting for the G+C% and AT skew of the coding sequence.
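To make the quantity being measured concrete, the sketch below counts stop codons (TAA, TAG, TGA) in the +1 and +2 reading frames of a coding sequence whose annotated frame starts at position 0. It covers only the observed counts; the Markov-model and Monte Carlo estimation of expected frequencies described above is not reproduced, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_out_of_frame_stops(cds):
    """Count stop codons in the +1 and +2 frames of a coding sequence.

    The input is assumed to be in frame, i.e. codons of the annotated
    reading frame start at index 0. Returns {1: n_plus1, 2: n_plus2}.
    """
    cds = cds.upper()
    counts = {}
    for offset in (1, 2):
        counts[offset] = sum(
            1
            for i in range(offset, len(cds) - 2, 3)
            if cds[i:i + 3] in STOP_CODONS
        )
    return counts


if __name__ == "__main__":
    # In-frame codons: ATG ATA GCT GAC (no in-frame stop).
    print(count_out_of_frame_stops("ATGATAGCTGAC"))
```

In the example, a +1 frameshift would meet a stop codon (TGA) immediately, which is the kind of early termination the abstract proposes as the functional role of OSCs.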

Conclusions

The present study uses approaches with statistical rigor to show that OSC overrepresentation is a widespread phenomenon among prokaryotes. Our results support the hypothesis that OSCs carry functional significance and have been selected in the course of genome evolution to act against unintended frameshift occurrences. Some results also hint that OSC overrepresentation is a compensatory mechanism to make up for the decrease in OSCs in high G+C organisms, thus revealing the interplay between two different determinants of OSC frequency.


Tyagi A, Bag SK, Shukla V, Roy S, Tuli R. Oligonucleotide frequencies of barcoding loci can discriminate species across kingdoms. PLoS One 2010;5:e12330. [PMID: 20808837 PMCID: PMC2924895 DOI: 10.1371/journal.pone.0012330] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2010] [Accepted: 07/28/2010] [Indexed: 12/04/2022] Open

Fox JM, Erill I. Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 2010;17:185-96. [PMID: 20453079 PMCID: PMC2885275 DOI: 10.1093/dnares/dsq012] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Xing-Tang F, Hai-Xia X, Hong C, Chun-Lei Z, Xiu-Cai H, Xue-Yuan G, Chuan-Wen G, Wang-Ping Y, Xian-Yong L. Polymorphisms of Bone Morphogenetic Protein 4 (BMP4) Gene in Goats. ACTA ACUST UNITED AC 2010. [DOI: 10.3923/javaa.2010.907.912] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]

Zhong X, Zan L, Wang H, Liu Y. Polymorphic CA microsatellites in the third exon of the bovine BMP4 gene. GENETICS AND MOLECULAR RESEARCH 2010;9:868-74. [DOI: 10.4238/vol9-2gmr732] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]

Kielak A, Rodrigues JL, Kuramae EE, Chain PS, Van Veen JA, Kowalchuk GA. Phylogenetic and metagenomic analysis of Verrucomicrobia in former agricultural grassland soil. FEMS Microbiol Ecol 2010;71:23-33. [DOI: 10.1111/j.1574-6941.2009.00785.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Gatherer D. Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences. Bioinform Biol Insights 2009;1:101-26. [PMID: 20066129 PMCID: PMC2789693 DOI: 10.4137/bbi.s415] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

Prakash A, Shepard SS, He J, Hart B, Chen M, Amarachintha SP, Mileyeva-Biebesheimer O, Bechtel J, Fedorov A. Evolution of genomic sequence inhomogeneity at mid-range scales. BMC Genomics 2009;10:513. [PMID: 19891785 PMCID: PMC2779198 DOI: 10.1186/1471-2164-10-513] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2009] [Accepted: 11/05/2009] [Indexed: 01/01/2023] Open

Abstract

BACKGROUND

Mid-range inhomogeneity or MRI is the significant enrichment of particular nucleotides in genomic sequences extending from 30 up to several thousands of nucleotides. The best-known manifestation of MRI is CpG islands representing CG-rich regions. Recently it was demonstrated that MRI could be observed not only for G+C content but also for all other nucleotide pairings (e.g. A+G and G+T) as well as for individual bases. Various types of MRI regions are 4-20 times enriched in mammalian genomes compared to their occurrences in random models.

RESULTS

This paper explores how different types of mutations change MRI regions. Human, chimpanzee and Macaca mulatta genomes were aligned to study the projected effects of substitutions and indels on human sequence evolution within both MRI regions and control regions of average nucleotide composition. Over 18.8 million fixed point substitutions, 3.9 million SNPs, and indels spanning 6.9 Mb were procured and evaluated in human. They include 1.8 Mb substitutions and 1.9 Mb indels within MRI regions. Ancestral and mutant (derived) alleles for substitutions have been determined. Substitutions were grouped according to their fixation within human populations: fixed substitutions (from the human-chimp-macaca alignment), major SNPs (> 80% mutant allele frequency within humans), medium SNPs (20% - 80% mutant allele frequency), minor SNPs (3% - 20%), and rare SNPs (<3%). Data on short (< 3 bp) and medium-length (3 - 50 bp) insertions and deletions within MRI regions and appropriate control regions were analyzed for the effect of indels on the expansion or diminution of such regions as well as on changing nucleotide composition.

CONCLUSION

MRI regions have comparable levels of de novo mutations to the control genomic sequences with average base composition. De novo substitutions rapidly erode MRI regions, bringing their nucleotide composition toward genome-average levels. However, those substitutions that favor the maintenance of MRI properties have a higher chance to spread through the entire population. Indels have a clear tendency to maintain MRI features yet they have a smaller impact than substitutions. All in all, the observed fixation bias for mutations helps to preserve MRI regions during evolution.



Freilich S, Goldovsky L, Gottlieb A, Blanc E, Tsoka S, Ouzounis CA. Stratification of co-evolving genomic groups using ranked phylogenetic profiles. BMC Bioinformatics 2009;10:355. [PMID: 19860884 PMCID: PMC2775751 DOI: 10.1186/1471-2105-10-355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2009] [Accepted: 10/27/2009] [Indexed: 01/12/2023] Open

Abstract

BACKGROUND

Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database.

RESULTS

The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples.

CONCLUSION

Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
