1
|
A new family of transcriptional regulators of tungstoenzymes and molybdate/tungstate transport. Environ Microbiol 2019; 21:784-799. [PMID: 30536693 DOI: 10.1111/1462-2920.14500] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/24/2018] [Revised: 12/03/2018] [Accepted: 12/07/2018] [Indexed: 11/30/2022]
Abstract
Bacterial genes for molybdenum-containing and tungsten-containing enzymes are often differentially regulated depending on metal availability in the environment. Here, we describe a new family of transcription factors (TFs) with an unusual DNA-binding domain related to excisionases of bacteriophages. These transcription factors are associated with genes for various molybdate-specific and tungstate-specific transport systems as well as molybdo/tungsto-enzymes in a wide range of bacterial genomes. We used a combination of computational and experimental techniques to study a member of the TF family, named TaoR (for tungsten-containing aldehyde oxidoreductase regulator). In Desulfovibrio vulgaris Hildenborough, a model bacterium for sulfate reduction studies, TaoR activates expression of the aldehyde oxidoreductase gene aor and represses the tungsten-specific ABC-type transporter genes tupABC under tungsten-replete conditions. TaoR binding sites at the aor promoter were identified by electrophoretic mobility shift assay and DNase I footprinting. We also reconstructed TaoR regulons in 45 Deltaproteobacteria using a comparative genomics approach and predicted target genes for TaoR family members in other Proteobacteria and Firmicutes.
|
2
|
|
3
|
|
4
|
Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 2018; 557:43-49. [PMID: 29695866 PMCID: PMC6784863 DOI: 10.1038/s41586-018-0063-9] [Citation(s) in RCA: 753] [Impact Index Per Article: 125.5] [Received: 11/27/2016] [Accepted: 02/28/2018] [Indexed: 01/27/2023]
Abstract
Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence–absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding. Analyses of genetic variation and population structure based on over 3,000 cultivated rice (Oryza sativa) genomes reveal subpopulations that correlate with geographic location and patterns of introgression consistent with multiple rice domestication events.
|
5
|
Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559 DOI: 10.1007/978-1-4939-7027-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
|
6
|
Rice SNP-seek database update: new SNPs, indels, and queries. Nucleic Acids Res 2016; 45:D1075-D1081. [PMID: 27899667 PMCID: PMC5210592 DOI: 10.1093/nar/gkw1135] [Citation(s) in RCA: 165] [Impact Index Per Article: 20.6] [Received: 09/15/2016] [Revised: 10/14/2016] [Accepted: 11/04/2016] [Indexed: 11/16/2022]
Abstract
We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org.
|
7
|
σ54-dependent regulome in Desulfovibrio vulgaris Hildenborough. BMC Genomics 2015; 16:919. [PMID: 26555820 PMCID: PMC4641369 DOI: 10.1186/s12864-015-2176-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/23/2015] [Accepted: 10/31/2015] [Indexed: 11/16/2022]
Abstract
Background The σ54 subunit controls a unique class of promoters in bacteria. Such promoters, without exception, require enhancer binding proteins (EBPs) for transcription initiation. Desulfovibrio vulgaris Hildenborough, a model bacterium for sulfate reduction studies, has a high number of EBPs, more than most sequenced bacteria. The cellular processes regulated by many of these EBPs remain unknown. Results To characterize the σ54-dependent regulome of D. vulgaris Hildenborough, we identified EBP binding motifs and regulated genes by a combination of computational and experimental techniques. These predictions were supported by our reconstruction of σ54-dependent promoters by comparative genomics. We reassessed and refined the results of earlier studies on regulation in D. vulgaris Hildenborough and consolidated them with our new findings. This allowed us to reconstruct the σ54 regulome in D. vulgaris Hildenborough. This regulome includes 36 regulons that consist of 201 coding genes and 4 non-coding RNAs, and is involved in nitrogen, carbon and energy metabolism, regulation, transmembrane transport and various extracellular functions. To the best of our knowledge, this is the first report of direct regulation of alanine dehydrogenase, pyruvate metabolism genes and the type III secretion system by σ54-dependent regulators. Conclusions The σ54-dependent regulome is an important component of the transcriptional regulatory network in D. vulgaris Hildenborough and related free-living Deltaproteobacteria. Our study provides a representative collection of σ54-dependent regulons that can be used for regulation prediction in Deltaproteobacteria and other taxa.
|
8
|
Functionally conserved enhancers with divergent sequences in distant vertebrates. BMC Genomics 2015; 16:882. [PMID: 26519295 PMCID: PMC4628251 DOI: 10.1186/s12864-015-2070-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/04/2015] [Accepted: 10/13/2015] [Indexed: 02/08/2023]
Abstract
BACKGROUND To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. Our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species. RESULTS We searched for sequences that were conserved within groups of closely related species but not between groups of more distant species, and were associated with an epigenetic mark of enhancer activity. To facilitate inferring orthology between non-conserved sequences, we limited our search to introns whose orthology could be unambiguously established by mapping the bracketing exons. We show that a subset of these non-conserved but syntenic sequences from the mouse and zebrafish genomes have homologous functions in a zebrafish transgenic enhancer assay. The conserved expression patterns driven by these enhancers are probably associated with short transcription factor-binding motifs present in the divergent sequences. CONCLUSIONS We have identified numerous potential enhancers with divergent sequences but a conserved function. These results indicate that selection on function, rather than sequence, may be a common mode of enhancer evolution; evidence for selection at the sequence level is not a necessary criterion to define a gene regulatory element.
|
9
|
Elviz - exploration of metagenome assemblies with an interactive visualization tool. BMC Bioinformatics 2015; 16:130. [PMID: 25928663 PMCID: PMC4432942 DOI: 10.1186/s12859-015-0566-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/03/2015] [Accepted: 04/08/2015] [Indexed: 12/02/2022]
Abstract
Background Metagenomics, the sequencing of DNA collected from an entire microbial community, enables the study of natural microbial consortia in their native habitats. Metagenomics studies produce huge volumes of data, including both the sequences themselves and metadata describing their abundance, assembly, predicted functional characteristics and environmental parameters. The ability to explore these data visually is critically important to meaningful biological interpretation. Current genomics applications cannot effectively integrate sequence data, assembly metadata, and annotation to support both genome and community-level inquiry. Results Elviz (Environmental Laboratory Visualization) is an interactive web-based tool for the visual exploration of assembled metagenomes and their complex metadata. Elviz allows scientists to navigate metagenome assemblies across multiple dimensions and scales, plotting parameters such as GC content, relative abundance, phylogenetic affiliation and assembled contig length. Furthermore, Elviz enables interactive exploration using real-time plot navigation, search, filters, axis selection, and the ability to drill from a whole-community profile down to individual gene annotations. Thus scientists engage in a rapid feedback loop of visual pattern identification, hypothesis generation, and hypothesis testing. Conclusions Compared to the current alternative of generating a succession of static figures, Elviz can greatly accelerate metagenome analysis. Elviz can be used to explore both user-submitted datasets and numerous metagenome studies publicly available at the Joint Genome Institute (JGI). Elviz is freely available at http://genome.jgi.doe.gov/viz and runs on most current web browsers.
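Two of the per-contig parameters named in the abstract, GC content and assembled contig length, are simple to compute from sequence alone. The following is a minimal sketch under stated assumptions, not Elviz's implementation: the function names and the toy contigs are hypothetical, and ambiguous bases such as N are assumed to be excluded from the GC denominator.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def contig_summary(contigs: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, length, GC fraction) for each contig, longest first."""
    rows = [(name, len(seq), gc_content(seq)) for name, seq in contigs.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy contigs, for illustration only.
toy_contigs = {
    "contig_1": "ATGCGC",
    "contig_2": "ATATATAT",
    "contig_3": "GGNNCC",
}

for name, length, gc in contig_summary(toy_contigs):
    print(f"{name}\t{length}\t{gc:.2f}")
```

A table of this shape (one row per contig, one column per parameter) is the kind of input a scatter plot of GC content against length or abundance would need.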
|
10
|
An integrative computational approach for prioritization of genomic variants. PLoS One 2014; 9:e114903. [PMID: 25506935 PMCID: PMC4266634 DOI: 10.1371/journal.pone.0114903] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/01/2014] [Accepted: 11/15/2014] [Indexed: 12/27/2022]
Abstract
An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study in the use of a distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidate genes contributing to pathogenesis of spina bifida. The analysis resulted in prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children that cause narrowing of the outlet channel and therefore lead to a reduced folate permeation rate. The described approach also enabled correct identification of several genes previously shown to contribute to pathogenesis of spina bifida, and suggested additional genes for experimental validation. The study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.
|
11
|
GenomeVISTA--an integrated software package for whole-genome alignment and visualization. Bioinformatics 2014; 30:2654-5. [PMID: 24860159 DOI: 10.1093/bioinformatics/btu355] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
With the ubiquitous generation of complete genome assemblies for a variety of species, efficient tools for whole-genome alignment along with user-friendly visualization are critically important. Our VISTA family of tools for comparative genomics, based on algorithms for pairwise and multiple alignments of genomic sequences and whole-genome assemblies, has become one of the standard techniques for comparative analysis. Most of the VISTA programs have been implemented as Web-accessible servers and are extensively used by the biomedical community. In this manuscript, we introduce GenomeVISTA: a novel implementation that incorporates most features of the VISTA family (fast and accurate alignment, visualization capabilities, GUI and analytical tools) within a stand-alone software package. GenomeVISTA thus provides flexibility and security for users who need to conduct whole-genome comparisons on their own computers. AVAILABILITY AND IMPLEMENTATION Implemented in Perl, C/C++ and Java, the source code is freely available for download at the VISTA Web site: http://genome.lbl.gov/vista/.
|
12
|
Abstract
MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other ‘omics’ data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.
|
13
|
Abstract
The U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility, serves the diverse scientific community by providing integrated high-throughput sequencing and computational analysis to enable system-based scientific approaches in support of DOE missions related to clean energy generation and environmental characterization. The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. The JGI maintains extensive data management systems and specialized analytical capabilities to manage and interpret complex genomic data. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. Here we describe major updates of the Genome Portal in the past 2 years with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI.
|
14
|
RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 2013; 14:745. [PMID: 24175918 PMCID: PMC3840689 DOI: 10.1186/1471-2164-14-745] [Citation(s) in RCA: 265] [Impact Index Per Article: 24.1] [Received: 06/29/2013] [Accepted: 10/28/2013] [Indexed: 11/27/2022]
Abstract
Background Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). Description RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualizations of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
Conclusions RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.
|
15
|
Whole-Genome rVISTA: a tool to determine enrichment of transcription factor binding sites in gene promoters from transcriptomic data. Bioinformatics 2013; 29:2059-61. [PMID: 23736530 DOI: 10.1093/bioinformatics/btt318] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]
Abstract
SUMMARY We have developed a web-based query tool, Whole-Genome rVISTA (WGRV), that determines enrichment of transcription factors (TFs) and associated target genes in sets of co-regulated genes. WGRV enables users to query databases containing pre-computed genome coordinates of evolutionarily conserved transcription factor binding sites in the proximal promoters (from 100 bp to 5 kb upstream) of human, mouse and Drosophila genomes. TF binding sites are based on position-weight matrices from the TRANSFAC Professional database. For a given set of co-regulated genes, WGRV returns statistically enriched and evolutionarily conserved binding sites, mapped by the regulatory VISTA (rVISTA) algorithm. Users can then retrieve a list of genes from the query set containing the enriched TF binding sites and their location in the query set promoters. Results are exported in a BED format for rapid visualization in the UCSC genome browser. Flat files of mapped conserved sites and their genomic coordinates are also available for analysis with stand-alone software. AVAILABILITY http://genome.lbl.gov/cgi-bin/WGRVistaInputCommon.pl.
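The abstract does not say which statistic WGRV uses to call a binding site "statistically enriched". One common choice for this kind of question is a one-sided hypergeometric test, sketched below purely as an assumption and not as the WGRV method. All function and parameter names are hypothetical. The second helper writes a minimal four-column BED line; BED coordinates are 0-based and half-open.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_background_hits: int,
                           n_query: int, n_query_hits: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_query_hits).

    n_background      -- promoters in the genome-wide background
    n_background_hits -- background promoters carrying the TF site
    n_query           -- promoters in the co-regulated query set
    n_query_hits      -- query promoters carrying the TF site
    """
    upper = min(n_background_hits, n_query)
    favourable = sum(
        comb(n_background_hits, i)
        * comb(n_background - n_background_hits, n_query - i)
        for i in range(n_query_hits, upper + 1)
    )
    return favourable / comb(n_background, n_query)


def bed_line(chrom: str, start: int, end: int, name: str) -> str:
    """Format one tab-separated BED record (0-based start, exclusive end)."""
    return f"{chrom}\t{start}\t{end}\t{name}"


# Hypothetical counts: 20 background promoters, 5 with the site;
# a query set of 4 promoters, 3 of which carry the site.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(f"p = {p_value:.4f}")
print(bed_line("chr1", 100, 112, "TF_site"))
```

A small p-value indicates that the site occurs in the query promoters more often than random sampling from the background would predict. In practice a correction for testing many TFs at once would also be needed.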
|
16
|
RegTransBase--a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes. BMC Genomics 2013; 14:213. [PMID: 23547897 PMCID: PMC3639892 DOI: 10.1186/1471-2164-14-213] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 11/15/2012] [Accepted: 03/22/2013] [Indexed: 11/10/2022]
Abstract
Background Due to the constantly growing number of sequenced microbial genomes, comparative genomics has been playing a major role in the investigation of regulatory interactions in bacteria. Regulon inference mostly remains a field of semi-manual examination, since the absence of a knowledgebase and informatics platform for automated and systematic investigation restricts opportunities for computational prediction. Additionally, confirming computationally inferred regulons by experimental data is critically important. Description RegTransBase is an open-access platform with a user-friendly web interface publicly available at http://regtransbase.lbl.gov. It consists of two databases – a manually collected hierarchical regulatory interactions database based on more than 7000 scientific papers, which can serve as a knowledgebase for verification of predictions, and a large set of expert-curated transcription factor binding sites used in regulon inference by a variety of tools. RegTransBase captures the knowledge from published scientific literature using controlled vocabularies and contains various types of experimental data, such as: the activation or repression of transcription by an identified direct regulator; determination of the transcriptional regulatory function of a protein (or RNA) directly binding to DNA or RNA; mapping of binding sites for a regulatory protein; characterization of regulatory mutations. Analysis of the data collected from literature resulted in the creation of Putative Regulons from Experimental Data that are also available in RegTransBase. Conclusions RegTransBase is a powerful user-friendly platform for the investigation of regulation in prokaryotes. It uses a collection of validated regulatory sequences that can be easily extracted and used to infer regulatory interactions by comparative genomics techniques, thus assisting researchers in the interpretation of transcriptional regulation data.
|
17
|
RegPrecise web services interface: programmatic access to the transcriptional regulatory interactions in bacteria reconstructed by comparative genomics. Nucleic Acids Res 2012; 40:W604-8. [PMID: 22700702 PMCID: PMC3394319 DOI: 10.1093/nar/gks562] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
A web services application programming interface (API) was developed to provide programmatic access to the regulatory interactions accumulated in the RegPrecise database (http://regprecise.lbl.gov), a core resource on transcriptional regulation for the microbial domain of the Department of Energy (DOE) Systems Biology Knowledgebase. RegPrecise captures and visualizes regulogs, sets of genes controlled by orthologous regulators in several closely related bacterial genomes, that were reconstructed by comparative genomics. The current release of RegPrecise 2.0 includes >1400 regulogs controlled either by protein transcription factors or by conserved ribonucleic acid regulatory motifs in >250 genomes from 24 taxonomic groups of bacteria. The reference regulons accumulated in RegPrecise can serve as a basis for automatic annotation of regulatory interactions in newly sequenced genomes. The developed API provides efficient access to the RegPrecise data through a comprehensive set of 14 web service resources. The RegPrecise web services API is freely accessible at http://regprecise.lbl.gov/RegPrecise/services.jsp with no login requirements.
|
18
|
Hormones. Nephrol Dial Transplant 2012. [DOI: 10.1093/ndt/gfs233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
|
19
|
Abstract
The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web. The JGI Genome Portal (http://genome.jgi.doe.gov) provides a unified access point to all JGI genomic databases and analytical tools. A user can find all DOE JGI sequencing projects and their status, search for and download assemblies and annotations of sequenced genomes, and interactively explore those genomes and compare them with other sequenced microbes, fungi, plants or metagenomes using specialized systems tailored to each particular class of organisms. We describe here the general organization of the Genome Portal and the most recent addition, MycoCosm (http://jgi.doe.gov/fungi), a new integrated fungal genomics resource.
|
20
|
VISTA Region Viewer (RViewer)--a computational system for prioritizing genomic intervals for biomedical studies. Bioinformatics 2011; 27:2595-7. [PMID: 21791533 DOI: 10.1093/bioinformatics/btr440] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
SUMMARY Current genome browsers are designed for linear browsing of individual genomic regions, but the high-throughput nature of experiments aiming to elucidate the genetic component of human disease makes it very important to develop user-friendly tools for comparing several genomic regions in parallel and prioritizing them based on their functional content. We introduce VISTA Region Viewer (RViewer), an interactive online tool that allows for efficient screening and prioritization of regions of the human genome for follow-up studies. The tool takes as input genetic variation data from different biomedical studies, determines a number of functional parameters for both coding and non-coding sequences in each region and allows for sorting and searching the results of the analysis in multiple ways. AVAILABILITY AND IMPLEMENTATION The tool is implemented as a web application and is freely accessible on the Web at http://rviewer.lbl.gov. CONTACT rviewer@lbl.gov; ildubchak@lbl.gov.
|
21
|
Comparative genomics of the dormancy regulons in mycobacteria. J Bacteriol 2011; 193:3446-52. [PMID: 21602344 PMCID: PMC3133309 DOI: 10.1128/jb.00179-11] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 02/08/2011] [Accepted: 05/06/2011] [Indexed: 11/20/2022]
Abstract
In response to stresses, Mycobacterium cells become dormant. This process is regulated by the DosR transcription factor. In Mycobacterium tuberculosis, the dormancy regulon is well characterized and contains the dosR gene itself and dosS and dosT genes encoding DosR kinases, nitroreductases (acg; Rv3131), diacylglycerol acyltransferase (DGAT) (Rv3130c), and many universal stress proteins (USPs). In this study, we apply comparative genomic analysis to characterize the DosR regulons in nine Mycobacterium genomes, Rhodococcus sp. RHA1, Nocardia farcinica, and Saccharopolyspora erythraea. The regulons are highly labile, containing eight core gene groups (regulators, kinases, USPs, DGATs, nitroreductases, ferredoxins, heat shock proteins, and the orthologs of the predicted kinase [Rv2004c] from M. tuberculosis) and 10 additional genes with more restricted taxonomic distribution that are mostly involved in anaerobic respiration. The largest regulon is observed in M. marinum and the smallest in M. abscessus. Analysis of large gene families encoding USPs, nitroreductases, and DGATs demonstrates a mosaic distribution of regulated and nonregulated members, suggesting frequent acquisition and loss of DosR-binding sites.
|
22
|
Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus. BMC Genomics 2011; 12 Suppl 1:S3. [PMID: 21810205 PMCID: PMC3223726 DOI: 10.1186/1471-2164-12-s1-s3] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 11/24/2022] Open
Abstract
Background Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.
|
23
|
Abstract
It was proposed that if some mRNA characteristics resulted in a low efficiency of termination signal, an additional closely located stop codon (tandem stop codons) could be used to prevent the harmful readthrough. However, the role of tandem terminators in higher eukaryotes was not verified and remains hypothetical. In this work the sequence features of Arabidopsis thaliana and Oryza sativa mRNAs were analyzed. It was found that plant mRNAs with UGA terminator were characterized by a higher frequency of nonsense codons in the first triplet position of 3'-UTR that could result from a weak natural selection for "reserve" stop signal. Interestingly, the presence of tandem stop codons positively correlated with a specific amino acid composition in the C-terminal position of the encoded proteins. In particular, C-terminal glycine positively correlated with significantly higher frequencies of reserve terminators at the beginning positions of 3'-UTR in UGA-containing mRNAs. This finding coincides with some earlier observations concerning the role of glycine and its codons in inefficient termination of translation and recoding (e.g., 2A oligopeptide).
|
24
|
RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach. Nucleic Acids Res 2010; 38:W299-307. [PMID: 20542910 PMCID: PMC2896116 DOI: 10.1093/nar/gkq531] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.1] [Indexed: 12/28/2022] Open
Abstract
The RegPredict web server is designed to provide tools for reconstruction and analysis of microbial regulons using the comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.
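As an illustration of what searching with a positional weight matrix involves, the following is a minimal sketch and not RegPredict's actual implementation. The function names, the uniform 25% background, the pseudocount and the toy binding sites are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds weight matrix from aligned sites of equal length.

    Assumes a uniform background (0.25 per base); real tools typically
    use genome-specific base frequencies.
    """
    background = 0.25
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; returns (offset, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

def best_hit(sequence, pwm):
    """Return the highest-scoring (offset, score) pair."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])

# Hypothetical training sites and upstream region, for illustration only.
toy_sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
toy_pwm = build_pwm(toy_sites)
offset, score = best_hit("GGGTTGACAGGG", toy_pwm)
```

In a comparative setting, candidate sites found this way would be retained only when they recur upstream of orthologous operons in several genomes, which is the filtering idea the abstract describes.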
|
25
|
Abstract
The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes more than 20,000 protein-coding genes, including orthologs of at least 1700 human disease genes. Over 1 million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like that of other tetrapods, the genome of X. tropicalis contains gene deserts enriched for conserved noncoding elements. The genome exhibits substantial shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.
|
26
|
A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites. Curr Biol 2010; 20:792-802. [PMID: 20434338 DOI: 10.1016/j.cub.2010.03.063] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 01/14/2010] [Revised: 03/12/2010] [Accepted: 03/23/2010] [Indexed: 12/22/2022]
Abstract
BACKGROUND Transcription initiation is controlled by cis-regulatory modules. Although these modules are usually made of clusters of short transcription factor binding sites, a small minority of such clusters in the genome have cis-regulatory activity. This paradox is currently unsolved. RESULTS To identify what discriminates active from inactive clusters, we focused our attention on short topologically unconstrained clusters of two ETS and two GATA binding sites, similar to the early neural enhancer of Ciona intestinalis Otx. We first computationally identified 55 such clusters, conserved between the two Ciona genomes. In vivo assay of the activity of 19 hits identified three novel early neural enhancers, all located next to genes coexpressed with Otx. Optimization of ETS and GATA binding sites was not always sufficient to confer activity to inactive clusters. Rather, a dinucleotide sequence code associated to nucleosome depletion showed a robust correlation with enhancer potential. Identification of a large collection of Ciona regulatory regions revealed that predicted nucleosome depletion constitutes a general signature of Ciona enhancers, which is conserved between orthologous loci in the two Ciona genomes and which partitions conserved noncoding sequences into a major nucleosome-bound fraction and a minor nucleosome-free fraction with higher cis-regulatory potential. We also found this signature in a large fraction of short Drosophila cis-regulatory modules. CONCLUSION This study indicates that a sequence-based dinucleotide signature, previously associated with nucleosome depletion and independent of transcription factor binding sites, contributes to the definition of a local cis-regulatory potential in two metazoa, Ciona intestinalis and Drosophila melanogaster.
|
27
|
RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes. Nucleic Acids Res 2009; 38:D111-8. [PMID: 19884135 PMCID: PMC2808921 DOI: 10.1093/nar/gkp894] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.8] [Indexed: 12/02/2022] Open
Abstract
The RegPrecise database (http://regprecise.lbl.gov) was developed for capturing, visualization and analysis of predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated by utilizing the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have already been accumulated for diverse taxonomic groups of bacteria. The reconstructed regulons include transcription factors, their cognate DNA motifs and regulated genes/operons linked to the candidate transcription factor binding sites. RegPrecise allows for browsing the regulon collections for: (i) conservation of DNA binding sites and regulated genes for a particular regulon across diverse taxonomic lineages; (ii) sets of regulons for a family of transcription factors; (iii) repertoire of regulons in a particular taxonomic group of species; (iv) regulons associated with a metabolic pathway or a biological process in various genomes. The initial release of the database includes ∼11 500 candidate binding sites for ∼400 orthologous groups of transcription factors from over 350 prokaryotic genomes. The majority of these data are represented by genome-wide regulon reconstructions in the Shewanella and Streptococcus genera and a large-scale prediction of regulons for the LacI family of transcription factors. Another section in the database represents the results of accurate regulon propagation to closely related genomes.
|
28
|
Obtaining comparative genomic data with the VISTA family of computational tools. Curr Protoc Bioinformatics 2009; Chapter 10:10.6.1-10.6.17. [PMID: 19496056 DOI: 10.1002/0471250953.bi1006s26] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Comparison of DNA sequences from different species is a fundamental method for identifying functional elements, such as exons or enhancers, as they tend to exhibit significant sequence similarity due to purifying selection. Availability of whole-genome sequences for a constantly growing number of organisms makes identification of such elements within these genomes possible. There are two distinct phases in comparisons of genomic sequences: in the first, the sequences are aligned, and in the second, the resulting alignments are analyzed to find conservation signals that may be indicative of functional regions. Due to the considerable length of alignments, good visual representation techniques are a necessity for effective isolation of regions of interest. The VISTA family of tools provides biomedical investigators with a unified framework for the alignment of long genomic sequences and whole-genome assemblies, interactive visual analysis of alignments along with functional annotation, and many other comparative genomics capabilities.
|
29
|
Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 2009; 324:268-72. [PMID: 19359590 DOI: 10.1126/science.1167222] [Citation(s) in RCA: 416] [Impact Index Per Article: 27.7] [Indexed: 12/20/2022]
Abstract
Picoeukaryotes are a taxonomically diverse group of organisms less than 2 micrometers in diameter. Photosynthetic marine picoeukaryotes in the genus Micromonas thrive in ecosystems ranging from tropical to polar and could serve as sentinel organisms for biogeochemical fluxes of modern oceans during climate change. These broadly distributed primary producers belong to an anciently diverged sister clade to land plants. Although Micromonas isolates have high 18S ribosomal RNA gene identity, we found that genomes from two isolates shared only 90% of their predicted genes. Their independent evolutionary paths were emphasized by distinct riboswitch arrangements as well as the discovery of intronic repeat elements in one isolate, and in metagenomic data, but not in other genomes. Divergence appears to have been facilitated by selection and acquisition processes that actively shape the repertoire of genes that are mutually exclusive between the two isolates differently than the core genes. Analyses of the Micromonas genomes offer valuable insights into ecological differentiation and the dynamic nature of early plant evolution.
|
30
|
Abstract
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
|
31
|
Abstract
Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.
|
32
|
Combinatorial regulation of endothelial gene expression by ets and forkhead transcription factors. Cell 2008; 135:1053-64. [PMID: 19070576 DOI: 10.1016/j.cell.2008.10.049] [Citation(s) in RCA: 257] [Impact Index Per Article: 16.1] [Received: 04/14/2008] [Revised: 08/20/2008] [Accepted: 10/20/2008] [Indexed: 11/30/2022]
Abstract
Vascular development begins when mesodermal cells differentiate into endothelial cells, which then form primitive vessels. It has been hypothesized that endothelial-specific gene expression may be regulated combinatorially, but the transcriptional mechanisms governing specificity in vascular gene expression remain incompletely understood. Here, we identify a 44 bp transcriptional enhancer that is sufficient to direct expression specifically and exclusively to the developing vascular endothelium. This enhancer is regulated by a composite cis-acting element, the FOX:ETS motif, which is bound and synergistically activated by Forkhead and Ets transcription factors. We demonstrate that coexpression of the Forkhead protein FoxC2 and the Ets protein Etv2 induces ectopic expression of vascular genes in Xenopus embryos, and that combinatorial knockdown of the orthologous genes in zebrafish embryos disrupts vascular development. Finally, we show that FOX:ETS motifs are present in many known endothelial-specific enhancers and that this motif is an efficient predictor of endothelial enhancers in the human genome.
|
33
|
ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res 2008; 37:D448-54. [PMID: 18845571 PMCID: PMC2686458 DOI: 10.1093/nar/gkn684] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 02/01/2023] Open
Abstract
The database of Alignable Tight Genomic Clusters (ATGCs) consists of closely related genomes of archaea and bacteria, and is a resource for research into prokaryotic microevolution. Construction of a data set with appropriate characteristics is a major hurdle for this type of study. With the current rate of genome sequencing, it is difficult to follow the progress of the field and to determine which of the available genome sets meet the requirements of a given research project, in particular with respect to the minimum and maximum levels of similarity between the included genomes. Additionally, extraction of specific content, such as genomic alignments or families of orthologs, from a selected set of genomes is a complicated and time-consuming process. The database addresses these problems by providing an intuitive and efficient web interface to browse precomputed ATGCs, select appropriate ones and access ATGC-derived data such as multiple alignments of orthologous proteins, matrices of pairwise intergenomic distances based on genome-wide analysis of synonymous and nonsynonymous substitution rates, and others. The ATGC database will be regularly updated following new releases of the NCBI RefSeq. The database is hosted by the Genomics Division at Lawrence Berkeley National Laboratory and is publicly available at http://atgc.lbl.gov.
|
34
|
IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res 2008; 36:D534-8. [PMID: 17932063 PMCID: PMC2238950 DOI: 10.1093/nar/gkm869] [Citation(s) in RCA: 233] [Impact Index Per Article: 14.6] [Received: 08/10/2007] [Revised: 09/22/2007] [Accepted: 09/24/2007] [Indexed: 11/13/2022] Open
Abstract
IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m.
|
35
|
Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. BMC Genomics 2007; 8:378. [PMID: 17945028 PMCID: PMC2176071 DOI: 10.1186/1471-2164-8-378] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/21/2007] [Accepted: 10/18/2007] [Indexed: 12/22/2022] Open
Abstract
Background A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong.
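The comparison of motif abundances between CNSs and a background set can be sketched as a ratio of k-mer frequencies. This is only an illustrative outline under assumptions: the function names, the pseudocount and the toy sequences are hypothetical, and the study's actual statistics are not reproduced here.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Relative frequency of each overlapping k-mer across a sequence set."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def enrichment(foreground, background, k, pseudo=1e-6):
    """Ratio of foreground to background k-mer frequency.

    The small pseudocount (an assumption) avoids division by zero for
    k-mers that are absent from one of the two sets.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {
        kmer: (fg.get(kmer, 0.0) + pseudo) / (bg.get(kmer, 0.0) + pseudo)
        for kmer in set(fg) | set(bg)
    }

# Toy data: an AT-rich "conserved" set against a GC-rich background.
toy_cns = ["AAAATTTT", "TTTTAAAA"]
toy_background = ["ACGTACGT", "GCGCGCGC"]
ratios = enrichment(toy_cns, toy_background, 2)
```

Ratios above 1 mark k-mers that are more frequent in the foreground set. As the abstract notes, such ratios largely reflect overall base composition, which is why a composition-matched random background is one of the three comparison sets.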
|
36
|
Abstract
Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.
|
37
|
The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res 2007; 36:D528-33. [PMID: 17933782 PMCID: PMC2238897 DOI: 10.1093/nar/gkm846] [Citation(s) in RCA: 166] [Impact Index Per Article: 9.8] [Indexed: 11/29/2022] Open
Abstract
The integrated microbial genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.
|
39
|
Abstract
BACKGROUND Independently evolving lineages mostly accumulate different changes, which leads to their gradual divergence. However, parallel accumulation of identical changes is also common, especially in traits with only a small number of possible states. RESULTS We characterize parallelism in evolution of coding sequences in three four-species sets of genomes of mammals, Drosophila, and yeasts. Each such set contains two independent evolutionary paths, which we call paths I and II. An amino acid replacement which occurred along path I also occurs along path II with the probability 50-80% of that expected under selective neutrality. Thus, the per site rate of parallel evolution of proteins is several times higher than their average rate of evolution, but still lower than the rate of evolution of neutral sequences. This deficit may be caused by changes in the fitness landscape, leading to a replacement being possible along path I but not along path II. However, constant, weak selection assumed by the nearly neutral model of evolution appears to be a more likely explanation. Then, the average coefficient of selection associated with an amino acid replacement, in the units of the effective population size, must exceed approximately 0.4, and the fraction of effectively neutral replacements must be below approximately 30%. At a majority of evolvable amino acid sites, only a relatively small number of different amino acids is permitted. CONCLUSION High, but below-neutral, rates of parallel amino acid replacements suggest that a majority of amino acid replacements that occur in evolution are subject to weak, but non-trivial, selection, as predicted by Ohta's nearly-neutral theory.
|
40
|
A correlation with exon expression approach to identify cis-regulatory elements for tissue-specific alternative splicing. Nucleic Acids Res 2007; 35:4845-57. [PMID: 17626050 PMCID: PMC1950531 DOI: 10.1093/nar/gkm485] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.1] [Received: 03/12/2007] [Revised: 06/04/2007] [Accepted: 06/05/2007] [Indexed: 12/22/2022] Open
Abstract
Correlation of motif occurrences with gene expression intensity is an effective strategy for elucidating transcriptional cis-regulatory logic. Here we demonstrate that this approach can also identify cis-regulatory elements for alternative pre-mRNA splicing. Using data from a human exon microarray, we identified 56 cassette exons that exhibited higher transcript-normalized expression in muscle than in other normal adult tissues. Intron sequences flanking these exons were then analyzed to identify candidate regulatory motifs for muscle-specific alternative splicing. Correlation of motif parameters with gene-normalized exon expression levels was examined using linear regression and linear splines on RNA words and degenerate weight matrices, respectively. Our unbiased analysis uncovered multiple candidate regulatory motifs for muscle-specific splicing, many of which are phylogenetically conserved among vertebrate genomes. The most prominent downstream motifs were binding sites for Fox1- and CELF-related splicing factors, and a branchpoint-like element acuaac; pyrimidine-rich elements resembling PTB-binding sites were most significant in upstream introns. Intriguingly, our systematic study indicates a paucity of novel muscle-specific elements that are dominant in short proximal intronic regions. We propose that Fox and CELF proteins play major roles in enforcing the muscle-specific alternative splicing program, facilitating expression of unique isoforms of cytoskeletal proteins critical to muscle cell function.
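The core idea of correlating RNA-word occurrences with exon expression can be sketched with ordinary least squares. This is a simplified illustration under assumptions: the helper names, the example word and the toy intron sequences and expression values are hypothetical, and the linear-spline treatment of degenerate weight matrices mentioned in the abstract is not covered.

```python
def count_word(sequence, word):
    """Count (possibly overlapping) occurrences of an RNA word."""
    width = len(word)
    return sum(
        1 for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == word
    )

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy downstream intron sequences and made-up normalized expression values.
toy_introns = ["UGCAUGAAUGCAUG", "UGCAUGCCCC", "CCCCCCCC"]
toy_expression = [3.0, 2.0, 1.0]
word_counts = [count_word(seq, "UGCAUG") for seq in toy_introns]
slope, intercept = linear_fit(word_counts, toy_expression)
```

A positive slope for a word would flag it as a candidate enhancer of inclusion in the tissue of interest. In practice each candidate word would be tested this way and the resulting p-values corrected for multiple testing.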
|
41
|
Abstract
The VISTA portal for comparative genomics is designed to give biomedical scientists a unified set of tools to lead them from the raw DNA sequences through the alignment and annotation to the visualization of the results. The VISTA portal also hosts the alignments of a number of genomes computed by our group, allowing users to study the regions of their interest without having to manually download the individual sequences. Here we describe various algorithmic and functional improvements implemented in the VISTA portal over the last 2 years. The VISTA Portal is accessible at http://genome.lbl.gov/vista.
|
42
|
The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A 2007; 104:7705-10. [PMID: 17460045 PMCID: PMC1863510 DOI: 10.1073/pnas.0611046104] [Citation(s) in RCA: 417] [Impact Index Per Article: 24.5] [Indexed: 11/18/2022] Open
Abstract
The smallest known eukaryotes, at approximately 1 μm in diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species, from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.
|
43
|
TreeQ-VISTA: an interactive tree visualization tool with functional annotation query capabilities. Bioinformatics 2007; 23:764-6. [PMID: 17234642 DOI: 10.1093/bioinformatics/btl643] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
We describe a general multiplatform exploratory tool called TreeQ-VISTA, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a user-provided relational database through a user-friendly interface that provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools is provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online (http://genome.lbl.gov/vista/TreeQVista).
|
44
|
Abstract
Despite the known existence of distant-acting cis-regulatory elements in the human genome, only a small fraction of these elements has been identified and experimentally characterized in vivo. This paucity of enhancer collections with defined activities has thus hindered computational approaches for the genome-wide prediction of enhancers and their functions. To fill this void, we utilize comparative genome analysis to identify candidate enhancer elements in the human genome coupled with the experimental determination of their in vivo enhancer activity in transgenic mice [L. A. Pennacchio et al. (2006) Nature, in press]. These data are available through the VISTA Enhancer Browser. This growing database currently contains over 250 experimentally tested DNA fragments, of which more than 100 have been validated as tissue-specific enhancers. For each positive enhancer, we provide digital images of whole-mount embryo staining at embryonic day 11.5 and an anatomical description of the reporter gene expression pattern. Users can retrieve elements near single genes of interest, search for enhancers that target reporter gene expression to a particular tissue, or download entire collections of enhancers with a defined tissue specificity or conservation depth. These experimentally validated training sets are expected to provide a basis for a wide range of downstream computational and functional studies of enhancer function.
|
45
|
RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res 2006; 35:D407-12. [PMID: 17142223 PMCID: PMC1669780 DOI: 10.1093/nar/gkl865] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/25/2022] Open
Abstract
RegTransBase is a manually curated database of regulatory interactions in prokaryotes that captures the knowledge in public scientific literature using a controlled vocabulary. Although several databases describing interactions between regulatory proteins and their binding sites are already being maintained, they either focus mostly on the model organisms Escherichia coli and Bacillus subtilis or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains the following types of experimental data: the activation or repression of transcription by an identified direct regulator, determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA), mapping or prediction of a binding site for a regulatory protein and characterization of regulatory mutations. Currently, RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at .
|
46
|
In vivo enhancer analysis of human conserved non-coding sequences. Nature 2006; 444:499-502. [PMID: 17086198 DOI: 10.1038/nature05295] [Citation(s) in RCA: 867] [Impact Index Per Article: 48.2] [Received: 06/14/2006] [Accepted: 09/22/2006] [Indexed: 12/16/2022]
Abstract
Identifying the sequences that direct the spatial and temporal expression of genes and defining their function in vivo remains a significant challenge in the annotation of vertebrate genomes. One major obstacle is the lack of experimentally validated training sets. In this study, we made use of extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements, and characterized the in vivo enhancer activity of a large group of non-coding elements in the human genome that are conserved in human-pufferfish, Takifugu (Fugu) rubripes, or ultraconserved in human-mouse-rat. We tested 167 of these extremely conserved sequences in a transgenic mouse enhancer assay. Here we report that 45% of these sequences functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While directing expression in a broad range of anatomical structures in the embryo, the majority of the 75 enhancers directed expression to various regions of the developing nervous system. We identified sequence signatures enriched in a subset of these elements that targeted forebrain expression, and used these features to rank all approximately 3,100 non-coding elements in the human genome that are conserved between human and Fugu. The testing of the top predictions in transgenic mice resulted in a threefold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of human gene enhancers that have been characterized in vivo, and illustrate the utility of such training sets for a variety of biological applications, including decoding the regulatory vocabulary of the human genome.
|
47
|
Abstract
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
|
48
|
VISTA family of computational tools for comparative analysis of DNA sequences and whole genomes. Methods Mol Biol 2006; 338:69-89. [PMID: 16888351 DOI: 10.1385/1-59745-097-9:69] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Comparative analysis of DNA sequences is becoming one of the major methods for discovery of functionally important genomic intervals. The VISTA family of computational tools presented here was built to help researchers in this undertaking. These tools allow the researcher to align DNA sequences, quickly visualize conservation levels between them, identify highly conserved regions, and analyze sequences of interest through one of the following approaches: (1) browse precomputed whole-genome alignments of vertebrates and other groups of organisms; (2) submit sequences to Genome VISTA to align them to whole genomes; (3) submit two or more sequences to mVISTA to align them with each other (a variety of alignment programs with several distinct capabilities are made available); (4) submit sequences to Regulatory VISTA (rVISTA) to perform transcription factor binding site predictions based on conservation within sequence alignments; (5) use stand-alone alignment and visualization programs to run comparative sequence analysis locally. All VISTA tools use standard algorithms for visualization and conservation analysis to make comparison of results from different programs more straightforward. The web page http://genome.lbl.gov/vista/ serves as a portal for access to all VISTA tools. Our support group can be reached by email at vista@lbl.gov.
|
49
|
Abstract
The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity for microbial communities, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context. IMG/M is available at http://img.jgi.doe.gov/m.
|
50
|
Conservation patterns in different functional sequence categories of divergent Drosophila species. Genomics 2006; 88:431-42. [PMID: 16697139 DOI: 10.1016/j.ygeno.2006.03.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 02/03/2006] [Revised: 03/16/2006] [Accepted: 03/21/2006] [Indexed: 01/12/2023]
Abstract
We have explored the distributions of fully conserved ungapped blocks in genome-wide pair-wise alignments of recently completed species of Drosophila: D. melanogaster, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis, and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs, and nonfunctional sequences. At the same time, the blocks in the coding regions carried a 3N + 2 signature characteristic of synonymous substitutions in the third-codon position. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or "ultraconserved" sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restraining conservation patterns may help in mapping functional sequence categories and improve genome annotation.
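To make the notion of "fully conserved ungapped blocks" concrete, here is a minimal sketch that extracts the lengths of maximal runs of identical, gap-free columns from one pairwise alignment and tabulates their distribution. It assumes the alignment is given as two equal-length strings with "-" as the gap character. The function names and the toy alignment are hypothetical and do not reproduce the authors' software.

```python
from collections import Counter


def conserved_block_lengths(seq_a: str, seq_b: str, gap: str = "-") -> list[int]:
    """Lengths of maximal runs of columns that are identical and contain no gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    run = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b and a != gap:
            run += 1  # column is conserved and ungapped: extend the block
        else:
            if run:
                lengths.append(run)  # a mismatch or gap closes the block
            run = 0
    if run:
        lengths.append(run)  # block that reaches the end of the alignment
    return lengths


def block_length_distribution(seq_a: str, seq_b: str) -> Counter:
    """Histogram mapping block length to the number of blocks of that length."""
    return Counter(conserved_block_lengths(seq_a, seq_b))


# Hypothetical toy alignment, for illustration only.
aligned_a = "ACGT-ACGTA"
aligned_b = "ACGTTACCTA"
print(conserved_block_lengths(aligned_a, aligned_b))
print(block_length_distribution(aligned_a, aligned_b))
```

Computing such histograms separately for coding, regulatory, intronic, and UTR intervals is one simple way to compare the conservation patterns the abstract refers to.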
|