Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Searched Name: Gos Micklem

Total Articles: 52 (from Reference Citation Analysis)

Article PDFs: 27

Cited by > 0: 47

Number	Citation Analysis
1	Identification of multiple transcription factor genes potentially involved in the development of electrosensory versus mechanosensory lateral line organs. Front Cell Dev Biol 2024;12:1327924. [PMID: 38562141 PMCID: PMC10982350 DOI: 10.3389/fcell.2024.1327924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 02/19/2024] [Indexed: 04/04/2024] Abstract: In electroreceptive jawed vertebrates, embryonic lateral line placodes give rise to electrosensory ampullary organs as well as mechanosensory neuromasts. Previous reports of shared gene expression suggest that conserved mechanisms underlie electroreceptor and mechanosensory hair cell development and that electroreceptors evolved as a transcriptionally related "sister cell type" to hair cells. We previously identified only one transcription factor gene, Neurod4, as ampullary organ-restricted in the developing lateral line system of a chondrostean ray-finned fish, the Mississippi paddlefish (Polyodon spathula). The other 16 transcription factor genes we previously validated in paddlefish were expressed in both ampullary organs and neuromasts. Here, we used our published lateral line organ-enriched gene-set (arising from differential bulk RNA-seq in late-larval paddlefish), together with a candidate gene approach, to identify 25 transcription factor genes expressed in the developing lateral line system of a more experimentally tractable chondrostean, the sterlet (Acipenser ruthenus, a small sturgeon), and/or that of paddlefish. Thirteen are expressed in both ampullary organs and neuromasts, consistent with conservation of molecular mechanisms. Seven are electrosensory-restricted on the head (Irx5, Irx3, Insm1, Sp5, Satb2, Mafa and Rorc), and five are the first-reported mechanosensory-restricted transcription factor genes (Foxg1, Sox8, Isl1, Hmx2 and Rorb). However, as previously reported, Sox8 is expressed in ampullary organs as well as neuromasts in a catshark (Scyliorhinus canicula), suggesting the existence of lineage-specific differences between cartilaginous and ray-finned fishes. Overall, our results support the hypothesis that ampullary organs and neuromasts develop via largely conserved transcriptional mechanisms, and identify multiple transcription factors potentially involved in the formation of electrosensory versus mechanosensory lateral line organs. Key Words: ampullary organ; electrosensory; lateral line organs; mechanosensory; neuromast; paddlefish; sterlet; sturgeon. Grants: BB/F00818X/1 Biotechnology and Biological Sciences Research Council; Biotechnology and Biological Sciences Research Council.
2	HumanMine: advanced data searching, analysis and cross-species comparison. Database (Oxford) 2022;2022:6640317. [PMID: 35820040 PMCID: PMC9275753 DOI: 10.1093/database/baac054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 06/07/2022] [Accepted: 06/21/2022] [Indexed: 11/29/2022] Abstract: HumanMine (www.humanmine.org) is an integrated database of human genomics and proteomics data that provides a powerful interface to support sophisticated exploration and analysis of data compiled from experimental, computational and curated data sources. Built using the InterMine data integration platform, HumanMine includes genes, proteins, pathways, expression levels, single nucleotide polymorphisms (SNPs), diseases and more, integrated into a single searchable database. HumanMine promotes integrative analysis, a powerful approach in modern biology that allows many sources of evidence to be analysed together. The data can be accessed through a user-friendly web interface as well as a powerful, scriptable web service application programming interface (API) to allow programmatic access to data. The web interface includes a useful identifier resolution system, sophisticated query options and interactive results tables that enable powerful exploration of data, including data summaries, filtering, browsing and export. A set of graphical analysis tools provides a rich environment for data exploration, including statistical enrichment of sets of genes or other biological entities. HumanMine can be used for integrative multistaged analysis that can lead to new insights and uncover previously unknown relationships. Database URL: https://www.humanmine.org
3	Insights into olfactory ensheathing cell development from a laser-microdissection and transcriptome-profiling approach. Glia 2020;68:2550-2584. [PMID: 32857879 PMCID: PMC7116175 DOI: 10.1002/glia.23870] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/19/2020] [Revised: 05/23/2020] [Accepted: 05/27/2020] [Indexed: 12/14/2022] Abstract: Olfactory ensheathing cells (OECs) are neural crest-derived glia that ensheath bundles of olfactory axons from their peripheral origins in the olfactory epithelium to their central targets in the olfactory bulb. We took an unbiased laser microdissection and differential RNA-seq approach, validated by in situ hybridization, to identify candidate molecular mechanisms underlying mouse OEC development and differences with the neural crest-derived Schwann cells developing on other peripheral nerves. We identified 25 novel markers for developing OECs in the olfactory mucosa and/or the olfactory nerve layer surrounding the olfactory bulb, of which 15 were OEC-specific (that is, not expressed by Schwann cells). One pan-OEC-specific gene, Ptprz1, encodes a receptor-like tyrosine phosphatase that blocks oligodendrocyte differentiation. Mutant analysis suggests Ptprz1 may also act as a brake on OEC differentiation, and that its loss disrupts olfactory axon targeting. Overall, our results provide new insights into OEC development and the diversification of neural crest-derived glia. Key Words: OECs; Ptprz1; Wnt pathway; boundary cap cells; neural crest; oligodendrocytes; trigeminal Schwann cells.
4	InterMineR: an R package for InterMine databases. Bioinformatics 2019;35:3206-3207. [PMID: 30668641 PMCID: PMC6736411 DOI: 10.1093/bioinformatics/btz039] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/07/2018] [Revised: 12/17/2018] [Accepted: 01/17/2019] [Indexed: 11/16/2022] Abstract: Summary: InterMineR is a package designed to provide a flexible interface between the R programming environment and biological databases built using the InterMine platform. The package offers access to the flexible query builder and the library of term enrichment tools of the InterMine framework, as well as interoperability with other Bioconductor packages. This facilitates automation of data retrieval tasks as well as downstream analysis with existing statistical tools in the R environment. Availability and implementation: InterMineR is free and open source, released under the LGPL licence and available from the Bioconductor project and GitHub (https://bioconductor.org/packages/release/bioc/html/InterMineR.html, https://github.com/intermine/interMineR). Supplementary information: Supplementary data are available at Bioinformatics online. MeSH Headings: Databases, Factual; Information Storage and Retrieval; Software. Grants: Wellcome Trust; 099133 Wellcome Trust.
5	The InterMine Android app: Cross-organism genomic data in your pocket. F1000Res 2018;7:1837. [PMID: 31240100 PMCID: PMC6572867 DOI: 10.12688/f1000research.17005.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/12/2018] [Indexed: 10/13/2023] Abstract: InterMine is a data integration and analysis software system that has been used to create both inter-connected and stand-alone biological databases for the analysis of large and complex biological data sets. Together, the InterMine databases provide access to extensive data across multiple organisms. To provide more convenient access to these data from Android mobile devices, we have developed the InterMine app, an application that can be run on any Android mobile phone or tablet. The InterMine app provides a single interface for data access, search and exploration of the InterMine databases. It can be used to retrieve information on genes and gene lists, and their relatives across species. Simple searches can be used to access a range of data about a specific gene, while links to the InterMine databases provide access to more detailed report pages and gene list analysis tools. The InterMine app thus facilitates rapid exploration of genes across multiple organisms and kinds of data. Key Words: Android app; Gene search; Genomics data; InterMine. MeSH Headings: Cell Phone; Databases, Factual; Genomics; Software.
6	The InterMine Android app: Cross-organism genomic data in your pocket. F1000Res 2018;7:1837. [PMID: 31240100 PMCID: PMC6572867 DOI: 10.12688/f1000research.17005.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/23/2019] [Indexed: 12/03/2022] Abstract: InterMine is a data integration and analysis software system that has been used to create both inter-connected and stand-alone biological databases for the analysis of large and complex biological data sets. Together, the InterMine databases provide access to extensive data across multiple organisms. To provide more convenient access to these data from Android mobile devices, we have developed the InterMine app, an application that can be run on any Android mobile phone or tablet. The InterMine app provides a single interface for data access, search and exploration of the InterMine databases. It can be used to retrieve information on genes and gene lists, and their relatives across species. Simple searches can be used to access a range of data about a specific gene, while links to the InterMine databases provide access to more detailed report pages and gene list analysis tools. The InterMine app thus facilitates rapid exploration of genes across multiple organisms and kinds of data. Key Words: Android app; Gene search; Genomics data; InterMine. MeSH Headings: Cell Phone; Databases, Factual; Genomics; Software. Grants: Wellcome Trust.
7	Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions. BMC Bioinformatics 2018;19:134. [PMID: 29642841 PMCID: PMC5896046 DOI: 10.1186/s12859-018-2118-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 09/10/2017] [Accepted: 03/20/2018] [Indexed: 01/02/2023] Abstract: Background: Systems biologists study interaction data to understand the behaviour of whole cell systems, and their environment, at a molecular level. In order to effectively achieve this goal, it is critical that researchers have high quality interaction datasets available to them, in a standard data format, and also a suite of tools with which to analyse such data and form experimentally testable hypotheses from them. The PSI-MI XML standard interchange format was initially published in 2004, and expanded in 2007 to enable the download and interchange of molecular interaction data. PSI-XML2.5 was designed to describe experimental data and to date has fulfilled this basic requirement. However, new use cases have arisen that the format cannot properly accommodate. These include data abstracted from more than one publication such as allosteric/cooperative interactions and protein complexes, dynamic interactions and the need to link kinetic and affinity data to specific mutational changes. Results: The Molecular Interaction workgroup of the HUPO-PSI has extended the existing, well-used XML interchange format for molecular interaction data to meet new use cases and enable the capture of new data types, following extensive community consultation. PSI-MI XML3.0 expands the capabilities of the format beyond simple experimental data, with a concomitant update of the tool suite which serves this format. The format has been implemented by key data producers such as the International Molecular Exchange (IMEx) Consortium of protein interaction databases and the Complex Portal. Conclusions: PSI-MI XML3.0 has been developed by the data producers, data users, tool developers and database providers who constitute the PSI-MI workgroup. This group now actively supports PSI-MI XML2.5 as the main interchange format for experimental data, PSI-MI XML3.0 which additionally handles more complex data types, and the simpler, tab-delimited MITAB2.5, 2.6 and 2.7 for rapid parsing and download. Key Words: Data standards; HUPO-PSI; Molecular interactions; PSI-MI; Protein complexes; Protein-protein interaction; XML. MeSH Headings: Databases, Protein; Humans; Mutation/genetics; Protein Interaction Maps; Proteome/metabolism; Proteomics; Systems Biology. Grants: R01 GM123126 NIGMS NIH HHS; BB/L024179/1 Biotechnology and Biological Sciences Research Council.
8	Comparative genomics of bdelloid rotifers: Insights from desiccating and nondesiccating species. PLoS Biol 2018;16:e2004830. [PMID: 29689044 PMCID: PMC5916493 DOI: 10.1371/journal.pbio.2004830] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 11/19/2017] [Accepted: 03/19/2018] [Indexed: 12/22/2022] Abstract: Bdelloid rotifers are a class of microscopic invertebrates that have existed for millions of years apparently without sex or meiosis. They inhabit a variety of temporary and permanent freshwater habitats globally, and many species are remarkably tolerant of desiccation. Bdelloids offer an opportunity to better understand the evolution of sex and recombination, but previous work has emphasised desiccation as the cause of several unusual genomic features in this group. Here, we present high-quality whole-genome sequences of 3 bdelloid species: Rotaria macrura and R. magnacalcarata, which are both desiccation intolerant, and Adineta ricciae, which is desiccation tolerant. In combination with the published assembly of A. vaga, which is also desiccation tolerant, we apply a comparative genomics approach to evaluate the potential effects of desiccation tolerance and asexuality on genome evolution in bdelloids. We find that ancestral tetraploidy is conserved among all 4 bdelloid species, but homologous divergence in obligately aquatic Rotaria genomes is unexpectedly low. This finding is contrary to current models regarding the role of desiccation in shaping bdelloid genomes. In addition, we find that homologous regions in A. ricciae are largely collinear and do not form palindromic repeats as observed in the published A. vaga assembly. Consequently, several features interpreted as genomic evidence for long-term ameiotic evolution are not general to all bdelloid species, even within the same genus. Finally, we substantiate previous findings of high levels of horizontally transferred nonmetazoan genes in both desiccating and nondesiccating bdelloid species and show that this unusual feature is not shared by other animal phyla, even those with desiccation-tolerant representatives. These comparisons call into question the proposed role of desiccation in mediating horizontal genetic transfer. MeSH Headings: Adaptation, Physiological/genetics; Animals; Desiccation; Ecosystem; Fresh Water; Gene Transfer, Horizontal; Genetic Speciation; Genome, Helminth; Genomics/methods; Phylogeny; Rotifera/classification; Rotifera/genetics; Synteny; Tetraploidy; Whole Genome Sequencing. Grants: BB/F020562/1 Biotechnology and Biological Sciences Research Council; BB/F020856/1 Biotechnology and Biological Sciences Research Council.
9	ComplexViewer: visualization of curated macromolecular complexes. Bioinformatics 2017;33:3673-3675. [PMID: 29036573 PMCID: PMC5870653 DOI: 10.1093/bioinformatics/btx497] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/02/2017] [Revised: 07/21/2017] [Accepted: 08/01/2017] [Indexed: 11/23/2022] Abstract: Summary: Proteins frequently function as parts of complexes, assemblages of multiple proteins and other biomolecules, yet network visualizations usually only show proteins as parts of binary interactions. ComplexViewer visualizes interactions with more than two participants and thereby avoids the need to first expand these into multiple binary interactions. Furthermore, if binding regions between molecules are known then these can be displayed in the context of the larger complex. Availability and implementation: Freely available under Apache version 2 license; EMBL-EBI Complex Portal: http://www.ebi.ac.uk/complexportal; Source code: https://github.com/MICommunity/ComplexViewer; Package: https://www.npmjs.com/package/complexviewer; http://biojs.io/d/complexviewer. Language: JavaScript; Web technology: Scalable Vector Graphics; Libraries: D3.js. Contact: colin.combe@ed.ac.uk or juri.rappsilber@ed.ac.uk. MeSH Headings: Computational Biology/methods; Macromolecular Substances/metabolism; Models, Biological; Protein Binding; Protein Interaction Domains and Motifs; Protein Interaction Maps; Software. Grants: Wellcome Trust; 103139/Z/13/Z Wellcome Trust; 203149 Wellcome Trust.
10	Constructing synthetic biology workflows in the cloud. Engineering Biology 2017. [DOI: 10.1049/enb.2017.0001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
11	Empirical Bayes method for reducing false discovery rates of correlation matrices with block diagonal structure. BMC Bioinformatics 2017;18:213. [PMID: 28403823 PMCID: PMC5389176 DOI: 10.1186/s12859-017-1623-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2016] [Accepted: 04/01/2017] [Indexed: 11/21/2022] Abstract: Background: Correlation matrices are important in inferring relationships and networks between regulatory or signalling elements in biological systems. With currently available technology, sample sizes for experiments are typically small, meaning that these correlations can be difficult to estimate. At a genome-wide scale, estimation of correlation matrices can also be computationally demanding. Results: We develop an empirical Bayes approach to improve covariance estimates for gene expression, where we assume the covariance matrix takes a block diagonal form. Our method shows lower false discovery rates than existing methods on simulated data. Applied to a real data set from Bacillus subtilis, we demonstrate its ability to detect known regulatory units and interactions between them. Conclusions: We demonstrate that, compared to existing methods, our method is able to find significant covariances and also to control false discovery rates, even when the sample size is small (n=10). The method can be used to find potential regulatory networks, and it may also be used as a pre-processing step for methods that calculate, for example, partial correlations, so enabling the inference of the causal and hierarchical structure of the networks. Key Words: Correlation; Empirical Bayes.
12	Insights into electrosensory organ development, physiology and evolution from a lateral line-enriched transcriptome. eLife 2017;6. [PMID: 28346141 PMCID: PMC5429088 DOI: 10.7554/elife.24197] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/14/2016] [Accepted: 03/23/2017] [Indexed: 01/22/2023] Abstract: The anamniote lateral line system, comprising mechanosensory neuromasts and electrosensory ampullary organs, is a useful model for investigating the developmental and evolutionary diversification of different organs and cell types. Zebrafish neuromast development is increasingly well understood, but neither zebrafish nor Xenopus is electroreceptive and our molecular understanding of ampullary organ development is rudimentary. We have used RNA-seq to generate a lateral line-enriched gene-set from late-larval paddlefish (Polyodon spathula). Validation of a subset reveals expression in developing ampullary organs of transcription factor genes critical for hair cell development, and genes essential for glutamate release at hair cell ribbon synapses, suggesting close developmental, physiological and evolutionary links between non-teleost electroreceptors and hair cells. We identify an ampullary organ-specific proneural transcription factor, and candidates for the voltage-sensing L-type Cav channel and rectifying Kv channel predicted from skate (cartilaginous fish) ampullary organ electrophysiology. Overall, our results illuminate ampullary organ development, physiology and evolution. Key Words: Atoh1; Cav1.3; Cavβ2; Kv1.5; Kvβ3; Neurod4; Polyodon spathula (Mississippi paddlefish); Pou4f3; Vglut3; ampullary organs; beta-parvalbumins; developmental biology; electroreceptors; hair cells; hh; neuromasts; neuroscience; oncomodulin; otoferlin; stem cells; synaptic ribbons; voltage-gated ion channels.
13	Urinary Exosomes Contain MicroRNAs Capable of Paracrine Modulation of Tubular Transporters in Kidney. Sci Rep 2017;7:40601. [PMID: 28094285 PMCID: PMC5240140 DOI: 10.1038/srep40601] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 08/15/2016] [Accepted: 11/29/2016] [Indexed: 12/22/2022] Abstract: Exosomes derived from all nephron segments are present in human urine, where their functionality is incompletely understood. Most studies have focused on biomarker discovery rather than exosome function. Through sequencing we identified the miRNA repertoire of urinary exosomes from healthy volunteers; 276 mature miRNAs and 345 pre-miRNAs were identified (43%/7% of reads). Among the most abundant were members of the miR-10, miR-30 and let-7 families. Targets for the identified miRNAs were predicted using five different databases; genes encoding membrane transporters and their regulators were enriched, highlighting the possibility that these miRNAs could modulate key renal tubular functions in a paracrine manner. As proof of concept, cultured renal epithelial cells were exposed to urinary exosomes and cellular exosomal uptake was confirmed; thereafter, reduced levels of the potassium channel ROMK and kinases SGK1 and WNK1 were observed in a human collecting duct cell line, while SPAK was unaltered. In proximal tubular cells, mRNA levels of the amino acid transporter gene SLC38A2 were diminished and reflected in a significant decrement of its encoded protein SNAT2. Protein levels of the kinase SGK1 did not change. Thus we demonstrated a novel potential function for miRNA in urinary exosomes.
14	ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery. Plant & Cell Physiology 2017;58:e4. [PMID: 28013278 DOI: 10.1093/pcp/pcw200] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/01/2016] [Accepted: 11/11/2016] [Indexed: 05/08/2023] Abstract: ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled. Key Words: Arabidopsis thaliana; InterMine; data integration; data warehouse; genomics; web services. MeSH Headings: Arabidopsis/genetics; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Computational Biology/methods; Databases, Genetic; Gene Expression Profiling; Gene Expression Regulation, Plant/genetics; Gene Ontology; Genomics/methods; Information Storage and Retrieval/methods; Internet; Protein Interaction Mapping/methods; Protein Interaction Maps/genetics; Reproducibility of Results; Sequence Analysis, RNA.
15	Cross-organism analysis using InterMine. Genesis 2015;53:547-60. [PMID: 26097192 DOI: 10.1002/dvg.22869] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/01/2015] [Revised: 06/17/2015] [Accepted: 06/17/2015] [Indexed: 01/01/2023] Abstract: InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provide a rich environment for data exploration including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community. Key Words: comparative analysis; cross-organism analysis; data analysis; data integration; genomics; integrative analysis; proteomics.
16	toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research. Database: The Journal of Biological Databases and Curation 2015;2015:bav066. [PMID: 26130662 PMCID: PMC4485433 DOI: 10.1093/database/bav066] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/13/2015] [Accepted: 06/09/2015] [Indexed: 01/09/2023] Abstract: Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. Database URL: http://toxomine.org
17	Expression of multiple horizontally acquired genes is a hallmark of both vertebrate and invertebrate genomes. Genome Biol 2015;16:50. [PMID: 25785303 PMCID: PMC4358723 DOI: 10.1186/s13059-015-0607-3] [Citation(s) in RCA: 166] [Impact Index Per Article: 18.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2014] [Accepted: 02/04/2015] [Indexed: 01/17/2023] Open Abstract Background A fundamental concept in biology is that heritable material, DNA, is passed from parent to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic material between different species. HGT is well-known in single-celled organisms such as bacteria, but its existence in higher organisms, including animals, is less well established, and is controversial in humans. Results We have taken advantage of the recent availability of a sufficient number of high-quality genomes and associated transcriptomes to carry out a detailed examination of HGT in 26 animal species (10 primates, 12 flies and four nematodes) and a simplified analysis in a further 14 vertebrates. Genome-wide comparative and phylogenetic analyses show that HGT in animals typically gives rise to tens or hundreds of active ‘foreign’ genes, largely concerned with metabolism. Our analyses suggest that while fruit flies and nematodes have continued to acquire foreign genes throughout their evolution, humans and other primates have gained relatively few since their common ancestor. We also resolve the controversy surrounding previous evidence of HGT in humans and provide at least 33 new examples of horizontally acquired genes. 
Conclusions We argue that HGT has occurred, and continues to occur, on a previously unsuspected scale in metazoans and is likely to have contributed to biochemical diversification during animal evolution. Electronic supplementary material The online version of this article (doi:10.1186/s13059-015-0607-3) contains supplementary material, which is available to authorized users.
18	Araport: the Arabidopsis information portal. Nucleic Acids Res 2014;43:D1003-9. [PMID: 25414324 PMCID: PMC4383980 DOI: 10.1093/nar/gku1200] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open Abstract The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release ‘modules’ that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts ‘science apps,’ developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.
19	esyN: network building, sharing and publishing. PLoS One 2014;9:e106035. [PMID: 25181461 PMCID: PMC4152123 DOI: 10.1371/journal.pone.0106035] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2014] [Accepted: 07/27/2014] [Indexed: 01/18/2023] Open Abstract The construction and analysis of networks is increasingly widespread in biological research. We have developed esyN ("easy networks") as a free and open source tool to facilitate the exchange of biological network models between researchers. esyN acts as a searchable database of user-created networks from any field. We have developed a simple companion web tool that enables users to view and edit networks using data from publicly available databases. Both normal interaction networks (graphs) and Petri nets can be created. In addition to its basic tools, esyN contains a number of logical templates that can be used to create models more easily. The ability to use previously published models as building blocks makes esyN a powerful tool for the construction of models and network graphs. Users are able to save their own projects online and share them either publicly or with a list of collaborators. The latter can be given the ability to edit the network themselves, allowing online collaboration on network construction. esyN is designed to facilitate unrestricted exchange of this increasingly important type of biological information. Ultimately, the aim of esyN is to bring the advantages of Open Source software development to the construction of biological networks. MESH Headings: Gene Regulatory Networks; Information Dissemination; Protein Kinases/metabolism; Publishing; Signal Transduction; Software; Substrate Specificity. Grants: Wellcome Trust; MC_G1000734 Medical Research Council.
20	InterMine: extensive web services for modern biology. Nucleic Acids Res 2014;42:W468-72. [PMID: 24753429 PMCID: PMC4086141 DOI: 10.1093/nar/gku301] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open Abstract InterMine (www.intermine.org) is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.
21	BioJS DAGViewer: A reusable JavaScript component for displaying directed graphs. F1000Res 2014;3:51. [PMID: 24627804 PMCID: PMC3945768 DOI: 10.12688/f1000research.3-51.v1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 02/10/2014] [Indexed: 11/20/2022] Open Abstract Summary: The DAGViewer BioJS component is a reusable JavaScript component made available as part of the BioJS project and intended to be used to display graphs of structured data, with a particular emphasis on Directed Acyclic Graphs (DAGs). It enables users to embed representations of graphs of data, such as ontologies or phylogenetic trees, in hyper-text documents (HTML). This component is generic, since it is capable (given the appropriate configuration) of displaying any kind of data that is organised as a graph. The features of this component which are useful for examining and filtering large and complex graphs are described. Availability: http://github.com/alexkalderimis/dag-viewer-biojs; http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.8303.
22	Integrating microRNA and mRNA expression profiling in Symbiodinium microadriaticum, a dinoflagellate symbiont of reef-building corals. BMC Genomics 2013;14:704. [PMID: 24119094 PMCID: PMC3853145 DOI: 10.1186/1471-2164-14-704] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2013] [Accepted: 09/25/2013] [Indexed: 11/25/2022] Open Abstract Background Animal and plant genomes produce numerous small RNAs (smRNAs) that regulate gene expression post-transcriptionally affecting metabolism, development, and epigenetic inheritance. In order to characterize the repertoire of endogenous smRNAs and potential gene targets in dinoflagellates, we conducted smRNA and mRNA expression profiling over 9 experimental treatments of cultures from Symbiodinium microadriaticum, a photosynthetic symbiont of scleractinian corals. Results We identified a set of 21 novel smRNAs that share stringent key features with functional microRNAs from other model organisms. smRNAs were predicted independently over all 9 treatments and their putative gene targets were identified. We found 1,720 animal-like target sites in the 3'UTRs of 12,858 mRNAs and 19 plant-like target sites in 51,917 genes. We assembled a transcriptome of 58,649 genes and determined differentially expressed genes (DEGs) between treatments. Heat stress was found to produce a much larger number of DEGs than other treatments that yielded only few DEGs. Analysis of DEGs also revealed that minicircle-encoded photosynthesis proteins seem to be common targets of transcriptional regulation. Furthermore, we identified the core RNAi protein machinery in Symbiodinium. Conclusions Integration of smRNA and mRNA expression profiling identified a variety of processes that could be under microRNA control, e.g. protein modification, signaling, gene expression, and response to DNA damage. 
Given that Symbiodinium seems to have a paucity of transcription factors and differentially expressed genes, identification and characterization of its smRNA repertoire establishes the possibility of a range of gene regulatory mechanisms in dinoflagellates acting post-transcriptionally.
23	metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013;2013:bat060. [PMID: 23935057 PMCID: PMC4438919 DOI: 10.1093/database/bat060] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/31/2023] Abstract Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. 
This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. The web interface makes it accessible to first-time users, whereas the Application Programming Interface (API) and web services provide convenient data access and tools for bioinformaticians. metabolicMine is freely available online at http://www.metabolicmine.org. Database URL: http://www.metabolicmine.org.
24	Biochemical diversification through foreign gene expression in bdelloid rotifers. PLoS Genet 2012;8:e1003035. [PMID: 23166508 PMCID: PMC3499245 DOI: 10.1371/journal.pgen.1003035] [Citation(s) in RCA: 103] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2012] [Accepted: 08/29/2012] [Indexed: 11/19/2022] Open Abstract Bdelloid rotifers are microinvertebrates with unique characteristics: they have survived tens of millions of years without sexual reproduction; they withstand extreme desiccation by undergoing anhydrobiosis; and they tolerate very high levels of ionizing radiation. Recent evidence suggests that subtelomeric regions of the bdelloid genome contain sequences originating from other organisms by horizontal gene transfer (HGT), of which some are known to be transcribed. However, the extent to which foreign gene expression plays a role in bdelloid physiology is unknown. We address this in the first large scale analysis of the transcriptome of the bdelloid Adineta ricciae: cDNA libraries from hydrated and desiccated bdelloids were subjected to massively parallel sequencing and assembled transcripts compared against the UniProtKB database by blastx to identify their putative products. Of ~29,000 matched transcripts, ~10% were inferred from blastx matches to be horizontally acquired, mainly from eubacteria but also from fungi, protists, and algae. After allowing for possible sources of error, the rate of HGT is at least 8%-9%, a level significantly higher than other invertebrates. We verified their foreign nature by phylogenetic analysis and by demonstrating linkage of foreign genes with metazoan genes in the bdelloid genome. Approximately 80% of horizontally acquired genes expressed in bdelloids code for enzymes, and these represent 39% of enzymes in identified pathways. 
Many enzymes encoded by foreign genes enhance biochemistry in bdelloids compared to other metazoans, for example, by potentiating toxin degradation or generation of antioxidants and key metabolites. They also supplement, and occasionally potentially replace, existing metazoan functions. Bdelloid rotifers therefore express horizontally acquired genes on a scale unprecedented in animals, and foreign genes make a profound contribution to their metabolism. This represents a potential mechanism for ancient asexuals to adapt rapidly to changing environments and thereby persist over long evolutionary time periods in the absence of sex. MESH Headings: Animals; Desiccation; Gene Expression; Gene Library; Gene Transfer, Horizontal; Metabolic Networks and Pathways/genetics; Phylogeny; Radiation, Ionizing; Rotifera/genetics; Rotifera/physiology; Transcriptome. Grants: BB/F020562/1 Biotechnology and Biological Sciences Research Council; BB/F020856/1 Biotechnology and Biological Sciences Research Council.
25	InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 2012;28:3163-5. [PMID: 23023984 PMCID: PMC3516146 DOI: 10.1093/bioinformatics/bts577] [Citation(s) in RCA: 163] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open Abstract Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. Availability: Freely available from http://www.intermine.org under the LGPL license. Contact: g.micklem@gen.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
26	Multiple functionally divergent and conserved copies of alpha tubulin in bdelloid rotifers. BMC Evol Biol 2012;12:148. [PMID: 22901238 PMCID: PMC3464624 DOI: 10.1186/1471-2148-12-148] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2012] [Accepted: 08/11/2012] [Indexed: 12/31/2022] Open Abstract BACKGROUND Bdelloid rotifers are microscopic animals that have apparently survived without sex for millions of years and are able to survive desiccation at all life stages through a process called anhydrobiosis. Both of these characteristics are believed to have played a role in shaping several unusual features of bdelloid genomes discovered in recent years. Studies into the impact of asexuality and anhydrobiosis on bdelloid genomes have focused on understanding gene copy number. Here we investigate copy number and sequence divergence in alpha tubulin. Alpha tubulin is conserved and normally present in low copy numbers in animals, but multiplication of alpha tubulin copies has occurred in animals adapted to extreme environments, such as cold-adapted Antarctic fish. Using cloning and sequencing we compared alpha tubulin copy variation in four species of bdelloid rotifers and four species of monogonont rotifers, which are facultatively sexual and cannot survive desiccation as adults. Results were verified using transcriptome data from one bdelloid species, Adineta ricciae. RESULTS In common with the typical pattern for animals, monogonont rotifers contain either one or two copies of alpha tubulin, but bdelloid species contain between 11 and 13 different copies, distributed across five classes. Approximately half of the copies form a highly conserved group that vary by only 1.1% amino acid pairwise divergence with each other and with the monogonont copies. 
The other copies have divergent amino acid sequences that evolved significantly faster between classes than within them, relative to synonymous changes, and vary in predicted biochemical properties. Copies of each class were expressed under the laboratory conditions used to construct the transcriptome. CONCLUSIONS Our findings are consistent with recent evidence that bdelloids are degenerate tetraploids and that functional divergence of ancestral copies of genes has occurred, but show how further duplication events in the ancestor of bdelloids led to proliferation in both conserved and functionally divergent copies of this gene. Key Words: bdelloid rotifers gene copies tubulin evolution. MESH Headings: Animals; Cloning, Molecular; Conserved Sequence; Evolution, Molecular; Exons; Gene Dosage; Introns; Phylogeny; Rotifera/genetics; Sequence Alignment; Sequence Analysis, DNA; Transcriptome; Tubulin/genetics. Grants: 233232 European Research Council; BB/F020856/1 Biotechnology and Biological Sciences Research Council; BB/F020562/1 Biotechnology and Biological Sciences Research Council.
27	YeastMine--an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database (Oxford) 2012;2012:bar062. [PMID: 22434830 PMCID: PMC3308152 DOI: 10.1093/database/bar062] [Citation(s) in RCA: 197] [Impact Index Per Article: 16.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2011] [Revised: 12/01/2011] [Accepted: 12/05/2011] [Indexed: 11/14/2022] Abstract The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) provides high-quality curated genomic, genetic, and molecular information on the genes and their products of the budding yeast Saccharomyces cerevisiae. To accommodate the increasingly complex, diverse needs of researchers for searching and comparing data, SGD has implemented InterMine (http://www.InterMine.org), an open source data warehouse system with a sophisticated querying interface, to create YeastMine (http://yeastmine.yeastgenome.org). YeastMine is a multifaceted search and retrieval environment that provides access to diverse data types. Searches can be initiated with a list of genes, a list of Gene Ontology terms, or lists of many other data types. The results from queries can be combined for further analysis and saved or downloaded in customizable file formats. Queries themselves can be customized by modifying predefined templates or by creating a new template to access a combination of specific data types. YeastMine offers multiple scenarios in which it can be used such as a powerful search interface, a discovery tool, a curation aid and also a complex database presentation format. DATABASE URL: http://yeastmine.yeastgenome.org. 
MESH Headings: Database Management Systems; Databases, Genetic; Genome, Fungal; Internet; Saccharomyces cerevisiae/genetics; User-Computer Interface. Grants: R01 HG004834 NHGRI NIH HHS; P41 HG001315 NHGRI NIH HHS.
28	modMine: flexible access to modENCODE data. Nucleic Acids Res 2011;40:D1082-8. [PMID: 22080565 PMCID: PMC3245176 DOI: 10.1093/nar/gkr921] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open Abstract In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
29	The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2011;2011:bar023. [PMID: 21856757 PMCID: PMC3170170 DOI: 10.1093/database/bar023] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Abstract The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.
30	MicroRNAs dysregulated in breast cancer preferentially target key oncogenic pathways. MOLECULAR BIOSYSTEMS 2011;7:2571-6. [PMID: 21766137 DOI: 10.1039/c1mb05181d] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Abstract MicroRNA (miRNA) dysregulation has been associated with numerous cancers including breast cancer. The dysregulation of miRNAs in cancer has been shown to perturb various pathways, with oncogenic effects. Here we investigate the relationship between dysregulated miRNAs and pathways involved in breast cancer by integrating miRNA and mRNA expression data. From a list of dysregulated miRNAs, we started by selecting the subset that appear to be regulating genes differentially expressed in breast cancer vs. normal tissue. Individually and as a group, this subset was found to target several canonical oncogenic pathways including the p53 signalling pathway, MAPK signalling pathway, TGFβ signalling pathway, focal adhesion and cell cycle progression. These results suggest that the dysregulation of miRNAs in breast cancer not only results in widespread changes to gene expression, but also the dysregulation of key oncogenic pathways.
31	The impact of quantitative optimization of hybridization conditions on gene expression analysis. BMC Bioinformatics 2011;12:73. [PMID: 21401920 PMCID: PMC3065421 DOI: 10.1186/1471-2105-12-73] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2010] [Accepted: 03/14/2011] [Indexed: 12/05/2022] Open Abstract Background With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe-design is a major research area, relatively little work has been reported on the optimization of microarray protocols. Results As shown in this study, suboptimal conditions can have considerable impact on biologically relevant observations. For example, deviation from the optimal temperature by one degree Celsius led to a loss of up to 44% of differentially expressed genes identified. While genes from thousands of Gene Ontology categories were affected, transcription factors and other low-copy-number regulators were disproportionately lost. Calibrated protocols are thus required in order to take full advantage of the large dynamic range of microarrays. For an objective optimization of protocols we introduce an approach that maximizes the amount of information obtained per experiment. A comparison of two typical samples is sufficient for this calibration. We can ensure, however, that optimization results are independent of the samples and the specific measures used for calibration. Both simulations and spike-in experiments confirmed an unbiased determination of generally optimal experimental conditions. Conclusions Well calibrated hybridization conditions are thus easily achieved and necessary for the efficient detection of differential expression. 
They are essential for the sensitive profiling of low-copy-number molecules. This is particularly critical for studies of transcription factor expression, or the inference and study of regulatory networks.
32	Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 2010;330:1787-97. [PMID: 21177974 PMCID: PMC3192495 DOI: 10.1126/science.1198374] [Citation(s) in RCA: 899] [Impact Index Per Article: 64.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Abstract To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. 
MESH Headings: Animals; Binding Sites; Chromatin/genetics; Chromatin/metabolism; Computational Biology/methods; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/genetics; Drosophila melanogaster/growth & development; Drosophila melanogaster/metabolism; Epigenesis, Genetic; Gene Expression Regulation; Gene Regulatory Networks; Genes, Insect; Genome, Insect; Genomics/methods; Histones/metabolism; Molecular Sequence Annotation; Nucleosomes/genetics; Nucleosomes/metabolism; Promoter Regions, Genetic; RNA, Small Untranslated/genetics; RNA, Small Untranslated/metabolism; Transcription Factors/metabolism; Transcription, Genetic. Grants: R01 HG004037 NHGRI NIH HHS; U01HG004261 NHGRI NIH HHS; Howard Hughes Medical Institute; R01HG004037 NHGRI NIH HHS; U01HG004279 NHGRI NIH HHS; U41HG004269 NHGRI NIH HHS; U01 HG004279 NHGRI NIH HHS; U01HG004264 NHGRI NIH HHS; R01 GM081871 NIGMS NIH HHS; U01HG004274 NHGRI NIH HHS; RC2HG005639 NHGRI NIH HHS; U01HG004271 NHGRI NIH HHS; U01 HG004271 NHGRI NIH HHS; U01HG004258 NHGRI NIH HHS; ZIA DK015600-14 Intramural NIH HHS; U01 HG004258 NHGRI NIH HHS.
33	Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 2010;330:1775-87. [PMID: 21177976 PMCID: PMC3142569 DOI: 10.1126/science.1196914] [Citation(s) in RCA: 741] [Impact Index Per Article: 52.9] [Indexed: 12/26/2022] Abstract: We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
MeSH Headings: Animals; Caenorhabditis elegans/genetics; Caenorhabditis elegans/growth & development; Caenorhabditis elegans/metabolism; Caenorhabditis elegans Proteins/genetics; Caenorhabditis elegans Proteins/metabolism; Chromatin/genetics; Chromatin/metabolism; Chromatin/ultrastructure; Chromosomes/genetics; Chromosomes/metabolism; Chromosomes/ultrastructure; Computational Biology/methods; Conserved Sequence; Evolution, Molecular; Gene Expression Profiling; Gene Expression Regulation; Gene Regulatory Networks; Genes, Helminth; Genome, Helminth; Genomics/methods; Histones/metabolism; Models, Genetic; Molecular Sequence Annotation; RNA, Helminth/genetics; RNA, Helminth/metabolism; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; Regulatory Sequences, Nucleic Acid; Transcription Factors/genetics; Transcription Factors/metabolism
Grants: R01 GM088565-03 NIGMS NIH HHS; 054523 Wellcome Trust; Howard Hughes Medical Institute; U01 HG004270 NHGRI NIH HHS; Wellcome Trust; R01 GM088565 NIGMS NIH HHS; R01GM088565 NIGMS NIH HHS
34	Identification and analysis of serpin-family genes by homology and synteny across the 12 sequenced Drosophilid genomes. BMC Genomics 2009;10:489. [PMID: 19849829 PMCID: PMC2770083 DOI: 10.1186/1471-2164-10-489] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 04/15/2009] [Accepted: 10/22/2009] [Indexed: 12/16/2022] Abstract: Background: The Drosophila melanogaster genome contains 29 serpin genes, 12 as single transcripts and 17 within 6 gene clusters. Many of these serpins have a conserved "hinge" motif characteristic of active proteinase inhibitors. However, a substantial proportion (42%) lacks this motif and represents non-inhibitory serpin-fold proteins of unknown function. Currently, it is not known whether orthologous, inhibitory serpin genes retain the same target proteinase specificity within the Drosophilid lineage, nor whether they give rise to non-inhibitory serpin-fold proteins or other, more diverged, proteins. Results: We collated 188 orthologues to the D. melanogaster serpins from the other 11 Drosophilid genomes and used synteny to find further family members, raising the total to 226, or 71% of the number of orthologues expected assuming complete conservation across all 12 Drosophilid species. In general the sequence constraints on the serpin-fold itself are loose. The critical Reactive Centre Loop (RCL) sequence, including the target proteinase cleavage site, is strongly conserved in inhibitory serpins, although there are 3 exceptional sets of orthologues in which the evolutionary constraints are looser. Conversely, the RCL of non-inhibitory serpin orthologues is less conserved, with 3 exceptions that presumably bind to conserved partner molecules. We derive a consensus hinge motif, for Drosophilid inhibitory serpins, which differs somewhat from that of the vertebrate consensus.
Three gene clusters appear to have originated in the melanogaster subgroup, Spn28D, Spn77B and Spn88E, each containing one inhibitory serpin orthologue that is present in all Drosophilids. In addition, the Spn100A transcript appears to represent a novel serpin-derived fold. Conclusion: In general, inhibitory serpins rarely change their range of proteinase targets, except by a duplication/divergence mechanism. Non-inhibitory serpins appear to derive from inhibitory serpins, but not the reverse. The conservation of different family members varied widely across the 12 sequenced Drosophilid genomes. An approach considering synteny as well as homology was important to find the largest set of orthologues.
35	Phylogenetic and genomewide analyses suggest a functional relationship between kayak, the Drosophila fos homolog, and fig, a predicted protein phosphatase 2c nested within a kayak intron. Genetics 2007;177:1349-61. [PMID: 18039871 PMCID: PMC2147949 DOI: 10.1534/genetics.107.071670] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Abstract: A gene located within the intron of a larger gene is an uncommon arrangement in any species. Few of these nested gene arrangements have been explored from an evolutionary perspective. Here we report a phylogenetic analysis of kayak (kay) and fos intron gene (fig), a divergently transcribed gene located in a kay intron, utilizing 12 Drosophila species. The evolutionary relationship between these genes is of interest because kay is the homolog of the proto-oncogene c-fos whose function is modulated by serine/threonine phosphorylation and fig is a predicted PP2C phosphatase specific for serine/threonine residues. We found that, despite an extraordinary level of diversification in the intron-exon structure of kay (11 inversions and six independent exon losses), the nested arrangement of kay and fig is conserved in all species. A genomewide analysis of protein-coding nested gene pairs revealed that approximately 20% of nested pairs in D. melanogaster are also nested in D. pseudoobscura and D. virilis. A phylogenetic examination of fig revealed that there are three subfamilies of PP2C phosphatases in all 12 species of Drosophila. Overall, our phylogenetic and genomewide analyses suggest that the nested arrangement of kay and fig may be due to a functional relationship between them.
MeSH Headings: Amino Acid Sequence; Animals; DNA/genetics; Drosophila/classification; Drosophila/genetics; Drosophila/physiology; Drosophila Proteins/genetics; Drosophila Proteins/physiology; Drosophila melanogaster/genetics; Drosophila melanogaster/physiology; Evolution, Molecular; Genome, Insect; Introns; Molecular Sequence Data; Phosphoprotein Phosphatases/genetics; Phosphoprotein Phosphatases/physiology; Phylogeny; Protein Phosphatase 2C; Sequence Homology, Amino Acid; Species Specificity
Grants: R01 CA095875 NCI NIH HHS; R01 HG002516 NHGRI NIH HHS; HG002516 NHGRI NIH HHS; CA095875 NCI NIH HHS
36	New tools for self-organized pattern formation. BMC Syst Biol 2007. [DOI: 10.1186/1752-0509-1-s1-s10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
37	Bayesian modelling of shared gene function. Bioinformatics 2007;23:1936-44. [PMID: 17540682 DOI: 10.1093/bioinformatics/btm280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Abstract: MOTIVATION: Biological assays are often carried out on tissues that contain many cell lineages and active pathways. Microarray data produced using such material therefore reflect superimpositions of biological processes. Analysing such data for shared gene function by means of well-matched assays may help to provide a better focus on specific cell types and processes. The identification of genes that behave similarly in different biological systems also has the potential to reveal new insights into preserved biological mechanisms. RESULTS: In this article, we propose a hierarchical Bayesian model allowing integrated analysis of several microarray data sets for shared gene function. Each gene is associated with an indicator variable that selects whether binary class labels are predicted from expression values or by a classifier which is common to all genes. Each indicator selects the component models for all involved data sets simultaneously. A quantitative measure of shared gene function is obtained by inferring a probability measure over these indicators. Through experiments on synthetic data, we illustrate potential advantages of this Bayesian approach over a standard method. A shared analysis of matched microarray experiments covering (a) a cycle of mouse mammary gland development and (b) the process of in vitro endothelial cell apoptosis is proposed as a biological gold standard. Several useful sanity checks are introduced during data analysis, and we confirm the prior biological belief that shared apoptosis events occur in both systems.
We conclude that a Bayesian analysis for shared gene function has the potential to reveal new biological insights, unobtainable by other means. AVAILABILITY: An online supplement and MatLab code are available at http://www.sykacek.net/research.html#mcabf
MeSH Headings: Bayes Theorem; Computer Simulation; Data Interpretation, Statistical; Gene Expression/physiology; Gene Expression Profiling/methods; Models, Biological; Models, Statistical; Oligonucleotide Array Sequence Analysis/methods; Proteome/metabolism; Signal Transduction/physiology
38	FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biol 2007;8:R129. [PMID: 17615057 PMCID: PMC2323218 DOI: 10.1186/gb-2007-8-7-r129] [Citation(s) in RCA: 269] [Impact Index Per Article: 15.8] [Received: 11/15/2006] [Revised: 03/06/2007] [Accepted: 06/05/2007] [Indexed: 01/29/2023] Abstract: FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists.
MeSH Headings: Animals; Anopheles/genetics; Databases, Genetic; Drosophila/genetics; Genomics; Software
Grants: Wellcome Trust; G8225539 Medical Research Council; 067205 Wellcome Trust
39	Prospero Acts as a Binary Switch between Self-Renewal and Differentiation in Drosophila Neural Stem Cells. Dev Cell 2006;11:775-89. [PMID: 17141154 DOI: 10.1016/j.devcel.2006.09.015] [Citation(s) in RCA: 313] [Impact Index Per Article: 17.4] [Received: 03/23/2006] [Revised: 07/26/2006] [Accepted: 09/19/2006] [Indexed: 12/23/2022] Abstract: Stem cells have the remarkable ability to give rise to both self-renewing and differentiating daughter cells. Drosophila neural stem cells segregate cell-fate determinants from the self-renewing cell to the differentiating daughter at each division. Here, we show that one such determinant, the homeodomain transcription factor Prospero, regulates the choice between stem cell self-renewal and differentiation. We have identified the in vivo targets of Prospero throughout the entire genome. We show that Prospero represses genes required for self-renewal, such as stem cell fate genes and cell-cycle genes. Surprisingly, Prospero is also required to activate genes for terminal differentiation. We further show that in the absence of Prospero, differentiating daughters revert to a stem cell-like fate: they express markers of self-renewal, exhibit increased proliferation, and fail to differentiate. These results define a blueprint for the transition from stem cell self-renewal to terminal differentiation.
MeSH Headings: Animals; Animals, Genetically Modified; Biomarkers/metabolism; Cell Differentiation; Cell Proliferation; Drosophila Proteins/genetics; Drosophila Proteins/metabolism; Drosophila melanogaster/embryology; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Gene Expression Profiling; Genome; Mutation; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Neurons/cytology; Neurons/metabolism; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Oligonucleotide Array Sequence Analysis; Stem Cells/cytology; Stem Cells/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism
Grants: G0300072 Medical Research Council; Wellcome Trust
40	A friendly statistics package for microarray analysis. Bioinformatics 2005;21:4069-70. [PMID: 16188932 DOI: 10.1093/bioinformatics/bti663] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Abstract: SUMMARY: The friendly statistics package for microarray analysis (FSPMA) is a tool that aims to fill the gap between simple to use and powerful analysis. FSPMA is a platform-independent R-package that allows efficient exploration of microarray data without the need for computer programming. Analysis is based on a mixed model ANOVA library (YASMA) that was extended to allow more flexible comparisons and other useful operations like k nearest neighbour imputing and spike-based normalization. Processing is controlled by a definition file that specifies all the steps necessary to derive analysis results from quantified microarray data. In addition to providing analysis without programming, the definition file also serves as exact documentation of all the analysis steps. AVAILABILITY: The library is available under GPL 2 license and, together with additional information, provided at http://www.ccbi.cam.ac.uk/software/psyk/software.html#fspma
MeSH Headings: Algorithms; Data Interpretation, Statistical; Gene Expression Profiling/methods; Models, Genetic; Models, Statistical; Oligonucleotide Array Sequence Analysis/methods; Software
41	Regions of human chromosome 2 (2q32-q35) and mouse chromosome 1 show synteny with the pufferfish genome (Fugu rubripes). Genomics 1997;45:158-67. [PMID: 9339372 DOI: 10.1006/geno.1997.4913] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023] Abstract: We have isolated and sequenced a cosmid clone from the compact genome of the Japanese pufferfish (Fugu rubripes) containing portions of three genes that have the same order as in human. The gene order is microtubule-associated protein (MAP-2), myosin light chain (MYL-1), and carbamoyl phosphate synthetase (CPS III). The intron-exon organization of Fugu CPS III is identical with that of rat CPS I, although the equivalent genomic fragments of rat and Fugu CPS span 87.9 and 21 kb, respectively. This is the first report of a piscine CPS III genomic structure and predicts a close evolutionary link between CPS III and CPS I. The 8-kb intergenic region between MYL-1 and CPS gave no clear areas of transcription factor-binding sites by pairwise comparison with shark or rat CPS promoter regions. However, there was a match with the rat myosin light chain 2 (MLC-2) gene promoter and a MyoD transcription factor-binding site 874 bp upstream of the MYL-1 gene.
MeSH Headings: Amino Acid Sequence; Animals; Carbon-Nitrogen Ligases/genetics; Chromosomes, Human, Pair 2; Fishes, Poisonous/genetics; Genome; Humans; Mice; Microtubule-Associated Proteins/genetics; Molecular Sequence Data; Myosin Light Chains/genetics; Rats; Sequence Homology, Amino Acid
Grants: Wellcome Trust
42	Sequence comparison of human and yeast telomeres identifies structurally distinct subtelomeric domains. Hum Mol Genet 1997;6:1305-13. [PMID: 9259277 DOI: 10.1093/hmg/6.8.1305] [Citation(s) in RCA: 98] [Impact Index Per Article: 3.6] [Indexed: 02/05/2023] Abstract: We have sequenced and compared DNA from the ends of three human chromosomes: 4p, 16p and 22q. In all cases the pro-terminal regions are subdivided by degenerate (TTAGGG)n repeats into distal and proximal sub-domains with entirely different patterns of homology to other chromosome ends. The distal regions contain numerous, short (<2 kb) segments of interrupted homology to many other human telomeric regions. The proximal regions show much longer (approximately 10-40 kb) uninterrupted homology to a few chromosome ends. A comparison of all yeast subtelomeric regions indicates that they too are subdivided by degenerate TTAGGG repeats into distal and proximal sub-domains with similarly different patterns of identity to other non-homologous chromosome ends. Sequence comparisons indicate that the distal and proximal sub-domains do not interact with each other and that they interact quite differently with the corresponding regions on other, non-homologous, chromosomes. These findings suggest that the degenerate TTAGGG repeats identify a previously unrecognized, evolutionarily conserved boundary between remarkably different subtelomeric domains.
MeSH Headings: Base Sequence; Chromosomes, Artificial, Yeast; Humans; Molecular Sequence Data; Saccharomyces cerevisiae/genetics; Telomere
Grants: Wellcome Trust
43	The relationship between chromosome structure and function at a human telomeric region. Nat Genet 1997;15:252-7. [PMID: 9054936 DOI: 10.1038/ng0397-252] [Citation(s) in RCA: 117] [Impact Index Per Article: 4.3] [Indexed: 02/03/2023] Abstract: We have sequenced a contiguous 284,495-bp segment of DNA extending from the terminal (TTAGGG)n repeats of the short arm of chromosome 16, providing a full description of the transition from telomeric through subtelomeric DNA to sequences that are unique to the chromosome. To complement and extend analysis of the primary sequence, we have characterized mRNA transcripts, patterns of DNA methylation and DNase I sensitivity. Together with previous data these studies describe in detail the structural and functional organization of a human telomeric region.
MeSH Headings: Base Sequence; Chromosome Mapping; Chromosomes, Human, Pair 16; DNA/chemistry; DNA/genetics; Deoxyribonuclease I; Dinucleotide Repeats; Genetic Markers; Humans; Minisatellite Repeats; Molecular Sequence Data; Polymerase Chain Reaction; RNA, Messenger/biosynthesis; Repetitive Sequences, Nucleic Acid; Telomere; Transcription, Genetic
Grants: Wellcome Trust
44	The BRC repeats are conserved in mammalian BRCA2 proteins. Hum Mol Genet 1997;6:53-8. [PMID: 9002670 DOI: 10.1093/hmg/6.1.53] [Citation(s) in RCA: 128] [Impact Index Per Article: 4.7] [Indexed: 02/03/2023] Abstract: The breast cancer susceptibility gene BRCA2 encodes a protein of 3418 amino acids which does not exhibit substantial sequence similarity to any other protein in the public databases. A dot matrix comparison of BRCA2 with itself revealed an eight times repeated motif in the segment of the protein encoded by exon 11. As a preliminary test of the hypothesis that these motifs are functionally significant, we have sequenced exon 11 of BRCA2 in six mammals. An alignment of the predicted protein sequences shows that, overall, the motifs have been conserved while much of the intervening sequences has diverged. These data support the notion that the BRC motifs are important in BRCA2 function. There is, however, considerable interspecies variation within certain motif units, raising the possibility of redundancy and that not all of the repeats are required for the normal function of BRCA2.
MeSH Headings: Amino Acid Sequence; Animals; BRCA2 Protein; Base Sequence; Conserved Sequence; Cricetinae; DNA; Dogs; Exons; Haplorhini; Humans; Mammals; Mice; Molecular Sequence Data; Neoplasm Proteins/genetics; Repetitive Sequences, Nucleic Acid; Sequence Homology, Amino Acid; Swine; Transcription Factors/genetics
Grants: Wellcome Trust
45	Molecular cloning of tissue-specific transcripts of a transketolase-related gene: implications for the evolution of new vertebrate genes. Genomics 1996;32:309-16. [PMID: 8838793 DOI: 10.1006/geno.1996.0124] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.4] [Indexed: 02/02/2023] Abstract: As part of a systematic search for differentially expressed genes, we have isolated a novel transketolase-related gene (TKR) (HGMW-approved symbol TKT), located between the green color vision pigment gene (GCP) and the ABP-280 filamin gene (FLN1) in Xq28. Transcripts encoding tissue-specific protein isoforms could be isolated. Comparison with known transketolases (TK) demonstrated a TKR-specific deletion mutating one thiamine binding site. Genomic sequencing of the TKR gene revealed the presence of a pseudoexon as well as the acquisition of a tissue-specific spliced exon compared to TK. Since it has been postulated that the vertebrate genome arose by two cycles of tetraploidization from a cephalochordate genome, this could represent an example of the modulation of the function of a preexisting transketolase gene by gene duplication. Thiamine deficiency is closely involved with two neurological disorders, Beriberi and Wernicke-Korsakoff syndromes, and in both of these conditions TK with altered activity are found. We discuss the possible involvement of TKR in explaining the observed variant transketolase forms.
MeSH Headings: Alternative Splicing; Amino Acid Sequence; Animals; Base Sequence; Binding Sites; Brain/embryology; Brain Chemistry; Chromosome Mapping; Cloning, Molecular; Evolution, Molecular; Exons/genetics; Fetal Heart/chemistry; Genes/genetics; Humans; Molecular Sequence Data; Organ Specificity; RNA, Messenger/analysis; RNA, Messenger/genetics; Sequence Alignment; Sequence Analysis, DNA; Sequence Deletion; Sequence Homology, Nucleic Acid; Thiamine; Transketolase/genetics; Vertebrates/genetics
Grants: Wellcome Trust
46	Identification of the breast cancer susceptibility gene BRCA2. Nature 1995;378:789-92. [PMID: 8524414 DOI: 10.1038/378789a0] [Citation(s) in RCA: 2272] [Impact Index Per Article: 78.3] [Indexed: 01/31/2023] Abstract: In Western Europe and the United States approximately 1 in 12 women develop breast cancer. A small proportion of breast cancer cases, in particular those arising at a young age, are attributable to a highly penetrant, autosomal dominant predisposition to the disease. The breast cancer susceptibility gene, BRCA2, was recently localized to chromosome 13q12-q13. Here we report the identification of a gene in which we have detected six different germline mutations in breast cancer families that are likely to be due to BRCA2. Each mutation causes serious disruption to the open reading frame of the transcriptional unit. The results indicate that this is the BRCA2 gene.
MeSH Headings: Amino Acid Sequence; BRCA2 Protein; Base Sequence; Breast Neoplasms/genetics; Breast Neoplasms, Male/genetics; Chromosome Mapping; Chromosomes, Artificial, Yeast; Chromosomes, Human, Pair 13; DNA, Neoplasm; Female; Frameshift Mutation; Genetic Predisposition to Disease; Germ-Line Mutation; Humans; Male; Molecular Sequence Data; Neoplasm Proteins/genetics; Open Reading Frames; Sequence Deletion; Transcription Factors/genetics
Grants: Wellcome Trust
47	Comparative sequence analysis of the human and pufferfish Huntington's disease genes. Nat Genet 1995;10:67-76. [PMID: 7647794 DOI: 10.1038/ng0595-67] [Citation(s) in RCA: 110] [Impact Index Per Article: 3.8] [Indexed: 01/26/2023] Abstract: The Huntington's disease (HD) gene encodes a novel protein with as yet no known function. In order to identify the functionally important domains of this protein, we have cloned and sequenced the homologue of the HD gene in the pufferfish, Fugu rubripes. The Fugu HD gene spans only 23 kb of genomic DNA, compared to the 170 kb human gene, and yet all 67 exons are conserved. The first coding exon, the site of the disease-causing triplet repeat, is highly conserved. However, the glutamine repeat in Fugu consists of just four residues. We also show that gene order may be conserved over longer stretches of the two genomes. Our work describes a detailed example of sequence comparison between human and Fugu, and illustrates the power of the pufferfish genome as a model system in the analysis of human genes.
MeSH Headings: Amino Acid Sequence; Animals; Cloning, Molecular; Codon/genetics; Conserved Sequence; DNA, Complementary; Exons; Fishes, Poisonous/genetics; Humans; Huntingtin Protein; Huntington Disease/genetics; Mice; Molecular Sequence Data; Nerve Tissue Proteins/genetics; Nuclear Proteins/genetics; Repetitive Sequences, Nucleic Acid; Sequence Alignment; Sequence Homology
Grants: Wellcome Trust
48	The sequence complexity of exons trapped from the mouse genome. Curr Biol 1994;4:983-9. [PMID: 7874497 DOI: 10.1016/s0960-9822(00)00222-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 01/27/2023] Abstract: BACKGROUND: A central issue in genome analysis is the identification and characterization of coding regions. Estimating the coding complexity of vertebrate genomes by measuring the kinetic complexity of mRNA populations and by sequence analysis of cDNAs is limited by the fact that any given source of mRNA represents a very biased sample of all genes. Exon trapping is a method that enables the identification of genes irrespective of their transcriptional status. RESULTS: Exons were trapped from the entire mouse genome, and the resulting fragments cloned. About 7% of a random sample of exons taken from this library have significant structural homology or sequence similarity to previously sequenced genes. Using cDNAs derived from several stages of mouse development, evidence for expression of about 62% of this sample of exons was found. These data suggest that the great majority of 'exons' in the library are derived from genes. We estimate that the fraction of the genome contained in trapped exons is 2.4%; this corresponds to a sequence complexity of about 72 megabases. CONCLUSIONS: The library of exons trapped from the entire mouse genome probably represents one of the least biased and most comprehensive libraries of mouse coding regions, and should therefore prove very useful for finding genes during genome mapping and sequencing.
MeSH Headings: Amino Acid Sequence; Animals; Base Sequence; DNA Primers/genetics; DNA, Complementary/genetics; Exons; Gene Amplification; Gene Library; Genetic Techniques; Genome; Mice/genetics; Molecular Sequence Data; Polymerase Chain Reaction; RNA, Messenger/genetics; Sequence Homology, Nucleic Acid
49	Dissecting the temporal requirements for homeotic gene function. Development 1994;120:1983-95. [PMID: 7925003 DOI: 10.1242/dev.120.7.1983] [Citation(s) in RCA: 101] [Impact Index Per Article: 3.4] [Indexed: 11/20/2022] Abstract: Homeotic genes confer identity to the different segments of Drosophila. These genes are expressed in many cell types over long periods of time. To determine when the homeotic genes are required for specific developmental events we have expressed the Ultrabithorax, abdominal-A and Abdominal-Bm proteins at different times during development using the GAL4 targeting technique. We find that early transient homeotic gene expression has no lasting effects on the differentiation of the larval epidermis, but it switches the fate of other cell types irreversibly (e.g. the spiracle primordia). We describe one cell type in the peripheral nervous system that makes sequential, independent responses to homeotic gene expression. We also provide evidence that supports the hypothesis of in vivo competition between the bithorax complex proteins for the regulation of their down-stream targets.
MeSH Headings: Animals; Cell Differentiation/genetics; DNA-Binding Proteins/genetics; Drosophila/embryology; Drosophila/genetics; Drosophila Proteins; Gene Expression/physiology; Genes, Homeobox/physiology; Genes, Insect/physiology; Homeodomain Proteins; Insect Hormones/genetics; Morphogenesis/genetics; Nuclear Proteins; Peripheral Nervous System/embryology; Promoter Regions, Genetic; Proteins/genetics; Transcription Factors
Grants: Wellcome Trust
50	Structure and expression of the Huntington's disease gene: evidence against simple inactivation due to an expanded CAG repeat. Somat Cell Mol Genet 1994;20:27-38. [PMID: 8197474 DOI: 10.1007/bf02257483] [Citation(s) in RCA: 183] [Impact Index Per Article: 6.1] [Indexed: 01/29/2023] Abstract: Huntington's disease, a neurodegenerative disorder characterized by loss of striatal neurons, is caused by an expanded, unstable trinucleotide repeat in a novel 4p16.3 gene. To lay the foundation for exploring the pathogenic mechanism in HD, we have determined the structure of the disease gene and examined its expression. The HD locus spans 180 kb and consists of 67 exons ranging in size from 48 bp to 341 bp with an average of 138 bp. Scanning of the HD transcript failed to reveal any additional sequence alterations characteristic of HD chromosomes. A codon loss polymorphism in linkage disequilibrium with the disorder revealed that both normal and HD alleles are represented in the mRNA population in HD heterozygotes, indicating that the defect does not eliminate transcription. The gene is ubiquitously expressed as two alternatively polyadenylated forms displaying different relative abundance in various fetal and adult tissues, suggesting the operation of interacting factors in determining specificity of cell loss. The HD gene was disrupted in a female carrying a balanced translocation with a breakpoint between exons 40 and 41. The absence of any abnormal phenotype in this individual argues against simple inactivation of the gene as the mechanism by which the expanded trinucleotide repeat causes HD. Taken together, these observations suggest that the dominant HD mutation either confers a new property on the mRNA or, more likely, alters an interaction at the protein level.
MeSH Headings: Adult; Alleles; Base Sequence; Cell Line; Codon; DNA, Complementary; Exons; Female; Fetal Diseases/genetics; Gene Expression; Humans; Huntington Disease/embryology; Huntington Disease/genetics; Introns; Molecular Sequence Data; Polymorphism, Genetic; RNA, Messenger/metabolism; Repetitive Sequences, Nucleic Acid; Translocation, Genetic
Grants: NS16367 NINDS NIH HHS; Wellcome Trust