1
Lin D, Xu W, Hong P, Wu C, Zhang Z, Zhang S, Xing L, Yang B, Zhou W, Xiao Q, Wang J, Wang C, He Y, Chen X, Cao X, Man J, Reheman A, Wu X, Hao X, Hu Z, Chen C, Cao Z, Yin R, Fu ZF, Zhou R, Teng Z, Li G, Cao G. Decoding the spatial chromatin organization and dynamic epigenetic landscapes of macrophage cells during differentiation and immune activation. Nat Commun 2022; 13:5857. [PMID: 36195603 PMCID: PMC9532393 DOI: 10.1038/s41467-022-33558-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/14/2021] [Accepted: 09/22/2022] [Indexed: 11/09/2022]
Abstract
Immunocytes dynamically reprogram their gene expression profiles during differentiation and immune responses. However, the underlying mechanism remains elusive. Here, we develop a single-cell Hi-C method and systematically delineate the 3D genome and dynamic epigenetic atlas of macrophages during these processes. We propose "degree of disorder" as a measure of genome organizational patterns inside topologically associating domains; it correlates with chromatin epigenetic states, gene expression, and chromatin structure variability in individual cells. Furthermore, we find that NF-κB initiates systematic chromatin conformation reorganization upon Mycobacterium tuberculosis infection. An integrated Hi-C, eQTL, and GWAS analysis maps the long-range target genes of mycobacterial disease susceptibility loci. Among these, the SNP rs1873613 is located in the anchor of a dynamic chromatin loop with LRRK2, whose inhibitor AdoCbl could be an anti-tuberculosis drug candidate. Our study provides comprehensive resources for the 3D genome structure of immunocytes and offers insights into the order of genome organization and the coordinated gene transcription during immune responses.
Affiliation(s)
- Da Lin
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- Weize Xu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Ping Hong
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Chengchao Wu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Zhihui Zhang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Siheng Zhang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Lingyu Xing
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Bing Yang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Wei Zhou
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Qin Xiao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- Jinyue Wang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- Cong Wang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Yu He
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xi Chen
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaojian Cao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Jiangwei Man
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Aikebaier Reheman
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Animal Science and Technology, Tarim University, Alar, China
- Xiaofeng Wu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xingjie Hao
  - School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zhe Hu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- Chunli Chen
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
  - Key Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region, Guizhou University, Guiyang, China
- Zimeng Cao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
  - College of Animal Sciences, Yangtze River University, Jingzhou, China
- Rong Yin
  - Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Zhen F Fu
  - Department of Pathology, College of Veterinary Medicine, University of Georgia, Athens, GA, USA
- Rong Zhou
  - Department of Reproductive Medicine Center, Zhongnan Hospital of Wuhan University, Wuhan, China
- Zhaowei Teng
  - The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Guoliang Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, Huazhong Agricultural University, Wuhan, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Gang Cao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
2
Antonarakis SE. History of the methodology of disease gene identification. Am J Med Genet A 2021; 185:3266-3275. [PMID: 34159713 PMCID: PMC8596769 DOI: 10.1002/ajmg.a.62400] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/26/2021] [Revised: 06/10/2021] [Accepted: 06/11/2021] [Indexed: 11/06/2022]
Abstract
The past 45 years have witnessed a triumph in the discovery of genes and genetic variation that cause Mendelian disorders due to high-impact variants. Important discoveries and organized projects have provided the necessary tools and infrastructure for the identification of gene defects leading to thousands of monogenic phenotypes. This endeavor can be divided into three phases in which different laboratory strategies were employed for the discovery of disease-related genes: (i) the biochemical phase, (ii) the genetic linkage followed by positional cloning phase, and (iii) the sequence identification phase. However, much more work is needed to identify all the high-impact genomic variation that substantially contributes to phenotypic variation.
Affiliation(s)
- Stylianos E Antonarakis
  - University of Geneva Medical School, Geneva, Switzerland
  - Medigenome, Swiss Institute of Genomic Medicine, Geneva, Switzerland
3
Clarke ZA, Andrews TS, Atif J, Pouyabahar D, Innes BT, MacParland SA, Bader GD. Tutorial: guidelines for annotating single-cell transcriptomic maps using automated and manual methods. Nat Protoc 2021; 16:2749-2764. [PMID: 34031612 DOI: 10.1038/s41596-021-00534-0] [Citation(s) in RCA: 112] [Impact Index Per Article: 28.0] [Received: 07/26/2020] [Accepted: 03/12/2021] [Indexed: 11/09/2022]
Abstract
Single-cell transcriptomics can profile thousands of cells in a single experiment and identify novel cell types, states and dynamics in a wide variety of tissues and organisms. Standard experimental protocols and analysis workflows have been developed to create single-cell transcriptomic maps from tissues. This tutorial focuses on how to interpret these data to identify cell types, states and other biologically relevant patterns with the objective of creating an annotated map of cells. We recommend a three-step workflow including automatic cell annotation (wherever possible), manual cell annotation and verification. Frequently encountered challenges are discussed, as well as strategies to address them. Guiding principles and specific recommendations for software tools and resources that can be used for each step are covered, and an R notebook is included to help run the recommended workflow. Basic familiarity with computer software is assumed, and basic knowledge of programming (e.g., in the R language) is recommended.
Affiliation(s)
- Zoe A Clarke
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Tallulah S Andrews
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
  - Department of Immunology, University of Toronto, Toronto, Ontario, Canada
- Jawairia Atif
  - Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
  - Department of Immunology, University of Toronto, Toronto, Ontario, Canada
- Delaram Pouyabahar
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Brendan T Innes
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Sonya A MacParland
  - Ajmera Transplant Centre, Toronto General Hospital Research Institute, Toronto, Ontario, Canada
  - Department of Immunology, University of Toronto, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Gary D Bader
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
4
Cai M, Hao Nguyen C, Mamitsuka H, Li L. XGSEA: CROSS-species gene set enrichment analysis via domain adaptation. Brief Bioinform 2021; 22:6120324. [PMID: 33515011 DOI: 10.1093/bib/bbaa406] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/18/2020] [Revised: 12/12/2020] [Indexed: 01/02/2023]
Abstract
MOTIVATION Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with statistically significant differences between cases and controls against a large gene set. GSEA requires both phenotype labels and gene expression. However, gene expression is assessed more often in model organisms than in minor species. Importantly, gene expression also cannot be measured well in humans under specific conditions, such as non-approved treatments or gene knockouts, because direct experiments carry high risk; mouse data are then often used as a substitute. Thus, predicting the enrichment significance (on a phenotype) of a given gene set in one species (the target, say human) by using gene expression measured under the same phenotype in another species (the source, say mouse) is a vital and challenging problem, which we call the CROSS-species gene set enrichment problem (XGSEP). RESULTS For XGSEP, we propose CROSS-species gene set enrichment analysis (XGSEA), which has three steps: (1) running GSEA on a source species to obtain enrichment scores and $p$-values of source gene sets; (2) representing the relation between source and target gene sets by domain adaptation; and (3) using regression to predict $p$-values of target gene sets, based on the representation in (2). We extensively validated XGSEA using five regression measures and one classification measure on four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods in most cases. A case study identifying important human pathways for T-cell dysfunction and reprogramming from mouse ATAC-seq data further confirmed the reliability of XGSEA. AVAILABILITY Source code of XGSEA is available at https://github.com/LiminLi-xjtu/XGSEA.
Affiliation(s)
- Menglan Cai
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, 710049, China
- Canh Hao Nguyen
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 6110011, Japan
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 6110011, Japan
  - Department of Computer Science, Aalto University, Espoo, Finland
- Limin Li
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, 710049, China
5
Thanki AS, Soranzo N, Herrero J, Haerty W, Davey RP. Aequatus: an open-source homology browser. Gigascience 2018; 7:5160135. [PMID: 30395211 PMCID: PMC6251984 DOI: 10.1093/gigascience/giy128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/18/2018] [Revised: 09/06/2018] [Accepted: 10/17/2018] [Indexed: 11/18/2022]
Abstract
Background Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus, available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.
Affiliation(s)
- Anil S Thanki
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Javier Herrero
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
  - Bill Lyons Informatics Centre, UCL Cancer Institute, 72 Huntley St., London, WC1E 6DD, UK
- Wilfried Haerty
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Robert P Davey
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
6
Bryan C, Guterman G, Ma KL, Lewin H, Larkin D, Kim J, Ma J, Farre M. Synteny Explorer: An Interactive Visualization Application for Teaching Genome Evolution. IEEE Trans Vis Comput Graph 2017; 23:711-720. [PMID: 27845661 PMCID: PMC6599602 DOI: 10.1109/tvcg.2016.2598789] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 05/22/2023]
Abstract
Rapid advances in biology demand new tools for more active research dissemination and engaged teaching. This paper presents Synteny Explorer, an interactive visualization application designed to let college students explore the genome evolution of mammalian species. The tool visualizes synteny blocks: segments of homologous DNA shared between various extant species that can be traced back to, or reconstructed in, extinct ancestral species. We take a karyogram-based approach to create an interactive synteny visualization, leading to a more appealing and engaging design for undergraduate-level genome evolution education. For validation, we conduct three user studies: two focused studies on color and animation design choices and a larger study that performs overall system usability testing while comparing our karyogram-based designs with two more common genome mapping representations in an educational context. While existing views communicate the same information, study participants found the interactive, karyogram-based views much easier to use and more likable. We additionally discuss feedback from biology and genomics faculty, who judge Synteny Explorer's fitness for use in classrooms.
7
Lee J, Hong WY, Cho M, Sim M, Lee D, Ko Y, Kim J. Synteny Portal: a web-based application portal for synteny block analysis. Nucleic Acids Res 2016; 44:W35-40. [PMID: 27154270 PMCID: PMC4987893 DOI: 10.1093/nar/gkw310] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/03/2016] [Accepted: 04/12/2016] [Indexed: 11/12/2022]
Abstract
Recent advances in next-generation sequencing technologies and genome assembly algorithms have enabled the accumulation of a huge volume of genome sequences from various species. This has provided new opportunities for large-scale comparative genomics studies. Identifying and utilizing synteny blocks, which are genomic regions conserved among multiple species, is key to understanding genomic architecture and the evolutionary history of genomes. However, the construction and visualization of such synteny blocks from multiple species are very challenging, especially for biologists who lack computational skills. Here, we present Synteny Portal, a versatile web-based application portal for constructing, visualizing and browsing synteny blocks. With Synteny Portal, users can easily (i) construct synteny blocks among multiple species by using prebuilt alignments in the UCSC genome browser database, (ii) visualize and download syntenic relationships as high-quality images, (iii) browse synteny blocks with genetic information and (iv) download the details of synteny blocks to be used as input for downstream synteny-based analyses, all in an intuitive and easy-to-use web-based interface. We believe that Synteny Portal will serve as a highly valuable tool that will enable biologists to easily perform comparative genomics studies by compensating for the limitations of existing tools. Synteny Portal is freely available at http://bioinfo.konkuk.ac.kr/synteny_portal.
Affiliation(s)
- Jongin Lee
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
- Woon-Young Hong
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
- Minah Cho
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
- Mikang Sim
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
- Daehwan Lee
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
- Younhee Ko
  - Department of Clinical Genetics, Department of Pediatrics, Yonsei University College of Medicine, Seoul 03722, South Korea
- Jaebum Kim
  - Department of Animal Biotechnology, Konkuk University, Seoul 05029, South Korea
8
Arendt ML, Melin M, Tonomura N, Koltookian M, Courtay-Cahen C, Flindall N, Bass J, Boerkamp K, Megquir K, Youell L, Murphy S, McCarthy C, London C, Rutteman GR, Starkey M, Lindblad-Toh K. Genome-Wide Association Study of Golden Retrievers Identifies Germ-Line Risk Factors Predisposing to Mast Cell Tumours. PLoS Genet 2015; 11:e1005647. [PMID: 26588071 PMCID: PMC4654484 DOI: 10.1371/journal.pgen.1005647] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 03/03/2015] [Accepted: 10/14/2015] [Indexed: 02/07/2023]
Abstract
Canine mast cell tumours (CMCT) are among the most common skin tumours in dogs and have a major impact on canine health. Certain breeds have a higher risk of developing mast cell tumours, suggesting that underlying predisposing germ-line genetic factors play a role in the development of this disease. The genetic risk factors are largely unknown, although somatic mutations in the oncogene C-KIT have been detected in a proportion of CMCT, making CMCT a comparative model for mastocytosis in humans, where C-KIT mutations are frequent. We have performed a genome-wide association study in golden retrievers from two continents and identified separate regions in the genome associated with risk of CMCT in the two populations. Sequence capture of the associated regions and subsequent fine mapping in a larger cohort of dogs identified a SNP in the GNAI2 gene associated with development of CMCT ($p = 2.2 \times 10^{-16}$), which introduces an alternative splice form of this gene resulting in a truncated protein. In addition, disease-associated haplotypes harbouring the hyaluronidase genes HYAL1, HYAL2 and HYAL3 on cfa20 and HYAL4, SPAM1 and HYALP1 on cfa14 were identified as separate risk factors in European and US golden retrievers, respectively, suggesting that turnover of hyaluronan plays an important role in the development of CMCT.
Affiliation(s)
- Maja L. Arendt
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Malin Melin
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Noriko Tonomura
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Department of Clinical Sciences, Cummings School of Veterinary Medicine at Tufts University, North Grafton, Massachusetts, United States of America
- Michele Koltookian
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Joyce Bass
  - Animal Health Trust, Newmarket, United Kingdom
- Kim Boerkamp
  - Department of Clinical Sciences of Companion Animals, Utrecht University, Utrecht, The Netherlands
- Katherine Megquir
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - Department of Clinical Sciences, Cummings School of Veterinary Medicine at Tufts University, North Grafton, Massachusetts, United States of America
- Lisa Youell
  - Animal Health Trust, Newmarket, United Kingdom
- Sue Murphy
  - Animal Health Trust, Newmarket, United Kingdom
- Colleen McCarthy
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Cheryl London
  - Department of Veterinary Clinical Sciences, Ohio State University, Columbus, Ohio, United States of America
- Gerard R. Rutteman
  - Department of Clinical Sciences of Companion Animals, Utrecht University, Utrecht, The Netherlands
  - Veterinary Specialist Center De Wagenrenk, Wageningen, The Netherlands
- Kerstin Lindblad-Toh
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
9
Hafeez M, Shabbir M, Altaf F, Abbasi AA. Phylogenomic analysis reveals ancient segmental duplications in the human genome. Mol Phylogenet Evol 2015; 94:95-100. [PMID: 26327327 DOI: 10.1016/j.ympev.2015.08.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/21/2015] [Revised: 08/04/2015] [Accepted: 08/21/2015] [Indexed: 01/16/2023]
Abstract
The evolution of organismal complexity and the origin of novelties during vertebrate history have been widely explored in the context of both the regulation of gene expression and gene duplication events. Ohno (1970) first proposed two rounds of whole-genome duplication as the most plausible explanation for the evolution of the vertebrate lineage (the 2R hypothesis). To test the validity of the 2R hypothesis, a robust phylogenomic analysis was performed of multigene families with triplicated or quadruplicated representation on the human FGFR-bearing chromosomes (4/5/8/10). A topology comparison approach categorized members of 80 families into five distinct co-duplicated groups. Genes belonging to one co-duplicated group duplicated concurrently, whereas genes of two different co-duplicated groups do not share their duplication history and did not duplicate in concert. Our findings contradict the 2R model and indicate small-scale duplications and rearrangements spanning the entire history of animals.
Affiliation(s)
- Madiha Hafeez
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Madiha Shabbir
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Fouzia Altaf
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Amir Ali Abbasi
  - National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
10
Chelaru F, Corrada Bravo H. Epiviz: a view inside the design of an integrated visual analysis software for genomics. BMC Bioinformatics 2015; 16 Suppl 11:S4. [PMID: 26328750 PMCID: PMC4559604 DOI: 10.1186/1471-2105-16-s11-s4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
Background Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous are genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results In this paper we expand on the design decisions behind Epiviz and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps toward collaborative data analysis by incorporating user plugins from source control providers and by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present the security implications of the current design, as well as a series of limitations and future research steps. Conclusions Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a record of our own approaches and lessons learned and as a starting point for future efforts in the same direction by the genomics community.
11
Yu B, Doraiswamy H, Chen X, Miraldi E, Arrieta-Ortiz ML, Hafemeister C, Madar A, Bonneau R, Silva CT. Genotet: An Interactive Web-based Visual Exploration Framework to Support Validation of Gene Regulatory Networks. IEEE Trans Vis Comput Graph 2014; 20:1903-1912. [PMID: 26356904 DOI: 10.1109/tvcg.2014.2346753] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Elucidation of transcriptional regulatory networks (TRNs) is a fundamental goal in biology, and among the most important components of TRNs are transcription factors (TFs), proteins that specifically bind to gene promoter and enhancer regions to alter target gene expression patterns. Advances in genomic technologies and in computational biology have led to multiple large regulatory network models (directed networks), each with a large corpus of supporting data and gene annotation. There are multiple possible biological motivations for exploring large regulatory network models, including validating TF-target gene relationships, identifying co-regulation patterns, and exploring the coordination of cell processes in response to changes in cell state or environment. Here we focus on queries aimed at validating regulatory network models, and on coordinating the visualization of primary data and directed weighted gene regulatory networks. The large size of both the network models and the primary data can make such coordinated queries cumbersome with existing tools and, in particular, inhibits the sharing of results between collaborators. In this work, we develop and demonstrate a web-based framework for coordinating visualization and exploration of expression data (RNA-seq, microarray), network models and gene-binding data (ChIP-seq). Using specialized data structures and multiple coordinated views, we design an efficient querying model to support interactive analysis of the data. Finally, we show the effectiveness of our framework through case studies of the mouse immune system (a dataset focused on a subset of key cellular functions) and a model bacterium (a small genome with high data completeness).
|
12
|
Ambreen S, Khalil F, Abbasi AA. Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol Phylogenet Evol 2014; 78:1-13. [PMID: 24821622 DOI: 10.1016/j.ympev.2014.05.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/04/2014] [Revised: 04/17/2014] [Accepted: 05/01/2014] [Indexed: 11/18/2022]
Abstract
BACKGROUND The vertebrate genome often contains closely spaced sets of paralogous genes from distinct gene families, typically on two, three or four different chromosomes (paralogons). This type of genome architecture is widely considered a remnant of whole genome duplication events (WGD/2R). RESULTS Taking advantage of the well-annotated, high-quality human genome sequence map and the ever-increasing availability of large-scale genomic sequence data from a diverse range of animal species, we investigated the evolutionary history of potential quadruplicated regions residing on the human HOX-cluster-bearing chromosomes (chromosomes 2/7/12/17). For this purpose, a detailed phylogenetic analysis was performed for multigene families with members on at least three of the four HOX-bearing chromosomes. A topology comparison approach categorized the members of 63 families into distinct co-duplicated groups. Distinct gene families belonging to a particular co-duplicated group exhibit a similar evolutionary history and hence duplicated concurrently, whereas genes of two different co-duplicated groups do not share their history and did not duplicate in concert with each other. CONCLUSIONS These results, based on a large-scale phylogenetic dataset, yielded no evidence in favor of polyploidization events; instead, it appears that the triplicated and quadruplicated genomic segments on the human HOX-bearing chromosomes arose by small-scale duplication events that occurred at widely different time points in animal evolution.
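One way to picture the topology comparison step, assuming each family's gene tree has already been reduced to an unrooted quartet over the four HOX-bearing chromosomes: families whose trees support the same central split (for example 2+7 versus 12+17) fall into the same candidate co-duplicated group. This is a hypothetical simplification, not the authors' pipeline; families with members on only three chromosomes would need a rooted comparison and are not handled here.

```python
from collections import defaultdict
from itertools import combinations

HOX_CHROMOSOMES = frozenset({2, 7, 12, 17})


def split_key(pair):
    """Canonical, order-independent key for the unrooted quartet whose central
    split separates `pair` from the other two chromosomes."""
    side = frozenset(pair)
    other = HOX_CHROMOSOMES - side
    if len(side) != 2 or len(other) != 2:
        raise ValueError(f"{pair} is not a 2|2 split of {sorted(HOX_CHROMOSOMES)}")
    return frozenset({side, other})


def possible_topologies():
    """All unrooted topologies for four chromosomes (there are three)."""
    return {split_key(p) for p in combinations(sorted(HOX_CHROMOSOMES), 2)}


def group_by_topology(family_splits):
    """family_splits maps a (hypothetical) family name to the chromosome pair that
    clusters together in its gene tree; families with the same split are grouped."""
    groups = defaultdict(list)
    for family, pair in family_splits.items():
        groups[split_key(pair)].append(family)
    return {key: sorted(members) for key, members in groups.items()}


# Hypothetical families: famA and famB imply the same history, famC a different one.
groups = group_by_topology({"famA": (2, 7), "famB": (12, 17), "famC": (2, 12)})
print(len(possible_topologies()))   # 3
print(groups[split_key((2, 7))])    # ['famA', 'famB']
```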
Affiliation(s)
- Sadaf Ambreen
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Faiqa Khalil
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Amir Ali Abbasi
- National Center for Bioinformatics, Program of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan.
|
13
|
Deakin JE, Delbridge ML, Koina E, Harley N, Alsop AE, Wang C, Patel VS, Graves JAM. Reconstruction of the ancestral marsupial karyotype from comparative gene maps. BMC Evol Biol 2013; 13:258. [PMID: 24261750 PMCID: PMC4222502 DOI: 10.1186/1471-2148-13-258] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 09/26/2013] [Accepted: 11/19/2013] [Indexed: 11/03/2022]
Abstract
BACKGROUND The increasing number of assembled mammalian genomes makes it possible to compare genome organisation across mammalian lineages and to reconstruct the chromosomes of the ancestral marsupial and therian (marsupial and eutherian) mammals. However, reconstructing ancestral genomes requires genome assemblies to be anchored to chromosomes. The recently sequenced tammar wallaby (Macropus eugenii) genome was assembled into over 300,000 contigs. We previously devised an efficient strategy for mapping large evolutionarily conserved blocks in non-model mammals, and applied it to determine the arrangement of conserved blocks on all wallaby chromosomes, thereby permitting comparative maps to be constructed and resolving the long-debated question of whether the ancestral marsupial karyotype was 2n = 14 or 2n = 22. RESULTS We identified large blocks of genes conserved between human and opossum, and mapped genes corresponding to the ends of these blocks by fluorescence in situ hybridization (FISH). A total of 242 genes was assigned to wallaby chromosomes in the present study, bringing the total number of mapped genes to 554 and making the wallaby genome the most densely cytogenetically mapped marsupial genome. We used these gene assignments to construct comparative maps between wallaby and opossum, which uncovered many intrachromosomal rearrangements, particularly for genes found on wallaby chromosomes X and 3. Expanding comparisons to include chicken and human permitted the putative ancestral marsupial (2n = 14) and therian mammal (2n = 19) karyotypes to be reconstructed. CONCLUSIONS Our physical mapping data for the tammar wallaby have uncovered the events shaping marsupial genomes and enabled us to predict the ancestral marsupial karyotype, supporting a 2n = 14 ancestor. Furthermore, our predicted therian ancestral karyotype has helped us to understand the evolution of the ancestral eutherian genome.
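As a rough sketch of how block ends might be picked as FISH targets, assume genes are listed in human chromosomal order and labelled with the opossum chromosome carrying their orthologue; maximal runs sharing one opossum chromosome can then be treated as conserved blocks and their first and last genes reported. The function and the chromosome labels are hypothetical, and real block detection would also consider gene order and orientation within the opossum chromosome.

```python
from typing import List, Tuple


def conserved_blocks(genes: List[Tuple[str, str]], min_genes: int = 2):
    """
    genes: (gene_name, opossum_chromosome) pairs in human chromosomal order.
    Returns (opossum_chr, first_gene, last_gene, n_genes) for each maximal run of
    consecutive genes whose orthologues lie on the same opossum chromosome.
    The first and last genes are the block ends one would choose as probes.
    """
    blocks = []
    start = 0
    for i in range(1, len(genes) + 1):
        # Close the current run at the end of input or when the chromosome changes.
        if i == len(genes) or genes[i][1] != genes[start][1]:
            size = i - start
            if size >= min_genes:
                blocks.append((genes[start][1], genes[start][0], genes[i - 1][0], size))
            start = i
    return blocks


# Hypothetical input: a singleton on chrB is too small to count as a block.
order = [("g1", "chrA"), ("g2", "chrA"), ("g3", "chrA"),
         ("g4", "chrB"), ("g5", "chrC"), ("g6", "chrC")]
print(conserved_blocks(order))
# [('chrA', 'g1', 'g3', 3), ('chrC', 'g5', 'g6', 2)]
```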
Affiliation(s)
- Janine E Deakin
- ARC Centre of Excellence for Kangaroo Genomics, Canberra, Australia.
|
14
|
Sahadevan S, Hofmann-Apitius M, Schellander K, Tesfaye D, Fluck J, Friedrich CM. Text mining in livestock animal science: introducing the potential of text mining to animal sciences. J Anim Sci 2012; 90:3666-76. [PMID: 22665627 DOI: 10.2527/jas.2011-4841] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
In biological research, establishing the prior art by searching for and collecting information already present in the domain is as important as the experiments themselves. To obtain a complete overview of the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 sources is that information in databases is typically well structured and condensed, whereas the information content of scientific literature is largely unstructured; that is, it is dispersed among the many different sections of scientific text. The traditional method of extracting information from scientific literature is to generate a list of relevant publications in the field of interest and manually scan these texts for relevant information, which is very time consuming. With this "classical" approach, the researcher will more than likely miss some relevant information mentioned in the literature or have to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real-world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been used only briefly, in murine genomics and associated fields, leaving other fields of animal sciences, such as livestock genomics, behind.
The aim of this work was to develop an information retrieval platform for the livestock domain, focusing on livestock publications and the recognition of relevant data for cattle and pigs. For this purpose, the rather incomplete terminology resources for pig and cattle genes and proteins were enriched with orthologue synonyms and integrated into the NER platform ProMiner, which has been used successfully in the human genomics domain. In performance tests on a scenario based on cattle literature, the system achieved fair performance, with a precision of 0.64, a recall of 0.74, and an F(1) measure of 0.69.
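A dictionary-based tagger with orthologue-enriched synonym lists, together with the scores used to evaluate it, can be sketched as below. This is an assumption-laden illustration, not ProMiner, whose matching is considerably more sophisticated; all gene identifiers and synonym lists are hypothetical. The last line checks that the reported precision and recall are consistent with the reported F(1) measure.

```python
import re


def enrich_with_orthologs(species_names, ortholog_names, ortholog_map):
    """Copy each livestock gene's names and append the names of its orthologue
    (e.g. the human gene) so that either form is recognised in text."""
    enriched = {gene: list(names) for gene, names in species_names.items()}
    for gene, ortholog in ortholog_map.items():
        names = enriched.setdefault(gene, [])
        for name in ortholog_names.get(ortholog, []):
            if name not in names:
                names.append(name)
    return enriched


def tag(text, dictionary):
    """Ids of genes with at least one name occurring as a whole word (case-insensitive)."""
    found = set()
    for gene, names in dictionary.items():
        for name in names:
            if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text, re.IGNORECASE):
                found.add(gene)
                break
    return found


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(tp, fp, fn):
    """Scores from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical dictionaries: the cattle entry only knows "LEP"; the orthologue adds "leptin".
cattle = {"cow:LEP": ["LEP"]}
human = {"HUMAN_LEP": ["leptin", "OB"]}
dictionary = enrich_with_orthologs(cattle, human, {"cow:LEP": "HUMAN_LEP"})
print(tag("Serum leptin was elevated.", dictionary))  # {'cow:LEP'}
print(round(f1_score(0.64, 0.74), 2))                 # 0.69
```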
Affiliation(s)
- S Sahadevan
- Bonn Aachen International Centre for Information Technology, Dahlmannstrasse 2, 53113 Bonn, Germany
|
15
|
Wang Y, Gao Y, Imsland F, Gu X, Feng C, Liu R, Song C, Tixier-Boichard M, Gourichon D, Li Q, Chen K, Li H, Andersson L, Hu X, Li N. The crest phenotype in chicken is associated with ectopic expression of HOXC8 in cranial skin. PLoS One 2012; 7:e34012. [PMID: 22514613 PMCID: PMC3326004 DOI: 10.1371/journal.pone.0034012] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 10/12/2011] [Accepted: 02/20/2012] [Indexed: 11/18/2022]
Abstract
The Crest phenotype is characterised by a tuft of elongated feathers atop the head. A similar phenotype is also seen in several wild bird species. Crest shows an autosomal, incompletely dominant mode of inheritance and is associated with cerebral hernia. Here we show, using linkage analysis and genome-wide association, that Crest is located on the E22C19W28 linkage group and that it shows complete association with the HOXC cluster on this chromosome. Expression analysis of tissues from crested and non-crested chickens, representing 26 different breeds, revealed that HOXC8, but not HOXC12 or HOXC13, was ectopically expressed in cranial skin during embryonic development. We propose that Crest is caused by a cis-acting regulatory mutation underlying the ectopic expression of HOXC8. However, identification of the causative mutation(s) must await a method for assembling this chromosomal region. Crest is unfortunately located in a genomic region that has so far defied all attempts to establish a contiguous sequence.
Affiliation(s)
- Yanqiang Wang
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Yu Gao
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Freyja Imsland
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Xiaorong Gu
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Chungang Feng
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Ranran Liu
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Chi Song
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Jiangsu Institute of Poultry Science, Yangzhou, China
- Qingyuan Li
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Kuanwei Chen
- Jiangsu Institute of Poultry Science, Yangzhou, China
- Huifang Li
- Jiangsu Institute of Poultry Science, Yangzhou, China
- Leif Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Xiaoxiang Hu
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
- Ning Li
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China
|
16
|
Lisacek F, Chichester C, Gonnet P, Jaillet O, Kappus S, Nikitin F, Roland P, Rossier G, Truong L, Appel R. Shaping biological knowledge: applications in proteomics. Comp Funct Genomics 2011; 5:190-5. [PMID: 18629073 PMCID: PMC2447358 DOI: 10.1002/cfg.379] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/14/2003] [Revised: 12/12/2003] [Accepted: 12/18/2003] [Indexed: 12/01/2022]
Abstract
The central dogma of molecular biology has provided a meaningful principle for data integration in the field of genomics. In this context, integration reflects the known transitions from a chromosome to a protein sequence: transcription, intron splicing, exon assembly and translation. There is no such clear principle for integrating proteomics data, since the laws governing protein folding and interactions are not fully understood. In our effort to bring together independent pieces of information about proteins in a biologically meaningful way, we assess the bias of bioinformatics resources and the resulting approximations in the framework of small-scale studies. We analyse proteomics data following both a data-driven approach (focus on proteins smaller than 10 kDa) and a hypothesis-driven approach (focus on whole bacterial proteomes). These applications are potentially the source of specialized complements to classical biological ontologies.
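The data-driven filter (proteins smaller than 10 kDa) amounts to estimating each protein's average mass from its sequence. A minimal sketch, assuming unmodified proteins written in the 20 standard one-letter codes and using commonly tabulated average residue masses; the function names and protein ids are hypothetical, and post-translational modifications are ignored.

```python
# Average residue masses in daltons (amino acid minus one water), standard 20 residues.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water restores the free N- and C-termini


def average_mass(sequence):
    """Average mass in Da; raises KeyError for non-standard residue letters."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def small_proteins(proteins, cutoff_da=10_000.0):
    """Ids of proteins whose estimated mass is below the cutoff (10 kDa by default)."""
    return sorted(pid for pid, seq in proteins.items() if average_mass(seq) < cutoff_da)


# Hypothetical sequences: a 10-residue peptide passes, a 100-residue Trp chain does not.
print(small_proteins({"p1": "G" * 10, "p2": "W" * 100}))  # ['p1']
```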
Affiliation(s)
- F Lisacek
- R&D GeneBio, 25 Avenue de Champel, Geneva 1206, Switzerland.
|
17
|
Dewey CN. Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 2011; 12:401-12. [PMID: 21705766 PMCID: PMC3178058 DOI: 10.1093/bib/bbr040] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and to understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus may be used to infer the important relation of toporthology.
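Toporthology itself is defined in terms of evolutionary events, which cannot be read directly off two present-day genomes. A common practical proxy, used here purely as an illustration and not as the review's definition, is conserved gene neighbourhood: an orthologous pair is kept if enough of its neighbours are also orthologous to each other. The function name, window size and thresholds are hypothetical.

```python
def positional_orthologs(order_a, order_b, orthologs, window=1, min_shared=1):
    """
    order_a, order_b: gene ids in chromosomal order for genomes A and B.
    orthologs: set of (gene_in_a, gene_in_b) ortholog pairs.
    Keep a pair if at least `min_shared` ortholog pairs link a neighbour of
    gene_a (within `window` positions) to a neighbour of gene_b.
    """
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: i for i, g in enumerate(order_b)}

    def neighbours(order, pos, gene):
        i = pos[gene]
        return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])

    kept = set()
    for a, b in orthologs:
        if a not in pos_a or b not in pos_b:
            continue  # gene not placed on these chromosomes
        near_a = neighbours(order_a, pos_a, a)
        near_b = neighbours(order_b, pos_b, b)
        shared = sum(1 for x, y in orthologs if x in near_a and y in near_b)
        if shared >= min_shared:
            kept.add((a, b))
    return kept


# Hypothetical genomes: a4's orthologue b4 has moved away from the conserved run.
order_a = ["a1", "a2", "a3", "a4"]
order_b = ["b1", "b2", "b3", "bx", "by", "b4"]
pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
print(sorted(positional_orthologs(order_a, order_b, pairs)))
# [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
```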
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 5785 Medical Sciences Center, 1300 University Ave, Madison, WI 53706, USA.
|
18
|
Ohnuki T, Nakamura A, Okuyama S, Nakamura S. Gene expression profiling in progressively MPTP-lesioned macaques reveals molecular pathways associated with sporadic Parkinson's disease. Brain Res 2010; 1346:26-42. [PMID: 20513370 DOI: 10.1016/j.brainres.2010.05.066] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 11/07/2009] [Revised: 05/01/2010] [Accepted: 05/24/2010] [Indexed: 12/26/2022]
Abstract
Parkinson's disease (PD) is a common neurodegenerative disease characterized by progressive loss of midbrain dopaminergic neurons. To gain insight into the mechanisms underlying the progression of PD, gene expression analysis was performed on two different brain regions, the substantia nigra pars compacta (SN) and the striatum (STR), of a 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP)-lesioned monkey model of PD. A total of 230 genes were differentially expressed in the MPTP-treated SN compared with controls, whereas 452 genes showed altered expression in the MPTP-treated STR, implying that MPTP causes more extensive changes in striatal gene expression than in the SN. Comparative analysis of the transcription profiles of PD patients and MPTP monkey models, together with pathway analysis, indicated several signaling pathways as possible routes to MPTP-induced neurodegeneration. Interestingly, networks associated with cytoskeletal stability, the ubiquitin-proteasome system (UPS) and Wnt signaling gained prominence in our study. Further transcriptional regulatory network analysis suggested an association of the neuronal repressor REST (RE1-silencing transcription factor; NRSF) and the androgen receptor with the dysregulation of striatal genes. Our study suggests that dysfunction of multi-network signaling may induce abnormalities in a diverse range of biological processes, such as synaptic function, cytoskeletal stability, survival and differentiation.
Affiliation(s)
- Tatsuya Ohnuki
- Molecular Function and Pharmacology Laboratories, Taisho Pharmaceutical Co., Ltd., Saitama, 331-9530, Japan.
|
19
|
Meyer M, Munzner T, Pfister H. MizBee: a multiscale synteny browser. IEEE Trans Vis Comput Graph 2009; 15:897-904. [PMID: 19834152 DOI: 10.1109/tvcg.2009.167] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 05/07/2023]
Abstract
In the field of comparative genomics, scientists seek to answer questions about evolution and genomic function by comparing the genomes of species to find regions of shared sequence. Conserved syntenic blocks are an important biological data abstraction for indicating regions of shared sequence. The goal of this work is to show multiple types of relationships at multiple scales in a way that is visually comprehensible in accordance with known perceptual principles. We present a task analysis for this domain in which the fundamental questions asked by biologists can be understood by characterizing relationships into four types (proximity/location, size, orientation, and similarity/strength) at four scales (genome, chromosome, block, and genomic feature). We also propose a new taxonomy of the design space for visually encoding conservation data. We present MizBee, a multiscale synteny browser with the unique property of providing interactive side-by-side views of the data across the range of scales, supporting exploration of all of these relationship types. We conclude with case studies from two biologists who used MizBee to augment their previous automatic analysis workflow, providing anecdotal evidence about the efficacy of the system for the visualization of syntenic data, the analysis of conservation relationships, and the communication of scientific insights.
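The four relationship types map naturally onto fields of a syntenic-block record, and the chromosome scale becomes an aggregation over blocks. The sketch below is a hypothetical data model for illustration only, not MizBee's internal representation; field names and example coordinates are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntenicBlock:
    # proximity/location: coordinates on the source and destination genomes
    src_chrom: str
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int
    dst_end: int
    same_orientation: bool  # orientation
    similarity: float       # similarity/strength, e.g. a score in [0, 1]

    @property
    def size(self):
        """Size, measured on the source genome."""
        return self.src_end - self.src_start


def chromosome_view(blocks):
    """Chromosome scale: bases of each source chromosome covered by blocks,
    broken down by the destination chromosome they map to."""
    coverage = defaultdict(lambda: defaultdict(int))
    for b in blocks:
        coverage[b.src_chrom][b.dst_chrom] += b.size
    return {src: dict(dsts) for src, dsts in coverage.items()}


def inverted_blocks(blocks):
    """Block scale: blocks whose orientation is reversed between genomes."""
    return [b for b in blocks if not b.same_orientation]


# Hypothetical blocks on one source chromosome:
blocks = [
    SyntenicBlock("c1", 0, 100, "d1", 0, 100, True, 0.9),
    SyntenicBlock("c1", 200, 250, "d2", 10, 60, False, 0.7),
]
print(chromosome_view(blocks))       # {'c1': {'d1': 100, 'd2': 50}}
print(len(inverted_blocks(blocks)))  # 1
```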
|
20
|
Severin J, Waterhouse AM, Kawaji H, Lassmann T, van Nimwegen E, Balwierz PJ, de Hoon MJ, Hume DA, Carninci P, Hayashizaki Y, Suzuki H, Daub CO, Forrest AR. FANTOM4 EdgeExpressDB: an integrated database of promoters, genes, microRNAs, expression dynamics and regulatory interactions. Genome Biol 2009; 10:R39. [PMID: 19374773 PMCID: PMC2688930 DOI: 10.1186/gb-2009-10-4-r39] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Received: 01/09/2009] [Revised: 03/09/2009] [Accepted: 04/19/2009] [Indexed: 11/28/2022]
Abstract
EdgeExpressDB is a novel database and set of interfaces for interpreting biological networks and comparing large high-throughput expression datasets that requires minimal development for new data types and search patterns. The FANTOM4 EdgeExpress database summarizes gene expression patterns in the context of alternative promoter structures and regulatory transcription factors and microRNAs, using intuitive gene-centric and sub-network views. It is an important resource for studying gene regulation in acute myeloid leukemia, monocyte/macrophage differentiation and human transcriptional networks.
Affiliation(s)
- Jessica Severin
- RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho Tsurumi-ku Yokohama, Kanagawa, 230-0045 Japan.
|
21
|
Wang Z, Ding G, Yu Z, Liu L, Li Y. CHSMiner: a GUI tool to identify chromosomal homologous segments. Algorithms Mol Biol 2009; 4:2. [PMID: 19146671 PMCID: PMC2647922 DOI: 10.1186/1748-7188-4-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 09/21/2008] [Accepted: 01/15/2009] [Indexed: 11/21/2022]
Abstract
Background The identification of chromosomal homologous segments (CHSs) within and between genomes is essential for comparative genomics. Various processes, including insertion/deletion and inversion, can cause CHSs to degenerate. Results Here we present CHSMiner, a Java program that detects CHSs based on shared gene content alone. It implements a fast greedy search algorithm and rigorous statistical validation, and its user-friendly graphical interface allows interactive visualization of the results. We tested the software on both simulated and biologically realistic data and compared its performance with similar existing software and data sources. Conclusion CHSMiner is characterized by its integrated workflow, fast speed and convenient usage. It will be useful for both experimentalists and bioinformaticians interested in the structure and evolution of genomes.
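Detecting segments from shared gene content alone, ignoring gene order within a segment, can be illustrated with a simple greedy max-gap clustering of homologous gene pairs. This is a stand-in, not CHSMiner's published algorithm, and it omits the statistical validation step; the function name, parameters and defaults are hypothetical.

```python
def gene_content_segments(pairs, max_gap=3, min_pairs=3):
    """
    pairs: (position_in_A, position_in_B) for homologous gene pairs, using gene
    indices along one chromosome in each genome.
    Returns lists of pairs forming candidate homologous segments.
    """
    segments, current = [], []
    for a, b in sorted(pairs):
        if current:
            last_a = current[-1][0]
            b_lo = min(p[1] for p in current)
            b_hi = max(p[1] for p in current)
            # Join if close in A (pairs are visited in A order) and within
            # max_gap of the segment's span in B, in either direction.
            if a - last_a <= max_gap and b_lo - max_gap <= b <= b_hi + max_gap:
                current.append((a, b))
                continue
            # Otherwise close the current segment and keep it if large enough.
            if len(current) >= min_pairs:
                segments.append(current)
            current = []
        current.append((a, b))
    if len(current) >= min_pairs:
        segments.append(current)
    return segments


# Hypothetical pairs: two shuffled runs of shared genes and one isolated pair.
pairs = [(0, 10), (1, 12), (2, 11), (10, 50), (11, 51), (12, 52), (20, 0)]
print(gene_content_segments(pairs))
# [[(0, 10), (1, 12), (2, 11)], [(10, 50), (11, 51), (12, 52)]]
```

Because membership only depends on proximity in both genomes, the local inversion in the first segment (12 before 11) does not break it, which mirrors the "shared gene content alone" criterion.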
|
22
|
Abouelhoda MI, Kurtz S, Ohlebusch E. CoCoNUT: an efficient system for the comparison and analysis of genomes. BMC Bioinformatics 2008; 9:476. [PMID: 19014477 PMCID: PMC3224568 DOI: 10.1186/1471-2105-9-476] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 05/18/2008] [Accepted: 11/12/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons. RESULTS Most of the available software tools are tailored to one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that solves several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. CONCLUSION CoCoNUT is competitive with other software tools with respect to the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With its improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics.
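Task (1) usually starts from exact matches between sequences. The sketch below finds maximal exact matches (matches that cannot be extended left or right) by indexing k-mers in a dictionary and extending each left-maximal seed to the right. It is only an illustration: CoCoNUT's own index structures are not described in this abstract, and a plain k-mer dictionary can scale poorly on repetitive genomes compared with suffix-array-based indexes. The function name and `min_len` parameter are hypothetical.

```python
def maximal_exact_matches(s, t, min_len=4):
    """
    Return sorted (start_in_s, start_in_t, length) for exact matches of length
    >= min_len that cannot be extended to the left or to the right.
    """
    k = min_len
    index = {}
    for j in range(len(t) - k + 1):
        index.setdefault(t[j:j + k], []).append(j)

    found = set()
    for i in range(len(s) - k + 1):
        for j in index.get(s[i:i + k], []):
            # Only start from left-maximal seeds, so each match is reported once.
            if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
                continue
            length = k
            # Extend to the right as far as the sequences agree.
            while i + length < len(s) and j + length < len(t) and s[i + length] == t[j + length]:
                length += 1
            found.add((i, j, length))
    return sorted(found)


print(maximal_exact_matches("ACGTACGT", "TTACGTAA", min_len=4))
# [(0, 2, 5), (3, 1, 5)]
```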
|
23
|
Advances in the sequencing of the genome of the adenophorean nematode Trichinella spiralis. Parasitology 2008; 135:869-80. [PMID: 18598573 DOI: 10.1017/s0031182008004472] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
The adenophorean nematodes are evolutionarily distant from other species in the phylum Nematoda. Interspecific comparisons of predicted proteins have supported such an ancient divergence. Accordingly, Trichinella spiralis serves as a basal representative of the phylum for genome sequencing aimed at gaining deeper insight into the evolutionary biology of nematodes. In addition, molecular characteristics that are conserved across the phylum could be of great value for control strategies with broad application. In this review, we describe and summarize progress that has been made on the sequencing and analysis of the T. spiralis genome. The genome sequence was used in preliminary analyses to investigate specific questions relating to the biology of T. spiralis and, more generally, of parasitic nematodes. For instance, we evaluated an unusually large DNase II-like protein family, predicted proteins of prospective interest in the interaction between the parasite and the host muscle cell, anthelmintic targets, and prospective intestinal genes whose encoded proteins may be linked to immunological control of other nematodes. The results are discussed in relation to characteristics that are broadly conserved among evolutionarily distant nematodes. The results suggest that this genome sequence will contribute to advances in research on T. spiralis and other parasitic nematodes.
|
24
|
Abstract
"Omics" experiments amass large amounts of data, and interpreting them requires integrating several data sources. For instance, microarray, metabolomic, and proteomic experiments may at best yield a list of active genes, metabolites, or proteins, respectively. More generally, the experiments yield active features that represent subsequences of a gene, a chemical shift within a complex mixture, or a peptide, respectively. Thus, in the best-case scenario, the investigator is left to identify the functional significance, but more likely the investigator must first identify the larger context of the feature (e.g., which gene, metabolite, or protein the feature represents). To annotate function completely, several different databases are required, including sequence, genome, gene function, protein, and protein interaction databases. Because some microarrays and experiments have limited coverage, microarray results in particular may be complemented by consulting biological data repositories. Many of the data sources and databases available for gene function characterization, including tools from the National Center for Biotechnology Information, Gene Ontology, and UniProt, are discussed.
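The two-step lookup described above (feature to gene, then gene to functional terms) can be sketched with plain dictionaries standing in for the external databases. All identifiers are hypothetical; in practice each mapping would come from a resource such as those discussed (NCBI, Gene Ontology, UniProt).

```python
def annotate_features(features, feature_to_gene, gene_to_terms):
    """
    Map each experimental feature (probe, peptide, ...) to its gene, then to
    the gene's functional terms. Unmappable features are returned separately
    so that gaps in annotation coverage stay visible.
    """
    annotated, unmapped = {}, []
    for feature in features:
        gene = feature_to_gene.get(feature)
        if gene is None:
            unmapped.append(feature)  # no known gene context for this feature
            continue
        annotated[feature] = (gene, sorted(gene_to_terms.get(gene, ())))
    return annotated, unmapped


# Hypothetical probe ids, gene ids and term ids:
annotated, unmapped = annotate_features(
    ["probe1", "probe2", "probe3"],
    {"probe1": "GENE1", "probe3": "GENE3"},
    {"GENE1": {"TERM_B", "TERM_A"}},
)
print(annotated)  # {'probe1': ('GENE1', ['TERM_A', 'TERM_B']), 'probe3': ('GENE3', [])}
print(unmapped)   # ['probe2']
```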
Affiliation(s)
- Lyle D Burgoon
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, Michigan, USA
|
25
|
Kang BY, Kim S, Lee KH, Lee YS, Hong I, Lee MO, Min D, Chang I, Hwang JS, Park JS, Kim DH, Kim BG. Transcriptional profiling in human HaCaT keratinocytes in response to kaempferol and identification of potential transcription factors for regulating differential gene expression. Exp Mol Med 2008; 40:208-19. [PMID: 18446059 DOI: 10.3858/emm.2008.40.2.208] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/04/2022]
Abstract
Kaempferol is the major flavonol in green tea and exhibits many biomedically useful properties, such as antioxidative, cytoprotective and anti-apoptotic activities. To elucidate its effects on the skin, we investigated the transcriptional profiles of kaempferol-treated HaCaT cells using cDNA microarray analysis and identified 147 transcripts that exhibited significant changes in expression. Of these, 18 were up-regulated and 129 were down-regulated. These transcripts were then classified into 12 categories according to their functional roles: cell adhesion/cytoskeleton, cell cycle, redox homeostasis, immune/defense responses, metabolism, protein biosynthesis/modification, intracellular transport, RNA processing, DNA modification/replication, regulation of transcription, signal transduction and transport. We then analyzed the promoter sequences of the differentially regulated genes and identified over-represented regulatory sites and candidate transcription factors (TFs) for gene regulation by kaempferol. These included c-REL, SAP-1, Ahr-ARNT, Nrf-2, Elk-1, SPI-B, NF-kappaB and p65. In addition, we validated the microarray results and promoter analyses using conventional methods such as real-time PCR and an ELISA-based transcription factor assay. Our microarray analysis has provided useful information for determining the genetic regulatory network affected by kaempferol, and this approach will be useful for elucidating gene-phytochemical interactions.
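The abstract does not state which test was used to call regulatory sites over-represented. A common choice, assumed here, is a one-sided hypergeometric test comparing the promoters of the differentially expressed genes with a background set of promoters; the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """
    P(X >= k) for X ~ Hypergeometric: M background promoters, n of which carry
    the site, N promoters drawn (the differentially expressed genes), k of the
    drawn promoters carrying the site. Small values suggest over-representation.
    """
    upper = min(n, N)
    total = comb(M, N)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 10 background promoters, 3 with the site, 4 DE genes.
print(round(hypergeom_sf(1, 10, 3, 4), 4))  # 0.8333
```

In a real analysis this would be repeated for every candidate site and the resulting p-values corrected for multiple testing.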
Affiliation(s)
- Byung Young Kang
- School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Korea
|
26
|
Skibola CF, Bracci PM, Halperin E, Nieters A, Hubbard A, Paynter RA, Skibola DR, Agana L, Becker N, Tressler P, Forrest MS, Sankararaman S, Conde L, Holly EA, Smith MT. Polymorphisms in the estrogen receptor 1 and vitamin C and matrix metalloproteinase gene families are associated with susceptibility to lymphoma. PLoS One 2008; 3:e2816. [PMID: 18636124 PMCID: PMC2474696 DOI: 10.1371/journal.pone.0002816] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 12/06/2007] [Accepted: 07/07/2008] [Indexed: 02/01/2023]
Abstract
BACKGROUND Non-Hodgkin lymphoma (NHL) is the fifth most common cancer in the U.S., and few causes have been identified. Genetic association studies may help identify environmental risk factors and enhance our understanding of disease mechanisms. METHODOLOGY/PRINCIPAL FINDINGS A total of 768 coding and haplotype-tagging SNPs in 146 genes were examined using Illumina GoldenGate technology in a large population-based case-control study of NHL in the San Francisco Bay Area (1,292 cases and 1,375 controls are included here). Statistical analyses were restricted to HIV-negative participants of white non-Hispanic origin. Genes involved in steroidogenesis, immune function, cell signaling, sunlight exposure, xenobiotic metabolism/oxidative stress, energy balance, and the uptake and metabolism of cholesterol, folate and vitamin C were investigated. Sixteen SNPs in eight pathways and nine haplotypes were associated with NHL after correction for multiple testing at the adjusted q < 0.10 level. Eight SNPs were tested in an independent case-control study of lymphoma in Germany (494 NHL cases and 494 matched controls). Novel associations with common variants in estrogen receptor 1 (ESR1) and in the vitamin C receptor and matrix metalloproteinase gene families were observed. Four ESR1 SNPs were associated with follicular lymphoma (FL) in the U.S. study, with rs3020314 remaining associated with reduced risk of FL after multiple testing adjustment [odds ratio (OR) = 0.42, 95% confidence interval (CI) = 0.23-0.77] and after replication in the German study (OR = 0.24, 95% CI = 0.06-0.94). Several SNPs and haplotypes in the matrix metalloproteinase-3 (MMP3) and MMP9 genes and in the vitamin C receptor genes, solute carrier family 23 member 1 (SLC23A1) and SLC23A2, showed associations with NHL risk. CONCLUSIONS/SIGNIFICANCE Our findings suggest a role for estrogen, vitamin C and matrix metalloproteinases in the pathogenesis of NHL that will require further validation.
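The abstract reports significance as an adjusted q < 0.10 without naming the procedure. Assuming a Benjamini-Hochberg false discovery rate adjustment (a common choice, not confirmed by the abstract), q-values can be computed as follows; the p-values in the usage lines are made up.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never increase
    # as p-values decrease (the step-up monotonicity constraint).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for three SNPs:
qs = bh_qvalues([0.01, 0.04, 0.03])
print([round(x, 3) for x in qs])  # [0.03, 0.04, 0.04]
print([x < 0.10 for x in qs])     # [True, True, True]
```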
Affiliation(s)
- Christine F Skibola
- School of Public Health, Division of Environmental Health Sciences, University of California Berkeley, Berkeley, California, USA.
27
Fernández-Suárez XM, Schuster MK. Using the Ensembl genome server to browse genomic sequence data. Curr Protoc Bioinformatics 2008; Chapter 1:Unit 1.15. [PMID: 18428779 DOI: 10.1002/0471250953.bi0115s16] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
The Ensembl genome Web browser (http://www.ensembl.org) provides a comprehensive source of automatic annotation of the human genome sequence (as well as other species of biomedical interest), with confirmed gene predictions that have been integrated with external data sources. This unit describes how to use the Ensembl browser, how to find your gene or protein of interest and get information and external links about them, and how to use the comparative genomic data.
28
Abstract
In recent years, genome-wide detection of alternative splicing based on Expressed Sequence Tag (EST) sequence alignments with mRNA and genomic sequences has dramatically expanded our understanding of the role of alternative splicing in functional regulation. This chapter reviews the data, methodology, and technical challenges of these genome-wide analyses of alternative splicing, and briefly surveys some of the uses to which such alternative splicing databases have been put. For example, with proper alternative splicing database schema design, it is possible to query genome-wide for alternative splicing patterns that are specific to particular tissues, disease states (e.g., cancer), gender, or developmental stages. EST alignments can be used to estimate exon inclusion or exclusion level of alternatively spliced exons and evolutionary changes for various species can be inferred from exon inclusion level. Such databases can also help automate design of probes for RT-PCR and microarrays, enabling high throughput experimental measurement of alternative splicing.
29
Hedeler C, Wong HM, Cornell MJ, Alam I, Soanes DM, Rattray M, Hubbard SJ, Talbot NJ, Oliver SG, Paton NW. e-Fungi: a data resource for comparative analysis of fungal genomes. BMC Genomics 2007; 8:426. [PMID: 18028535 PMCID: PMC2242804 DOI: 10.1186/1471-2164-8-426] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 05/15/2007] [Accepted: 11/20/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed across various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. DESCRIPTION To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. CONCLUSION The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database. 
The database is accessible at http://www.e-fungi.org.uk, as is the WSDL for the web services.
Affiliation(s)
- Cornelia Hedeler
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Han Min Wong
- School of Biosciences, University of Exeter, Exeter, EX4 4QD, UK
- Michael J Cornell
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Intikhab Alam
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Darren M Soanes
- School of Biosciences, University of Exeter, Exeter, EX4 4QD, UK
- Magnus Rattray
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Simon J Hubbard
- Faculty of Life Sciences, The University of Manchester, Manchester, M13 9PT, UK
- Stephen G Oliver
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Norman W Paton
- School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
30
Baird SD, Lewis SM, Turcotte M, Holcik M. A search for structurally similar cellular internal ribosome entry sites. Nucleic Acids Res 2007; 35:4664-77. [PMID: 17591613 PMCID: PMC1950536 DOI: 10.1093/nar/gkm483] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 05/15/2007] [Revised: 05/31/2007] [Accepted: 06/04/2007] [Indexed: 01/01/2023]
Abstract
Internal ribosome entry sites (IRES) allow ribosomes to be recruited to mRNA in a cap-independent manner. Some viruses that impair cap-dependent translation initiation use IRES to ensure that the viral RNA will efficiently compete for the translation machinery. IRES are also employed for the translation of a subset of cellular messages during conditions that inhibit cap-dependent translation initiation. IRES from viruses like Hepatitis C and Classical Swine Fever virus share a similar structure/function without sharing primary sequence similarity. Of the cellular IRES structures derived so far, none has been shown to share an overall structural similarity. Therefore, we undertook a genome-wide search of human 5'UTRs (untranslated regions) with an empirically derived structure of the IRES from the key inhibitor of apoptosis, X-linked inhibitor of apoptosis protein (XIAP), to identify novel IRES that share structure/function similarity. Three of the top matches identified by this search that exhibit IRES activity are the 5'UTRs of Aquaporin 4, ELG1 and NF-kappaB repressing factor (NRF). The structures of AQP4 and ELG1 IRES have limited similarity to the XIAP IRES; however, they share trans-acting factors that bind the XIAP IRES. We therefore propose that cellular IRES, unlike viral IRES, are not defined by overall structure but instead depend upon short motifs and trans-acting factors for their function.
Affiliation(s)
- Stephen D. Baird
- Department of Biochemistry, Microbiology and Immunology, Department of Pediatrics and School of Information Technology and Engineering, University of Ottawa, ON, Canada and Apoptosis Research Centre, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada, K1H 8L1
- Stephen M. Lewis
- Department of Biochemistry, Microbiology and Immunology, Department of Pediatrics and School of Information Technology and Engineering, University of Ottawa, ON, Canada and Apoptosis Research Centre, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada, K1H 8L1
- Marcel Turcotte
- Department of Biochemistry, Microbiology and Immunology, Department of Pediatrics and School of Information Technology and Engineering, University of Ottawa, ON, Canada and Apoptosis Research Centre, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada, K1H 8L1
- Martin Holcik
- Department of Biochemistry, Microbiology and Immunology, Department of Pediatrics and School of Information Technology and Engineering, University of Ottawa, ON, Canada and Apoptosis Research Centre, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada, K1H 8L1
31
Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages. BMC Bioinformatics 2007; 8 Suppl 4:S6. [PMID: 17570149 PMCID: PMC1892085 DOI: 10.1186/1471-2105-8-s4-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function. RESULTS We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/. CONCLUSION The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
32
Lu D, Klug A. Invariance of the zinc finger module: a comparison of the free structure with those in nucleic-acid complexes. Proteins 2007; 67:508-12. [PMID: 17335000 DOI: 10.1002/prot.21289] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Affiliation(s)
- Duo Lu
- Laboratory of Molecular Biology, Medical Research Council, Cambridge CB2 2QH, United Kingdom
33
Li X, Ghandri N, Piancatelli D, Adams S, Chen D, Robbins FM, Wang E, Monaco A, Selleri S, Bouaouina N, Stroncek D, Adorno D, Chouchane L, Marincola FM. Associations between HLA class I alleles and the prevalence of nasopharyngeal carcinoma (NPC) among Tunisians. J Transl Med 2007; 5:22. [PMID: 17480220 PMCID: PMC1887520 DOI: 10.1186/1479-5876-5-22] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 03/16/2007] [Accepted: 05/04/2007] [Indexed: 11/26/2022]
Abstract
The high prevalence of nasopharyngeal cancer (NPC) in Southern Asia and Mediterranean Northern Africa suggests genetic predisposition among other factors. While Human Leukocyte Antigen (HLA) haplotypes have been conclusively associated with NPC predisposition in Asians, Northern African Maghrebians have been less intensely studied. However, low-resolution serological methods identified weak positive associations with HLA-B5, B13 and B18 and a negative association with HLA-B14. Using sequence-based typing (SBT), we performed a direct comparison of HLA class I frequencies between a cohort of 136 Tunisian patients with NPC and 148 normal Tunisians matched for gender, age and geographical residence. The bimodal age distribution of NPC in Maghrebians was also taken into account. HLA frequencies in normal Tunisians were also compared with those of Northern Moroccan Berbers (ME) to evaluate whether the Tunisian population in this study could be considered representative of other Maghrebian populations. HLA-B14 and -Cw08 were negatively associated with NPC (odds ratio = 0.09 and 0.18, respectively; Fisher p2-value = 0.0001 and 0.003, respectively). Moreover, positive associations were observed for HLA-B18, -B51 (split of -B5) and -B57 (p2-value < 0.025 in all), confirming previous findings in Maghrebians. The HLA-B14/Cw*08 haplotype frequency (HF) was 0.007 in NPC patients compared with 0.057 in both Tunisian (OR = 0.12; p2-value = 0.001) and Moroccan controls. This study confirms several associations previously noted by serologic typing between HLA class I alleles and the prevalence of NPC in Maghrebian populations. In addition, we identified a putative haplotype that is rare in Tunisian patients with NPC and may serve as a genetic marker for further susceptibility studies.
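As a rough arithmetic check of the HLA-B14/Cw*08 result, the odds ratio can be recomputed from the two reported haplotype frequencies. This is a minimal sketch that assumes the OR compares haplotype odds, p/(1 - p), in patients versus Tunisian controls; the authors may have computed it from haplotype counts instead, so small differences from the published value are possible. The function name `odds_ratio_from_freqs` is hypothetical.

```python
def odds_ratio_from_freqs(p_cases: float, p_controls: float) -> float:
    """Odds ratio of carrying a haplotype, from its frequency in cases vs controls.

    Hypothetical helper: assumes OR = [p1 / (1 - p1)] / [p0 / (1 - p0)].
    """
    if not (0 < p_cases < 1 and 0 < p_controls < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_cases = p_cases / (1 - p_cases)            # odds among NPC patients
    odds_controls = p_controls / (1 - p_controls)   # odds among controls
    return odds_cases / odds_controls


# Frequencies as reported in the abstract: 0.007 (NPC) vs 0.057 (Tunisian controls).
or_b14_cw08 = odds_ratio_from_freqs(0.007, 0.057)
print(round(or_b14_cw08, 2))  # 0.12
```

With these inputs the ratio is about 0.117, which rounds to the reported OR = 0.12.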
Affiliation(s)
- Xin Li
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Nahla Ghandri
- Laboratory of Molecular Immunology and Oncology, Faculty of Medicine of Monastir, Monastir, Tunisia
- Sharon Adams
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Deborah Chen
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Fu-Meei Robbins
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Ena Wang
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Alessandro Monaco
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Silvia Selleri
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Noureddine Bouaouina
- Laboratory of Molecular Immunology and Oncology, Faculty of Medicine of Monastir, Monastir, Tunisia
- David Stroncek
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Domenico Adorno
- CNR, Institute for Organ Transplant and Immunocytology, L'Aquila, Italy
- Lotfi Chouchane
- Laboratory of Molecular Immunology and Oncology, Faculty of Medicine of Monastir, Monastir, Tunisia
- Francesco M Marincola
- Immunogenetics Section, Department of Transfusion Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
34
Kim SJ, Lee KH, Lee YS, Mun EG, Kwon DY, Cha YS. Transcriptome analysis and promoter sequence studies on early adipogenesis in 3T3-L1 cells. Nutr Res Pract 2007; 1:19-28. [PMID: 20535381 PMCID: PMC2882572 DOI: 10.4162/nrp.2007.1.1.19] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 01/22/2007] [Revised: 03/02/2007] [Accepted: 03/05/2007] [Indexed: 12/13/2022]
Abstract
To identify regulatory molecules that play key roles in the development of obesity, we investigated the transcriptional profiles of 3T3-L1 cells at an early stage of differentiation and analyzed the promoter sequences of differentially regulated genes. One hundred and sixty-one (161) genes showed significant changes in expression on the 2nd day following treatment with the differentiation cocktail. Among them, 86 transcripts were up-regulated and 75 transcripts were down-regulated. The 161 transcripts were classified into 10 categories according to their functional roles: cytoskeleton, cell adhesion, immune, defense response, metabolism, protein modification, protein metabolism, regulation of transcription, signal transduction and transporter. To identify transcription factors likely involved in regulating these differentially expressed genes, we analyzed the promoter sequences of up- or down-regulated genes for the presence of transcription factor binding sites (TFBSs). Based on the coincidence of regulatory sites, we identified candidate transcription factors (TFs), including some previously known to be involved in adipogenesis (CREB, OCT-1 and c-Myc). Among them, c-Myc was also identified in our microarray data. Our approach, which takes advantage of human genome sequence resources and our microarray results, should be validated by further studies of promoter occupancy and TF perturbation.
Affiliation(s)
- Su-Jong Kim
- Department of Biochemistry, College of Medicine, Hanyang University, Seoul 133-791, Korea
35
Sinha AU, Meller J. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. BMC Bioinformatics 2007; 8:82. [PMID: 17343765 PMCID: PMC1821339 DOI: 10.1186/1471-2105-8-82] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Received: 08/31/2006] [Accepted: 03/08/2007] [Indexed: 11/26/2022]
Abstract
Background Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionarily conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements are among the central goals of comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using the Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; ii) flexibility to adjust the parameters and re-compute the results on the fly; iii) the ability to work with user-provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at . Conclusion Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances. 
Cinteny can also be used to interactively browse syntenic blocks conserved in multiple genomes, to facilitate genome annotation and validation of assemblies for newly sequenced genomes, and to construct and assess phylogenomic trees.
Affiliation(s)
- Amit U Sinha
- Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
- Jaroslaw Meller
- Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH 45267-0056, USA
- Department of Informatics, Nicholas Copernicus University, 87-100 Torun, Poland
36
Derrien T, André C, Galibert F, Hitte C. AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps. Bioinformatics 2006; 23:498-9. [PMID: 17145741 DOI: 10.1093/bioinformatics/btl618] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
AutoGRAPH is an interactive web server for automatic multi-species comparative genomics analyses based on personal datasets or pre-inserted public datasets. This program automatically identifies conserved segments (CS) and breakpoint regions, assesses the conservation of marker/gene order between organisms, constructs synteny maps for two to three species and generates high-quality, interactive displays facilitating the identification of chromosomal rearrangements. AutoGRAPH can also be used for the integration and comparison of several types of genomic resources (meiotic maps, radiation hybrid maps and genome sequences) for a single species, making AutoGRAPH a versatile tool for comparative genomics analysis. AVAILABILITY http://genoweb.univ-rennes1.fr/tom_dog/AutoGRAPH/. SUPPLEMENTARY INFORMATION A description of the algorithm and additional information are available at http://genoweb.univ-rennes1.fr/tom_dog/AutoGRAPH/Tutorial.php.
Affiliation(s)
- Thomas Derrien
- CNRS UMR6061 Génétique et Développement, Université de Rennes1, IFR140, 2 Av du Pr. Léon Bernard, CS 34317, 35043, France
37
Xia H, Bi J, Li Y. Identification of alternative 5'/3' splice sites based on the mechanism of splice site competition. Nucleic Acids Res 2006; 34:6305-13. [PMID: 17098928 PMCID: PMC1669764 DOI: 10.1093/nar/gkl900] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/14/2006] [Revised: 08/01/2006] [Accepted: 10/12/2006] [Indexed: 11/30/2022]
Abstract
Alternative splicing plays an important role in regulating gene expression. Currently, the most efficient methods for large-scale detection of alternative splicing use expressed sequence tags or microarray analysis. However, because of their inherent limitations, it is difficult to detect all alternative splice events with them. Previous computational methods for alternative splicing prediction could only predict particular kinds of alternative splice events. Thus, it would be highly desirable to predict alternative 5'/3' splice sites with various splicing levels using genomic sequences alone. Here, we introduce the competition mechanism of splice site selection into alternative splice site prediction. This approach allows us to predict not only rarely used but also frequently used alternative splice sites. On a dataset extracted from the AltSplice database, our method correctly classified approximately 70% of the splice sites into alternative and constitutive, as well as approximately 80% of the locations of real competitors for alternative splice sites. It outperforms a method that only considers features extracted from the splice sites themselves. Furthermore, this approach can also predict the changes in activation level arising from mutations in flanking cryptic splice sites of a given splice site. Our approach might be useful for studying alternative splicing in both computational and molecular biology.
Affiliation(s)
- Huiyu Xia
- Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China.
38
Desmarais E, Belkhir K, Garza JC, Bonhomme F. Local mutagenic impact of insertions of LTR retrotransposons on the mouse genome. J Mol Evol 2006; 63:662-75. [PMID: 17075698 DOI: 10.1007/s00239-005-0301-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/13/2005] [Accepted: 07/26/2006] [Indexed: 11/24/2022]
Abstract
Solitary LTR loci are the predominant form of LTR retrotransposons in most eukaryotic genomes. They originate from recombination between the two LTRs of an ancestral retrovirus and are therefore incapable of transposition. Despite this inactivity, they appear to have a substantial impact on the host genome. Here we use the murine RMER10 LTR family as an example to describe how such elements can reshape regions of the genome through multiple mutations on an evolutionary time scale. Specifically, we use phylogenetic analysis of multiple copies of RMER10 in rodent species, as well as comparisons of orthologous pairs in mouse and rat, to argue that insertions of members of this family have locally induced the emergence of tandem repeat loci as well as many indels. Analysis of structural aspects of these sequences (secondary structures and transcription factor signals) may explain why RMER10 elements can act as endogenous "mutagenic" factors by inducing replication fork blockages and/or error-prone repair of aberrant DNA structures. This hypothesis is also consistent with features of other interspersed repeated elements.
Affiliation(s)
- Erick Desmarais
- Laboratoire Génome, Populations, Interactions, Adaptation, UMR5171 CNRS-IFREMER, Université Montpellier II, CC-G3 Montpellier Place E. Bataillon 34095, France.
39
Abstract
The cell has many ways to regulate the production of proteins. One mechanism is through changes to the translation initiation machinery, which favor the translation of one subset of mRNAs over another. It was first shown that internal ribosome entry sites (IRESes) within viral RNA genomes allowed viral proteins to be produced more efficiently than most host proteins. The RNA secondary structure of viral IRESes has sometimes been conserved between viral species even though the primary sequences differ. These structures are important for IRES function, but no similar structural conservation has yet been shown for cellular IRESes. With the advances in mathematical modeling and computational approaches to complex biological problems, is there a way to predict an IRES in a data set of unknown sequences? This review examines what is known about cellular IRES structures, as well as the data sets and tools available to examine this question. We find that the lengths, number of upstream AUGs, and %GC content of 5'-UTRs of the human transcriptome have a similar distribution to those of published IRES-containing UTRs. Although the UTRs containing IRESes are on average longer, almost half of all 5'-UTRs are long enough to contain an IRES. Examination of the available RNA structure prediction software and RNA motif searching programs indicates that while these programs are useful tools to fine-tune the empirically determined RNA secondary structure, accurate de novo secondary structure prediction of large RNA molecules, and the subsequent identification of new IRES elements by computational approaches, are still not possible.
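The three 5'-UTR properties compared in this review (length, number of upstream AUGs, and %GC content) are simple to compute per sequence. The sketch below assumes that every AUG triplet inside the 5'-UTR counts as an upstream AUG regardless of reading frame, and that DNA input (T) is treated as RNA (U); the review's exact definitions may differ, and `utr_features` is a hypothetical helper name.

```python
def utr_features(utr: str) -> dict:
    """Length, upstream-AUG count and %GC of a 5'-UTR (hypothetical helper).

    Assumption: any AUG in the UTR counts as an upstream AUG, in any frame.
    """
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA input
    if not seq:
        raise ValueError("empty sequence")
    # "AUG" cannot overlap another "AUG", so the non-overlapping str.count is exact.
    uaug = seq.count("AUG")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return {"length": len(seq), "uAUG": uaug, "gc_percent": gc_percent}


features = utr_features("GGCAUGCCAUGA")
print(features["length"], features["uAUG"], round(features["gc_percent"], 1))  # 12 2 58.3
```

Counting AUGs in every frame is the simplest reading of "upstream AUG"; a frame-aware or context-aware (Kozak) count would need extra rules not given in the abstract.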
Affiliation(s)
- Stephen D Baird
- Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ontario K1H 8M5, Canada
40
Goodstadt L, Ponting CP. Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol 2006; 2:e133. [PMID: 17009864 PMCID: PMC1584324 DOI: 10.1371/journal.pcbi.0020133] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Received: 03/09/2006] [Accepted: 08/21/2006] [Indexed: 01/22/2023]
Abstract
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or "in-paralogues," are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Affiliation(s)
- Leo Goodstadt
- Medical Research Council Functional Genetics Unit, University of Oxford, Department of Physiology, Anatomy, and Genetics, Oxford, United Kingdom.
41
Dabrowski M, Aerts S, Kaminska B. Prediction of a key role of motifs binding E2F and NR2F in down-regulation of numerous genes during the development of the mouse hippocampus. BMC Bioinformatics 2006; 7:367. [PMID: 16884529 PMCID: PMC1560171 DOI: 10.1186/1471-2105-7-367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/04/2006] [Accepted: 08/02/2006] [Indexed: 11/23/2022]
Abstract
Background: We previously demonstrated that gene expression profiles during neuronal differentiation in vitro and hippocampal development in vivo were very similar, owing to conservation of the important second singular value decomposition (SVD) mode (Mode 2) of expression. The conservation of Mode 2 suggests that it reflects a regulatory mechanism conserved between the two systems. In either dataset, the expression vectors of all the genes form two large clusters that differ in the sign of the contribution of Mode 2; for most genes, this sign reflects the difference between down- and up-regulation.
Results: In the current work, we used a novel approach of analyzing cis-regulation of gene expression in a subspace of a single SVD mode of temporal expression profiles. In the putative upstream regulatory sequences identified by mouse-human homology for all the genes represented in either dataset, we searched for simple features (motifs and pairs of motifs) associated with either sign of the loading of Mode 2. Using a cross-system training-test set approach, we identified E2F binding sites as predictors of down-regulation of gene expression during hippocampal development. NR2F binding sites, for the transcription factors Nr2f/COUP and Hnf4, and also NR2F_SP1 pairs of binding sites, were predictors of down-regulation of expression during both hippocampal development and neuronal differentiation. Analysis of another dataset, from gene profiling of myoblast differentiation in vitro, shows that the conservation of Mode 2 extends to the differentiation of mesenchymal cells. This permitted the identification of two more pairs of motifs, one of which included the CDE/CHR tandem element, as features associated with down-regulation both in the differentiating myoblasts and in the developing hippocampus. Of the features we identified, the E2F and CDE/CHR motifs may be associated with the cycling progenitor cell status, while NR2F may be related to entry into differentiation along the neuronal pathway.
Conclusion: Our results constitute the first prediction of an expression pattern from the genomic sequence for the developing mammalian brain, and demonstrate a potential for the analysis of gene regulation in a subspace of a single SVD mode of expression.
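The sign-based split of genes on Mode 2 can be illustrated with a minimal NumPy sketch. It assumes a genes × time-points matrix with each gene's profile mean-centred, and it is not the authors' pipeline. Because the sign of any SVD mode is arbitrary, the sketch orients the mode so that its temporal pattern ends higher than it starts; only after that step does a positive loading read as "rising". The function names and the toy data are hypothetical.

```python
import numpy as np

def mode_loadings(expr: np.ndarray, mode: int) -> np.ndarray:
    """Loadings of each gene (row) on one SVD mode (0-based index).

    expr is a genes x time-points matrix. Each gene's profile is centred so
    the decomposition captures change over time rather than baseline level.
    """
    centred = expr - expr.mean(axis=1, keepdims=True)
    # Rows of vt are temporal patterns; u[:, k] * s[k] are the gene loadings.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = u[:, mode] * s[mode]
    # SVD signs are arbitrary: orient the mode so its temporal pattern ends
    # higher than it starts, making a positive loading mean "rising".
    if vt[mode, -1] < vt[mode, 0]:
        loadings = -loadings
    return loadings

def split_by_sign(expr: np.ndarray, mode: int = 1):
    """Gene indices with positive vs negative loading; mode=1 is 'Mode 2'."""
    loadings = mode_loadings(expr, mode)
    return np.flatnonzero(loadings > 0), np.flatnonzero(loadings < 0)

# Toy data: a shared oscillation (dominant mode) plus a rising or falling trend.
oscillation = np.array([1.0, -1.0, -1.0, 1.0])
trend = np.array([-1.5, -0.5, 0.5, 1.5])
expr = np.vstack([3 * oscillation + b * trend for b in (1, 1, -1, -1)])
rising, falling = split_by_sign(expr)
print(rising, falling)  # [0 1] [2 3]
```

In the toy matrix the shared oscillation dominates Mode 1, so the up/down distinction only appears in Mode 2, mirroring the situation the abstract describes.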
Affiliation(s)
- Michal Dabrowski
- Laboratory of Transcription Regulation, Department of Cell Biology, The Nencki Institute of Experimental Biology, Pasteura 3, 02-093 Warsaw, Poland
- Stein Aerts
- Laboratory of Neurogenetics, Department of Human Genetics, VIB and Katholieke Universiteit Leuven, Herestraat 49, 3000 Leuven, Belgium
- Bozena Kaminska
- Laboratory of Transcription Regulation, Department of Cell Biology, The Nencki Institute of Experimental Biology, Pasteura 3, 02-093 Warsaw, Poland
42
Yang XJ, Sugimura J, Schafernak KT, Tretiakova MS, Han M, Vogelzang NJ, Furge K, Teh BT. Classification of Renal Neoplasms Based on Molecular Signatures. J Urol 2006; 175:2302-6. [PMID: 16697863 DOI: 10.1016/s0022-5347(06)00255-2] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Received: 07/05/2005] [Indexed: 11/18/2022]
Abstract
PURPOSE: Gene expression microarray studies have demonstrated distinct molecular signatures for different types of renal neoplasms based on overall gene expression patterns. However, most of these studies used renal tumors of defined histology. We analyzed a test set of renal tumors in double-blind fashion, using recently established molecular profiles of renal tumors as benchmarks.
MATERIALS AND METHODS: A total of 16 consecutive nephrectomies performed for neoplasms at a single urological service were subjected to gene expression profiling using cDNA chips containing 21,632 genes. The resulting profiles were clustered with our previously established molecular profiles of 91 histologically defined kidney neoplasms and examined by comparative genomic microarray analysis, blind to tumor histology and clinical information.
RESULTS: On molecular analysis, 9, 4, 2 and 1 tumors were classified as clear cell RCC, papillary RCC, chromophobe RCC and renal oncocytoma, respectively. Histopathological evaluation was concordant in 14 tumors. Of the 2 tumors with a discrepancy between molecular and pathological diagnoses, one was composed of oncocytoma and high grade clear cell RCC, and the other was a chromophobe RCC that histologically mimicked papillary RCC.
CONCLUSIONS: We report the feasibility of molecular diagnosis and classification of unknown renal neoplasms. Molecular diagnosis appears reliable and comparable to the standard of urological pathology. This molecular method may be a useful test for establishing an accurate diagnosis that can affect clinical management.
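The study clustered the unknown tumours together with reference profiles. A simpler stand-in that conveys the same idea, assigning a sample to whichever reference class its profile most resembles, is nearest-centroid classification by Pearson correlation. The sketch below uses that swapped-in technique on made-up four-gene profiles; it is not the authors' method, and `classify_by_centroid` is a hypothetical name.

```python
import numpy as np

def classify_by_centroid(sample, reference, labels):
    """Return (label, r): the reference class whose mean profile has the
    highest Pearson correlation with the sample, and that correlation."""
    reference = np.asarray(reference, dtype=float)
    labels = np.asarray(labels)
    best_label, best_r = None, -np.inf
    for label in np.unique(labels):
        # Mean expression profile of all reference tumours in this class.
        centroid = reference[labels == label].mean(axis=0)
        r = np.corrcoef(sample, centroid)[0, 1]
        if r > best_r:
            best_label, best_r = str(label), float(r)
    return best_label, best_r

# Invented reference profiles (rows) over four genes.
reference = [[5, 1, 1, 1], [4, 1, 0, 1], [1, 5, 1, 0], [0, 4, 1, 1]]
labels = ["clear cell RCC", "clear cell RCC", "papillary RCC", "papillary RCC"]
print(classify_by_centroid([4, 0, 1, 1], reference, labels))
```

Unlike clustering, a centroid rule forces every sample into an existing class, so a mixed tumour such as the oncocytoma/clear cell case above would still receive a single label.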
Affiliation(s)
- Ximing J Yang
- Department of Pathology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois, USA
43
Klimova NV, Levitsky VG, Ignatieva EV, Vasiliev GV, Kobzev VF, Busygina TV, Merkulova TI, Kolchanov NA. Potential binding sites for SF-1: Recognition by the SiteGA method, experimental verification, and search for new target genes. Mol Biol 2006. [DOI: 10.1134/s0026893306030125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
44
Rashi-Elkeles S, Elkon R, Weizman N, Linhart C, Amariglio N, Sternberg G, Rechavi G, Barzilai A, Shamir R, Shiloh Y. Parallel induction of ATM-dependent pro- and antiapoptotic signals in response to ionizing radiation in murine lymphoid tissue. Oncogene 2006; 25:1584-92. [PMID: 16314843 DOI: 10.1038/sj.onc.1209189] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Indexed: 01/31/2023]
Abstract
The ATM protein kinase, functionally missing in patients with the human genetic disorder ataxia-telangiectasia, is a master regulator of the cellular network induced by DNA double-strand breaks. The ATM gene is also frequently mutated in sporadic cancers of lymphoid origin. Here, we applied a functional genomics approach that combined gene expression profiling and computational promoter analysis to obtain a global dissection of the transcriptional response to ionizing radiation in murine lymphoid tissue. Cluster analysis revealed a prominent pattern characterizing dozens of genes whose response to irradiation was Atm-dependent. Computational analysis identified significant enrichment of the binding site signatures of NF-kappaB and p53 among promoters of these genes, pointing to the major role of these two transcription factors in mediating the Atm-dependent transcriptional response in the irradiated lymphoid tissue. Examination of the response showed that pro- and antiapoptotic signals were simultaneously induced, with the proapoptotic pathway mediated by p53 targets, and the prosurvival pathway by NF-kappaB targets. These findings further elucidate the molecular network induced by IR, point to novel putative NF-kappaB targets, and suggest a mechanistic model for cellular balancing between pro- and antiapoptotic signals induced by IR in lymphoid tissues, which has implications for cancer management. The emerging model suggests that restoring the p53-mediated apoptotic arm while blocking the NF-kappaB-mediated prosurvival arm could effectively increase the radiosensitivity of lymphoid tumors.
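The abstract does not name the enrichment statistic. One common way to ask whether a motif is over-represented among a cluster's promoters is the upper tail of the hypergeometric distribution, sketched below with Python's standard library; whether this matches the test used in the study is an assumption, and the numbers in the usage line are invented.

```python
from math import comb

def hypergeom_upper_tail(total: int, with_motif: int, cluster: int, hits: int) -> float:
    """P(X >= hits), where X is the number of motif-bearing promoters in a
    random set of `cluster` promoters drawn without replacement.

    total: promoters analysed; with_motif: promoters carrying the motif;
    cluster: promoters in the gene cluster; hits: cluster promoters with the motif.
    """
    upper = min(with_motif, cluster)
    # Count the draws with at least `hits` motif-bearing promoters.
    favourable = sum(
        comb(with_motif, k) * comb(total - with_motif, cluster - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, cluster)

# Invented numbers: 8000 promoters, 400 with the motif, 50 in the cluster, 10 hits.
p = hypergeom_upper_tail(8000, 400, 50, 10)
print(f"{p:.2e}")
```

With these invented numbers about 2.5 hits (50 × 400 / 8000) would be expected by chance, so 10 hits yields a small tail probability; in a real screen over many motifs the p-values would also need multiple-testing correction.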
Affiliation(s)
- S Rashi-Elkeles
- The David and Inez Myers Laboratory for Genetic Research, Department of Human Genetics, Sackler School of Medicine, Tel Aviv, Israel
45
Kornhaber GJ, Snyder D, Moseley HNB, Montelione GT. Identification of zinc-ligated cysteine residues based on 13Calpha and 13Cbeta chemical shift data. J Biomol NMR 2006; 34:259-69. [PMID: 16645816 DOI: 10.1007/s10858-006-0027-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Received: 11/14/2005] [Accepted: 02/27/2006] [Indexed: 05/08/2023]
Abstract
Although a significant number of proteins include bound metals as part of their structure, the identification of amino acid residues coordinated to non-paramagnetic metals by NMR remains a challenge. Metal ligands can stabilize the native structure and/or play critical catalytic roles in the underlying biochemistry. An atom's chemical shift is exquisitely sensitive to its electronic environment. Chemical shift data can provide valuable insights into structural features, including metal ligation. In this study, we demonstrate that overlapped 13Cbeta chemical shift distributions of Zn-ligated and non-metal-ligated cysteine residues are largely resolved by the inclusion of the corresponding 13Calpha chemical shift information, together with secondary structural information. We demonstrate this with a bivariate distribution plot, and statistically with a multivariate analysis of variance (MANOVA) and hierarchical logistic regression analysis. Using 287 13Calpha/13Cbeta shift pairs from 79 proteins with known three-dimensional structures, including 86 13Calpha and 13Cbeta shifts for 43 Zn-ligated cysteine residues, along with corresponding oxidation state and secondary structure information, we have built a logistic regression model that distinguishes between oxidized cystines, reduced (non-metal ligated) cysteines, and Zn-ligated cysteines. Classifying cysteines/cystines with a statistical model incorporating all three phenomena resulted in a predictor of Zn ligation with a recall, precision and F-measure of 83.7%, and an accuracy of 95.1%. This model was applied in the analysis of Bacillus subtilis IscU, a protein involved in iron-sulfur cluster assembly. The model predicts that all three cysteines of IscU are metal ligands. We confirmed these results by (i) examining the effect of metal chelation on the NMR spectrum of IscU, and (ii) inductively coupled plasma mass spectrometry analysis. 
To gain further insight into the frequency of occurrence of non-cysteine Zn ligands, we analyzed the Protein Data Bank and found that 78% of the Zn ligands are histidine and cysteine (with nearly identical frequencies), and 18% are acidic residues aspartate and glutamate.
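The reported figures can be checked arithmetically. Assuming (this is not stated in the abstract) that all 287 residues were scored and that the 43 Zn-ligated cysteines are the positives, a recall of 83.7% forces 36 true positives (36/43), a precision of 83.7% forces 7 false positives (36 of 43 predicted positives correct), and an accuracy of 95.1% then implies 237 true negatives ((36 + 237)/287). Because precision equals recall here, the F-measure, their harmonic mean, equals both. The sketch reproduces the percentages from those inferred counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

# Counts inferred from the abstract under the stated assumptions.
m = classification_metrics(tp=36, fp=7, fn=7, tn=237)
print({k: round(v * 100, 1) for k, v in m.items()})
# {'precision': 83.7, 'recall': 83.7, 'f_measure': 83.7, 'accuracy': 95.1}
```

The gap between 83.7% and 95.1% also illustrates why accuracy alone is a weak measure here: with only 43 positives among 287 residues, the many true negatives inflate it.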
Affiliation(s)
- Gregory J Kornhaber
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ 08854, USA
46
Chandonia JM, Brenner SE. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches. Proteins 2005; 58:166-79. [PMID: 15521074 DOI: 10.1002/prot.20298] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Indexed: 11/11/2022]
Abstract
Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the "Pfam5000" strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These strategies include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at the European Bioinformatics Institute (EBI). Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68% of all prokaryotic proteins (covering 59% of residues) and 61% of eukaryotic proteins (40% of residues). More fine-grained coverage that would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example, to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. 
In contrast, focusing structural genomics on a single tractable genome would have only a limited impact on structural knowledge of other proteomes: a significant fraction (about 30-40% of the proteins and 40-60% of the residues) of each proteome is classified in small families, which may have little overlap with other species of interest. Random selection of targets from one or more genomes is similar to the Pfam5000 strategy in that proteins from larger families are more likely to be chosen, but substantial effort would be spent on small families.
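The Pfam5000 idea rests on a coverage calculation: rank families by size and count how many proteins fall in the top k. The sketch below is a simplification that assumes one solved structure per family gives every member a fold assignment; the family sizes are invented, and the function name is hypothetical.

```python
def top_k_coverage(family_sizes: list[int], total_proteins: int, k: int) -> float:
    """Fraction of all proteins that belong to the k largest families.

    Simplifying assumption: one solved structure per family lets every member
    of that family receive a fold assignment. Proteins outside any listed
    family (e.g. singletons) count only toward total_proteins.
    """
    largest = sorted(family_sizes, reverse=True)[:k]
    return sum(largest) / total_proteins

sizes = [50, 30, 10, 5, 5]  # invented family sizes; 20 further proteins are singletons
for k in (1, 2, 5):
    print(k, round(top_k_coverage(sizes, total_proteins=120, k=k), 3))
```

Because families are taken largest first, each additional target adds less coverage, and proteins in small families or singletons are never reached; this is the trade-off the abstract describes for whole-genome and random strategies.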
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
47
Yu P, Ma D, Xu M. Nested genes in the human genome. Genomics 2005; 86:414-22. [PMID: 16084061 DOI: 10.1016/j.ygeno.2005.06.008] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Received: 03/29/2005] [Revised: 06/05/2005] [Accepted: 06/15/2005] [Indexed: 12/01/2022]
Abstract
Here we studied one special type of gene, the nested gene, in the human genome. We collected 373 reliably annotated nested genes. Two-thirds of them were on the strand opposite that of their host gene. About 58% of coding nested gene pairs were conserved in mouse, and some were even maintained in chicken and fish, whereas nested pseudogenes were poorly conserved. Ka/Ks analysis revealed that nested genes were under strong selection, although they were not more conserved than other genes. Microarray data suggested that the two partners of a nested pair tend to be expressed reciprocally. A significant proportion of nested genes were expressed in a tissue-specific manner. Gene Ontology analysis showed that a considerable number of nested genes participate in cellular signal transduction. Based on these observations, we suggest that nested genes are a group of genes with important physiological functions.
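Nested genes are usually defined as genes lying within an intron of a host gene. A first-pass search for candidates can be sketched as an interval-containment check on annotated gene spans, flagging whether each pair sits on opposite strands (the orientation reported for two-thirds of pairs). This simplified sketch ignores exon structure, so it would also report genes overlapping host exons; the `Gene` record, the function name, and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # inclusive start coordinate
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"

def nested_candidates(genes: list[Gene]) -> list[tuple[str, str, bool]]:
    """Return (host, nested, opposite_strand) for every gene whose span lies
    within another gene's span on the same chromosome (identical spans excluded)."""
    pairs = []
    for i, host in enumerate(genes):
        for j, inner in enumerate(genes):
            if i == j or inner.chrom != host.chrom:
                continue
            contained = host.start <= inner.start and inner.end <= host.end
            identical = (inner.start, inner.end) == (host.start, host.end)
            if contained and not identical:
                pairs.append((host.name, inner.name, inner.strand != host.strand))
    return pairs

genes = [
    Gene("HOST", "chr1", 100, 1000, "+"),
    Gene("IN1", "chr1", 200, 300, "-"),
    Gene("IN2", "chr1", 400, 500, "+"),
    Gene("OTHER", "chr2", 150, 250, "+"),
]
print(nested_candidates(genes))
# [('HOST', 'IN1', True), ('HOST', 'IN2', False)]
```

The pairwise loop is quadratic; for a genome-wide annotation, sorting genes by chromosome and start position (or using an interval tree) would be the usual refinement, followed by an intron-overlap check against exon coordinates.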
Affiliation(s)
- Peng Yu
- Laboratory of Medical Immunology, School of Basic Medical Sciences, Peking University, Beijing 100083, People's Republic of China.
48
Liu GE, Adams MD. Genome resources and comparative analysis tools for cardiovascular research. Methods Mol Med 2006; 128:101-23. [PMID: 17071992 DOI: 10.1007/978-1-59745-159-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Disorders of the cardiovascular system are often caused by the interaction of genetic and environmental factors that jointly contribute to individual susceptibility. Genomic data and bioinformatics tools generated from genome projects, coupled with functional verification, offer novel approaches to study both rare single-gene and complex multigenic cardiovascular diseases. These approaches include gene mapping using genome variation, especially single-nucleotide polymorphisms and comparative genomics within and between species. This chapter illustrates the major genome resources, associated bioinformatics tools, and their potential application in cardiovascular research.
Affiliation(s)
- George E Liu
- Bovine Functional Genomics Laboratory, Animal and Natural Resources Institute, US Department of Agriculture-Agricultural Research Service, Beltsville, MD, USA
49
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005; 110:462-7. [PMID: 16093699 DOI: 10.1159/000084979] [Citation(s) in RCA: 2444] [Impact Index Per Article: 122.2] [Received: 10/16/2003] [Accepted: 04/06/2004] [Indexed: 12/13/2022]
Abstract
Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.
Affiliation(s)
- J Jurka
- Genetic Information Research Institute, Mountain View, CA 94043, USA.
50
Parinov S, Kondrichin I, Korzh V, Emelyanov A. Tol2 transposon-mediated enhancer trap to identify developmentally regulated zebrafish genes in vivo. Dev Dyn 2004; 231:449-59. [PMID: 15366023 DOI: 10.1002/dvdy.20157] [Citation(s) in RCA: 279] [Impact Index Per Article: 14.0] [Indexed: 11/11/2022]
Abstract
We have used the Tol2 transposable element to design and perform effective enhancer trapping in zebrafish. Modified transposon DNA and transposase RNA were delivered into zebrafish embryos by microinjection to produce heritable insertions in the zebrafish genome. The enhancer trap construct carries the EGFP gene controlled by a partial epithelial promoter from the keratin 8 gene. Enhanced green fluorescent protein (EGFP) is used as a marker to select F1 transgenic fish and as a reporter to trap enhancers. We have isolated 28 transgenic lines that were derived from the 37 GFP-positive F0 founders and displayed various specific EGFP expression patterns in addition to basal expression from the modified keratin 8 promoter. Analyses of expression by whole-mount RNA in situ hybridization demonstrated that these patterns could recapitulate the expression of the tagged genes to a variable extent and, therefore, confirmed that our construct worked effectively as an enhancer trap. Transgenic offspring from the 37 F0 EGFP-positive founders have been genetically analyzed up to the F2 generation. Flanking sequences from 65 separate transposon insertion sites were identified by thermal asymmetric interlaced polymerase chain reaction. Injection of the transposase RNA into transgenic embryos induced remobilization of genomic Tol2 copies, producing novel insertions, including some in the germ line. The approach has great potential for developmental and anatomical studies of teleosts.