1
Isbel L, Iskar M, Durdu S, Weiss J, Grand RS, Hietter-Pfeiffer E, Kozicka Z, Michael AK, Burger L, Thomä NH, Schübeler D. Readout of histone methylation by Trim24 locally restricts chromatin opening by p53. Nat Struct Mol Biol 2023:10.1038/s41594-023-01021-8. [PMID: 37386214 DOI: 10.1038/s41594-023-01021-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 05/15/2023] [Indexed: 07/01/2023]
Abstract
The genomic binding sites of the transcription factor (TF) and tumor suppressor p53 are unusually diverse with regard to their chromatin features, including histone modifications, raising the possibility that the local chromatin environment can contextualize p53 regulation. Here, we show that epigenetic characteristics of closed chromatin, such as DNA methylation, do not influence the binding of p53 across the genome. Instead, the ability of p53 to open chromatin and activate its target genes is locally restricted by its cofactor Trim24. Trim24 binds to both p53 and unmethylated histone 3 lysine 4 (H3K4), thereby preferentially localizing to those p53 sites that reside in closed chromatin, whereas it is deterred from accessible chromatin by H3K4 methylation. The presence of Trim24 increases cell viability upon stress and enables p53 to affect gene expression as a function of the local chromatin state. These findings link H3K4 methylation to p53 function and illustrate how specificity in chromatin can be achieved, not by TF-intrinsic sensitivity to histone modifications, but by employing chromatin-sensitive cofactors that locally modulate TF function.
Affiliation(s)
- Luke Isbel
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Murat Iskar
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Sevi Durdu
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Joscha Weiss
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Faculty of Sciences, University of Basel, Basel, Switzerland
- Ralph S Grand
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
- Eric Hietter-Pfeiffer
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Faculty of Sciences, University of Basel, Basel, Switzerland
- Zuzanna Kozicka
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Faculty of Sciences, University of Basel, Basel, Switzerland
- Alicia K Michael
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Biozentrum, University of Basel, Basel, Switzerland
- Lukas Burger
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Nicolas H Thomä
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Dirk Schübeler
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - Faculty of Sciences, University of Basel, Basel, Switzerland
2
Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA 2021; 12:2. [PMID: 33436076 PMCID: PMC7805219 DOI: 10.1186/s13100-020-00230-y] [Citation(s) in RCA: 213] [Impact Index Per Article: 71.0] [Received: 09/10/2020] [Accepted: 12/28/2020] [Indexed: 02/02/2023] Open
Abstract
Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0-3.3 releases of Dfam ( https://dfam.org ) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam's new features and provides insight into the long term challenges ahead for improving de novo generated transposable element datasets.
Affiliation(s)
- Robert Hubley
  - Institute for Systems Biology, Seattle, WA, 98109, USA
- Jeb Rosen
  - Institute for Systems Biology, Seattle, WA, 98109, USA
- Arian F Smit
  - Institute for Systems Biology, Seattle, WA, 98109, USA
3
Ahmad SF, Singchat W, Jehangir M, Suntronpong A, Panthum T, Malaivijitnond S, Srikulnath K. Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics. Cells 2020; 9:E2714. [PMID: 33352976 PMCID: PMC7767330 DOI: 10.3390/cells9122714] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/27/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022] Open
Abstract
A substantial portion of the primate genome is composed of non-coding regions, so-called "dark matter", which includes an abundance of tandemly repeated sequences called satellite DNA. Collectively known as the satellitome, this genomic component offers exciting evolutionary insights into aspects of primate genome biology that raise new questions and challenge existing paradigms. A complete human reference genome was recently reported with telomere-to-telomere human X chromosome assembly that resolved hundreds of dark regions, encompassing a 3.1 Mb centromeric satellite array that had not been identified previously. With the recent exponential increase in the availability of primate genomes, and the development of modern genomic and bioinformatics tools, extensive growth in our knowledge concerning the structure, function, and evolution of satellite elements is expected. The current state of knowledge on this topic is summarized, highlighting various types of primate-specific satellite repeats to compare their proportions across diverse lineages. Inter- and intraspecific variation of satellite repeats in the primate genome are reviewed. The functional significance of these sequences is discussed by describing how the transcriptional activity of satellite repeats can affect gene expression during different cellular processes. Sex-linked satellites are outlined, together with their respective genomic organization. Mechanisms are proposed whereby satellite repeats might have emerged as novel sequences during different evolutionary phases. Finally, the main challenges that hinder the detection of satellite DNA are outlined and an overview of the latest methodologies to address technological limitations is presented.
Affiliation(s)
- Syed Farhan Ahmad
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Worapong Singchat
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Maryam Jehangir
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Department of Structural and Functional Biology, Institute of Bioscience at Botucatu, São Paulo State University (UNESP), Botucatu, São Paulo 18618-689, Brazil
- Aorarat Suntronpong
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Thitipong Panthum
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Suchinda Malaivijitnond
  - National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
  - Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Kornsorn Srikulnath
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
  - National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
  - Center of Excellence on Agricultural Biotechnology (AG-BIO/PERDO-CHE), Bangkok 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
4
The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families. Mob DNA 2020; 11:13. [PMID: 32266012 PMCID: PMC7110667 DOI: 10.1186/s13100-020-00208-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/27/2019] [Accepted: 03/10/2020] [Indexed: 01/12/2023] Open
Abstract
Background: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats. Results: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE-1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences. Conclusions: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. By developing a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
5
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 155] [Impact Index Per Article: 31.0] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
6
Rose AM, Krishan A, Chakarova CF, Moya L, Chambers SK, Hollands M, Illingworth JC, Williams SMG, McCabe HE, Shah AZ, Palmer CNA, Chakravarti A, Berg JN, Batra J, Bhattacharya SS. MSR1 repeats modulate gene expression and affect risk of breast and prostate cancer. Ann Oncol 2019; 29:1292-1303. [PMID: 29509840 DOI: 10.1093/annonc/mdy082] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022] Open
Abstract
Background: MSR1 repeats are 36-38 bp minisatellite elements that have recently been implicated in the regulation of gene expression through copy number variation (CNV). Patients and methods: Bioinformatic and experimental methods were used to assess the distribution of MSR1 across the genome, evaluate the regulatory potential of such elements and explore the role of MSR1 elements in cancer, particularly non-familial breast cancer and prostate cancer. Results: MSR1s are predominantly located on chromosome 19 and are functionally enriched in regulatory regions of the genome, particularly regions implicated in short-range regulatory activities (H3K27ac, H3K4me1 and H3K4me3). MSR1-regulated genes were found to have specific molecular roles, such as serine-protease activity (P = 4.80 × 10⁻⁷) and ion channel activity (P = 2.7 × 10⁻⁴). The kallikrein locus was found to contain a large number of MSR1 clusters, and at least six of these showed CNV. An MSR1 cluster was identified within KLK14, with 9 and 11 copies being normal variants. A significant association between the 9-copy allele and non-familial breast cancer was found in two independent populations (P = 0.004; P = 0.03). In the white British population, the minor allele conferred an increased risk of 1.21-3.51 times for all non-familial disease, or 1.7-5.3 times in early-onset disease. The 9-copy allele was also found to be associated with increased risk of prostate cancer in an independent population (odds ratio = 1.27-1.56; P = 0.009). Conclusions: MSR1 repeats act as molecular switches that modulate gene expression. It is likely that CNV of MSR1 will affect risk of development of various forms of cancer, including that of breast and prostate. The MSR1 cluster at KLK14 represents the strongest risk factor identified to date in non-familial breast cancer and a significant risk factor for prostate cancer. Analysis of MSR1 genotype will allow development of precise stratification of disease risk and provide a novel target for therapeutic agents.
Affiliation(s)
- A M Rose
  - Department of Genetics, UCL Institute of Ophthalmology, University College London, London, UK
- A Krishan
  - Cell Therapy and Regenerative Medicine, CABIMER, Seville, Spain
- C F Chakarova
  - Department of Genetics, UCL Institute of Ophthalmology, University College London, London, UK
- L Moya
  - Australian Prostate Cancer Research Centre - Queensland, Translational Research Institute, Brisbane
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane
- S K Chambers
  - Menzies Health Institute Queensland, Griffith University, Southport
  - Cancer Research Centre, Cancer Council Queensland, Brisbane, Australia
- M Hollands
  - UCL Medical School, University College London, London
- H E McCabe
  - Clinical Genetics, Ninewells Hospital & Medical School, University of Dundee, Dundee
- A Z Shah
  - Department of Genetics, UCL Institute of Ophthalmology, University College London, London, UK
- C N A Palmer
  - Centre for Pharmacogenetics and Pharmacogenomics, Ninewells Hospital and School of Medicine, University of Dundee, Dundee, UK
- A Chakravarti
  - Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, USA
- J N Berg
  - Clinical Genetics, Ninewells Hospital & Medical School, University of Dundee, Dundee
- J Batra
  - Australian Prostate Cancer Research Centre - Queensland, Translational Research Institute, Brisbane
  - Cancer Program, School of Biomedical Sciences, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane
- S S Bhattacharya
  - Department of Genetics, UCL Institute of Ophthalmology, University College London, London, UK
  - Cell Therapy and Regenerative Medicine, CABIMER, Seville, Spain
7
Rose AM. Cancer and the junkyard chromosome: how repeat DNA sequence on chromosome 19 influences risk of malignant disease. Oncotarget 2018; 9:31942-31944. [PMID: 30174787 PMCID: PMC6112826 DOI: 10.18632/oncotarget.25873] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2018] [Accepted: 07/23/2018] [Indexed: 01/03/2023] Open
Affiliation(s)
- Anna M Rose
  - MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, UK
8
Kojima KK. Human transposable elements in Repbase: genomic footprints from fish to humans. Mob DNA 2018; 9:2. [PMID: 29308093 PMCID: PMC5753468 DOI: 10.1186/s13100-017-0107-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 09/01/2017] [Accepted: 12/20/2017] [Indexed: 01/21/2023] Open
Abstract
Repbase is a comprehensive database of eukaryotic transposable elements (TEs) and repeat sequences, containing over 1300 human repeat sequences. Recent analyses of these repeat sequences have accumulated evidence for their contribution to human evolution through becoming functional elements, such as protein-coding regions or binding sites of transcriptional regulators. However, resolving the origins of repeat sequences is a challenge, due to their age, divergence, and degradation. Ancient repeats have been continuously classified as TEs by finding similar TEs from other organisms. Here, the most comprehensive picture of human repeat sequences is presented. The human genome contains traces of 10 clades (L1, CR1, L2, Crack, RTE, RTEX, R4, Vingi, Tx1 and Penelope) of non-long terminal repeat (non-LTR) retrotransposons (long interspersed elements, LINEs), 3 types (SINE1/7SL, SINE2/tRNA, and SINE3/5S) of short interspersed elements (SINEs), 1 composite retrotransposon (SVA) family, 5 classes (ERV1, ERV2, ERV3, Gypsy and DIRS) of LTR retrotransposons, and 12 superfamilies (Crypton, Ginger1, Harbinger, hAT, Helitron, Kolobok, Mariner, Merlin, MuDR, P, piggyBac and Transib) of DNA transposons. These TE footprints demonstrate an evolutionary continuum of the human genome.
Affiliation(s)
- Kenji K Kojima
  - Genetic Information Research Institute, 465 Fairchild Drive, Suite 201, Mountain View, CA 94043 USA
  - Department of Life Sciences, National Cheng Kung University, No. 1, Daxue Rd, East District, Tainan, 701 Taiwan
9
Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 2015; 6:11. [PMID: 26045719 PMCID: PMC4455052 DOI: 10.1186/s13100-015-0041-9] [Citation(s) in RCA: 1639] [Impact Index Per Article: 182.1] [Received: 02/24/2015] [Accepted: 04/17/2015] [Indexed: 02/08/2023] Open
Abstract
Repbase Update (RU) is a database of representative repeat sequences in eukaryotic genomes. Since its first development as a database of human repetitive sequences in 1992, RU has been serving as a well-curated reference database fundamental for almost all eukaryotic genome sequence analyses. Here, we introduce recent updates of RU, focusing on technical issues concerning the submission and updating of Repbase entries, and give short examples of using RU data. RU sincerely invites a broader submission of repeat sequences from the research community.
Affiliation(s)
- Weidong Bao
  - Genetic Information Research Institute, 5150 El Camino Real, Ste B-30, Los Altos, CA 94022 USA
- Kenji K Kojima
  - Genetic Information Research Institute, 5150 El Camino Real, Ste B-30, Los Altos, CA 94022 USA
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Minato-ku, Tokyo Japan
  - Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai Minato-ku, Tokyo, 108-8639 Japan
- Oleksiy Kohany
  - Genetic Information Research Institute, 5150 El Camino Real, Ste B-30, Los Altos, CA 94022 USA
10
A shotgun approach to discovering and reconstructing consensus retrotransposons ex novo from dense contigs of short sequences derived from Genbank Genome Survey Sequence database records. Gene 2009; 448:168-73. [DOI: 10.1016/j.gene.2009.06.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/07/2009] [Revised: 06/12/2009] [Accepted: 06/19/2009] [Indexed: 01/19/2023]
11
Abstract
Eukaryotic genomes are full of repetitive DNA, transposable elements (TEs) in particular, and accordingly there are a number of computational methods that can be used to identify TEs from genomic sequences. We present here a survey of two of the most readily available and widely used bioinformatics applications for the detection, characterization, and analysis of TE sequences in eukaryotic genomes: CENSOR and RepeatMasker. For each program, information on availability, input, output, and the algorithmic methods used is provided. Specific examples of the use of CENSOR and RepeatMasker are also described. CENSOR and RepeatMasker both rely on homology-based methods for the detection of TE sequences. There are several other classes of methods available for the analysis of repetitive DNA sequences including de novo methods that compare genomic sequences against themselves, class-specific methods that use structural characteristics of specific classes of elements to aid in their identification, and pipeline methods that combine aspects of some or all of the aforementioned methods. We briefly consider the strengths and weaknesses of these different classes of methods with an emphasis on their complementary utility for the analysis of repetitive DNA in eukaryotes.
12
Hassan MI, Waheed A, Yadav S, Singh TP, Ahmad F. Zinc alpha 2-glycoprotein: a multidisciplinary protein. Mol Cancer Res 2008; 6:892-906. [PMID: 18567794 DOI: 10.1158/1541-7786.mcr-07-2195] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Indexed: 12/16/2022]
Abstract
Zinc alpha 2-glycoprotein (ZAG) is a protein of interest because it performs many important functions in the human body, including roles in fertilization and lipid mobilization. In the five decades since its discovery, various studies have documented its structure and functions, yet it is still considered a protein of unknown function. Its expression is regulated by glucocorticoids. Because of its high sequence homology with lipid-mobilizing factor and its high expression in cancer cachexia, it is considered a novel adipokine. On the other hand, its structural organization and fold are similar to those of the MHC class I antigen-presenting molecule; hence, ZAG may have a role in the expression of the immune response. The function of ZAG under physiologic and cancerous conditions remains mysterious, but it is considered a tumor biomarker for various carcinomas. Several unrelated functions are attributed to ZAG, such as RNase activity, regulation of melanin production, hindering tumor proliferation, and transport of nephritic by-products. This article discusses the major aspects of ZAG, from its gene structure to its function and metabolism.
Affiliation(s)
- Md Imtaiyaz Hassan
  - Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi 110025, India
13
Uberbacher EC, Hyatt D, Shah M. GrailEXP and Genome Analysis Pipeline for genome annotation. Curr Protoc Hum Genet 2008; Chapter 6:Unit 6.5. [PMID: 18428363 DOI: 10.1002/0471142905.hg0605s39] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
The Gene Recognition and Analysis Internet Link (GRAIL) is one of the most widely used systems for evaluating the protein-coding potential of anonymous DNA sequences. This unit describes the use of the XGRAIL and genQuest client-server applications to locate exons in DNA sequences, to develop gene models, and to search databases for homologs. A support protocol describes how to obtain the GRAIL and genQuest client software by anonymous FTP.
14
Kapitonov VV, Jurka J. A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet 2008; 9:411-2; author reply 414. [PMID: 18421312 DOI: 10.1038/nrg2165-c1] [Citation(s) in RCA: 317] [Impact Index Per Article: 19.8] [Indexed: 01/02/2023]
15
Abstract
The gene identification problem is the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes. This problem is of self-evident importance, and is far from being fully solved, particularly for higher eukaryotes. Thus it is not surprising that the number of algorithm and software developers working in the area is rapidly increasing. The present paper is an overview of the field, with an emphasis on eukaryotes, for such developers.
Affiliation(s)
- J W Fickett
  - Theoretical Biology and Biophysics Group, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
16
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005; 110:462-7. [PMID: 16093699 DOI: 10.1159/000084979] [Citation(s) in RCA: 2310] [Impact Index Per Article: 121.6] [Received: 10/16/2003] [Accepted: 04/06/2004] [Indexed: 12/13/2022] Open
Abstract
Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.
Affiliation(s)
- J Jurka
- Genetic Information Research Institute, Mountain View, CA 94043, USA.
|
17
|
Dunlop TW, Väisänen S, Frank C, Molnár F, Sinkkonen L, Carlberg C. The Human Peroxisome Proliferator-activated Receptor δ Gene is a Primary Target of 1α,25-Dihydroxyvitamin D3 and its Nuclear Receptor. J Mol Biol 2005; 349:248-60. [PMID: 15890193 DOI: 10.1016/j.jmb.2005.03.060] [Citation(s) in RCA: 146] [Impact Index Per Article: 7.7] [Received: 02/03/2005] [Revised: 03/15/2005] [Accepted: 03/21/2005] [Indexed: 02/07/2023]
Abstract
Peroxisome proliferator-activated receptor (PPAR) delta is the most widely expressed member of the PPAR family of nuclear receptor fatty acid sensors. Real-time PCR analysis of breast and prostate cancer cell lines demonstrated that PPARdelta expression was increased 1.5 to 3.2-fold after three hours stimulation with the natural vitamin D receptor (VDR) agonist, 1alpha,25-dihydroxyvitamin D3 (1alpha,25(OH)2D3). In silico analysis of the 20 kb of the human PPARdelta promoter revealed a DR3-type 1alpha,25(OH)2D3 response element approximately 350 bp upstream of the transcription start site, which was able to bind VDR-retinoid X receptor (RXR) heterodimers and mediate a 1alpha,25(OH)2D3-dependent upregulation of reporter gene activity. Chromatin immuno-precipitation assays demonstrated that a number of proteins representative for 1alpha,25(OH)2D3-mediated gene activation, such as VDR, RXR and RNA polymerase II, displayed a 1alpha,25(OH)2D3-dependent association with a region of the proximal PPARdelta promoter that contained the putative DR3-type VDRE. This was also true for other proteins that are involved in or are the subject of chromatin modification, such as the histone acetyltransferase CBP and histone 4, which displayed ligand-dependent association and acetylation, respectively. Finally, real-time PCR analysis demonstrated that 1alpha,25(OH)2D3 and the synthetic PPARdelta ligand L783483 show a cell and time-dependent interference in each other's effects on VDR mRNA expression, so that their combined application shows complex effects on the induction of VDR target genes, such as CYP24. Taken together, we conclude that PPARdelta is a primary 1alpha,25(OH)2D3-responding gene and that VDR and PPARdelta signaling pathways are interconnected at the level of cross-regulation of their respective transcription factor mRNA levels.
Affiliation(s)
- Thomas W Dunlop
- Department of Biochemistry, University of Kuopio, FIN-70211 Kuopio, Finland
|
18
|
Milanesi L, Rogozin IB. ESTMAP: a system for expressed sequence tags mapping on genomic sequences. IEEE Trans Nanobioscience 2004; 2:75-8. [PMID: 15382662 DOI: 10.1109/tnb.2003.813928] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The completion of a number of large genome sequencing projects emphasizes the importance of protein-coding gene predictions. Most of the problems associated with gene prediction are caused by the complex exon-intron structures commonly found in eukaryotic genomes. However, information from homologous sequences can significantly improve the accuracy of the prediction. In particular, expressed sequence tags (ESTs) are very useful for this purpose, since currently existing EST collections are very large. We developed an ESTMAP system, which utilizes homology searches against a database of repetitive elements using the RepeatView program and the EST Division of GenBank using the BLASTN program. ESTMAP extracts "exact" matches with EST sequences (> 95% of homology) from BLASTN output file and predicts introns in DNA comparing ESTs and a query sequence. ESTMAP is implemented as a part of the WebGene system (http://www.cnr.it/webgene).
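The filtering and intron-inference steps summarized in this abstract can be sketched roughly as follows. This is a minimal illustration, not the ESTMAP implementation: it assumes BLASTN tabular output (`-outfmt 6`, where column 3 is percent identity and columns 7 and 8 are query start and end), and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def filter_near_exact_hits(tabular_lines, min_identity=95.0):
    """Keep BLASTN tabular hits whose percent identity is above min_identity."""
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity = float(fields[2])  # column 3: percent identity
        if identity > min_identity:
            hits.append({
                "est": fields[1],
                "identity": identity,
                "q_start": int(fields[6]),  # column 7: start on the genomic query
                "q_end": int(fields[7]),    # column 8: end on the genomic query
            })
    return hits


def candidate_introns(hits):
    """Report genomic gaps between consecutive hits of the same EST.

    Assumption for illustration: a gap between two near-exact matches
    of one EST on the genomic query is treated as a candidate intron.
    """
    by_est = defaultdict(list)
    for hit in hits:
        by_est[hit["est"]].append(hit)
    introns = []
    for est, est_hits in sorted(by_est.items()):
        est_hits.sort(key=lambda h: h["q_start"])
        for left, right in zip(est_hits, est_hits[1:]):
            if right["q_start"] > left["q_end"] + 1:
                introns.append((est, left["q_end"] + 1, right["q_start"] - 1))
    return introns


# Hypothetical sample rows in BLAST tabular format (tab-separated).
sample = [
    "genomic1\tEST_A\t99.10\t220\t2\t0\t1001\t1220\t1\t220\t1e-110\t400",
    "genomic1\tEST_A\t98.50\t200\t3\t0\t2501\t2700\t221\t420\t1e-100\t360",
    "genomic1\tEST_B\t88.00\t150\t18\t0\t5000\t5149\t1\t150\t1e-40\t150",
]
near_exact = filter_near_exact_hits(sample)
introns = candidate_introns(near_exact)
```

In this toy input, the EST_B hit falls below the threshold and is dropped, while the two EST_A hits flank a single candidate intron on the genomic query.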
|
19
|
Campagna D, Romualdi C, Vitulo N, Del Favero M, Lexa M, Cannata N, Valle G. RAP: a new computer program for de novo identification of repeated sequences in whole genomes. Bioinformatics 2004; 21:582-8. [PMID: 15374857 DOI: 10.1093/bioinformatics/bti039] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: DNA repeats are a common feature of most genomic sequences. Their de novo identification is still difficult despite being a crucial step in genomic analysis and oligonucleotide design. Several efficient algorithms based on word counting are available, but words that are too short decrease specificity while long words decrease sensitivity, particularly in degenerate repeats. RESULTS: The Repeat Analysis Program (RAP) is based on a new word-counting algorithm optimized for high resolution repeat identification using gapped words. Many different overlapping gapped words can be counted at the same genomic position, thus producing a better signal than the single ungapped word. This results in better specificity both in terms of low-frequency detection, being able to identify sequences repeated only once, and highly divergent detection, producing a generally high score in most intron sequences. AVAILABILITY: The program is freely available for non-profit organizations, upon request to the authors. CONTACT: giorgio.valle@unipd.it SUPPLEMENTARY INFORMATION: The program has been tested on the Caenorhabditis elegans genome using word lengths of 12, 14 and 16 bases. The full analysis has been implemented in the UCSC Genome Browser and is accessible at http://genome.cribi.unipd.it.
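To make the idea of counting gapped words concrete, here is a small sketch. It is not the RAP algorithm: the mask notation (`1` for a position that is compared, `0` for a position that is ignored), the function name, and the toy sequence are all assumptions chosen for illustration.

```python
from collections import Counter


def count_gapped_words(sequence, mask):
    """Count words read through a gapped mask at every position.

    mask is a string of '1' (base is kept) and '0' (base is ignored),
    so a diverged base under a '0' does not break the match.
    """
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    counts = Counter()
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        counts["".join(window[i] for i in keep)] += 1
    return counts


# Toy sequence: two exact copies of ACGT plus one diverged copy, ACCT.
seq = "ACGTACCTACGT"
gapped = count_gapped_words(seq, "1101")    # ignore the third base
ungapped = count_gapped_words(seq, "1111")  # ordinary 4-mer counting
```

With the ungapped mask, `ACGT` is seen twice; with the gapped mask, the diverged copy `ACCT` collapses onto the same key `ACT`, which is then seen three times. This shows on a small scale how ignoring selected positions can recover degenerate repeats that exact word counting misses.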
Affiliation(s)
- Davide Campagna
- CRIBI, Università degli Studi di Padova via Ugo Bassi 58b, I-35121 Padova, Italy
|
20
|
Uberbacher EC, Hyatt D, Shah M. GrailEXP and Genome Analysis Pipeline for genome annotation. Current Protocols in Bioinformatics 2004; Chapter 4:Unit4.9. [PMID: 18428726 DOI: 10.1002/0471250953.bi0409s04] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The Basic Protocol describes the use of GrailEXP, the latest version of the gene finding system from Oak Ridge National Laboratory. GrailEXP provides gene models, by making use of sequence similarity with Expressed Sequence Tags (ESTs) and known genes. GrailEXP also provides alternatively spliced constructs for each gene based on the available EST evidence. The Support Protocol describes the use of the Genome Analysis Pipeline, a web application which allows users to perform comprehensive sequence analysis by offering a selection from a wide choice of supported gene finders, other biological feature finders, and database searches.
|
21
|
Ladunga I. Finding Homologs to Nucleotide Sequences Using Network BLAST Searches. Current Protocols in Bioinformatics 2002; Chapter 3:Unit 3.3. [DOI: 10.1002/0471250953.bi0303s00] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Affiliation(s)
- Istvan Ladunga
- Celera Genomics Foster City California
- Research Group for Evolutionary Genetics, Hungarian Academy of Sciences, Eötvös University Budapest Hungary
|
22
|
Kaminski JM, Huber MR, Summers JB, Ward MB. Design of a nonviral vector for site-selective, efficient integration into the human genome. FASEB J 2002; 16:1242-7. [PMID: 12153992 DOI: 10.1096/fj.02-0127hyp] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
Gene therapy in eukaryotes has met many obstacles. Research into the design of suitable nonviral vectors has been slow. To our knowledge, no nonviral vector has been proposed that allows for the possibility of highly efficient, site-selective integration into the genome of mammalian cells. On the basis of prior studies investigating the components necessary for transposon, retrovirus-like retrotransposon, and retroviral integration, we propose a nonviral system that would potentially allow for site-selective, efficient integration into the mammalian genome. Transposons have been developed that can transform a variety of cell lines. For example, the Sleeping Beauty transposon (SB) can transform a wide range of vertebrate cells from fish to human, and it mediates stable integration and long-term transgene expression in mice. However, the efficiency of transposition varies significantly among cell lines, suggesting the possible involvement of host factors in SB transposition. Here, we propose the use of a chimeric transposase (i.e., transposase-host DNA binding domain) to bypass the potential requirement of a host DNA-directing factor (or factors) for efficient, site-selective integration. We also discuss another potential method of docking the transposon-based vector adjacent to the host DNA, utilizing repetitive sequences for homologous recombination to promote efficient site-selective integration, as well as other site-selective nonviral approaches.
Affiliation(s)
- Joseph M Kaminski
- Department of Radiation Oncology, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
|
23
|
Scofield MA, Xiong W, Haas MJ, Zeng Y, Cox GS. Sequence analysis of the human glycoprotein hormone alpha-subunit gene 5'-flanking DNA and identification of a potential regulatory element as an alu repetitive sequence. Biochimica et Biophysica Acta 2000; 1493:302-18. [PMID: 11018255 DOI: 10.1016/s0167-4781(00)00192-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
Abstract
The nucleotide sequence of the human glycoprotein hormone alpha-subunit (GPHalpha) gene 5'-flanking DNA was determined from -1637 to +49 relative to the cap site (+1). Comparison of the upstream sequence of the human gene with those of rhesus and mouse demonstrates regions with variable identity. When the 1.7 kb fragment was used to drive the expression of chloramphenicol acetyltransferase (CAT) in transiently transfected HeLa cells, it was found that CAT activity was elevated about 3-fold when the fragment was truncated from -1637 to -846, suggesting the presence of a negative regulatory element in the distal 5'-flanking DNA. This overlaps an Alu repetitive sequence (ARS) located between nucleotides -1330 and -1007. Gel mobility shift and DNase protection analyses identified a protein binding site centered around -1100 in the ARS second monomer. The GPHalpha upstream ARS was cloned in both orientations in positions upstream and downstream from the bacterial CAT gene under control of the herpes simplex virus thymidine kinase (tk) promoter. DNA-mediated transient transfection of these plasmids revealed a marked inhibition (79-82%) of CAT production by the ARS when it was cloned upstream from the tk promoter and in the same orientation as that found in the GPHalpha 5'-flanking DNA. Smaller decreases (29-57%) were produced by the ARS cloned upstream from the tk promoter in the reverse orientation. In marked contrast, the Alu repetitive element had little or no effect when cloned in either orientation downstream from the tk-CAT gene. Introduction of a second ARS downstream from the CAT reporter gene in vectors already containing an ARS upstream from the tk promoter significantly reduced the strong negative effect elicited by the upstream repetitive element. 
When compared to the Blur 8 Alu element, the GPHalpha upstream ARS differs markedly with respect to its effect on tk-CAT expression in transient assays and as a substrate for DNA binding proteins present in HeLa nuclear extracts. Together, the transient expression results demonstrate that ARS elements can influence expression of nearby class II promoters. The extent of this effect depends on element position and orientation, cell type, the particular ARS (e.g., GPHalpha or Blur 8), and whether copies were present both upstream and downstream from the transcription unit.
Affiliation(s)
- M A Scofield
- Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, 984525 Nebraska Medical Center, Omaha, NE 68198-4525, USA
|
24
|
Mikulska JE, Pablo L, Canel J, Simister NE. Cloning and analysis of the gene encoding the human neonatal Fc receptor. European Journal of Immunogenetics: Official Journal of the British Society for Histocompatibility and Immunogenetics 2000; 27:231-40. [PMID: 10998088 DOI: 10.1046/j.1365-2370.2000.00225.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
Abstract
The neonatal Fc receptor, FcRn, is expressed in human placental syncytiotrophoblast, capillary endothelium, intestinal epithelium, and other tissues. By analogy with its role in the mouse, human FcRn is expected to transport maternal IgG to the foetus, and protect circulating IgG from catabolism. The larger subunit of FcRn is homologous to the alpha chains of the major histocompatibility complex (MHC) class I proteins, but is encoded outside the MHC on chromosome 19. We report the isolation of clones encoding the alpha chain of human FcRn from chromosome 19-specific libraries. The sequence revealed a similar organization to classical and non-classical MHC, and MHC-related genes. Compared with classical MHC class I genes, the human FcRn alpha chain gene has expanded by acquiring many repetitive sequences in its introns, including multiple Alu elements in the fourth intron. Primer extension analysis showed that there are two transcription initiation sites in the upstream flanking sequence.
Affiliation(s)
- J E Mikulska
- Ludwik Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Wroclaw, Poland
|
25
|
Park HS, Nogami M, Okumura K, Hattori M, Sakaki Y, Fujiyama A. Newly identified repeat sequences, derived from human chromosome 21qter, are also localized in the subtelomeric region of particular chromosomes and 2q13, and are conserved in the chimpanzee genome. FEBS Lett 2000; 475:167-9. [PMID: 10869549 DOI: 10.1016/s0014-5793(00)01632-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Subtelomeric regions have been a target of structural and functional studies of human chromosomes. Markers having a defined structure are especially useful to such studies. Here, we report 93 bp tandem repeat sequences found in the subtelomeric region of human chromosome 21q. They were also detected in the telomeric region of several other chromosomes. Interestingly, the repeat was also found in the 2q13 region which is known to be a position of chromosomal fusion, a major difference between the human and chimpanzee karyotypes. To the best of our knowledge, this repetitive sequence is a new member of human subtelomeric interspersed repeats.
Affiliation(s)
- H S Park
- RIKEN Genomic Sciences Center, c/o Kitasato University, Sagamihara, Kanagawa 228-8555, Japan
|
26
|
Brand-Arpon V, Rouquier S, Massa H, de Jong PJ, Ferraz C, Ioannou PA, Demaille JG, Trask BJ, Giorgi D. A genomic region encompassing a cluster of olfactory receptor genes and a myosin light chain kinase (MYLK) gene is duplicated on human chromosome regions 3q13-q21 and 3p13. Genomics 1999; 56:98-110. [PMID: 10036190 DOI: 10.1006/geno.1998.5690] [Citation(s) in RCA: 47] [Impact Index Per Article: 1.9] [Indexed: 11/22/2022]
Abstract
The olfactory receptor (OR) multigene family is widely distributed in the human genome. We characterize here a new cluster of four OR genes (HGMW-approved symbols OR7E20P, OR7E6P, OR7E21P, and OR7E22P) on human chromosome 3p13 that is contained in an approximately 250-kb region. This region has been physically mapped, and a 106-kb portion containing the OR genes has been sequenced. All the OR sequences are disrupted by frameshifts and stop codons and appear to have arisen through local duplications. A myosin light chain kinase pseudogene (HGMW-approved symbol MYLKP) lies at one end of the OR gene cluster. Sequences spanning the entire region are also present at 3q13-q21, the site of the functional MYLK gene. This region duplicated locally before the divergence of primates, and the two paralogous copies were later separated to sites on either side of the centromere. This study increases our understanding of the evolution of the human genome. The 3p13 cluster is the first example of a tandem array of OR pseudogenes, and duplications of such clusters may account for the accumulation of a large number of pseudogenes in the human genome.
Affiliation(s)
- V Brand-Arpon
- IGH, CNRS UPR 1142, 141 rue de la Cardonille, Montpellier Cédex 5, 34396, France
|
27
|
McPartlan HC, Matthews ME, Primmer C, McCauley L, Thompson C, Robinson NA. A dog microsatellite at the VIAS-D21 locus with demonstrated linkage to the marker CXX20. Anim Genet 1999; 30:75. [PMID: 10050303 DOI: 10.1046/j.1365-2052.1999.00323-14.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Affiliation(s)
- H C McPartlan
- Molecular Genetics Unit, Victorian Institute of Animal Science, Victoria, Australia
|
28
|
van der Reijden BA, Dauwerse HG, Giles RH, Jagmohan-Changur S, Wijmenga C, Liu PP, Smit B, Wessels HW, Beverstock GC, Jotterand-Bellomo M, Martinet D, Mühlematter D, Lafage-Pochitaloff M, Gabert J, Reiffers J, Bilhou-Nabera C, van Ommen GJ, Hagemeijer A, Breuning MH. Genomic acute myeloid leukemia-associated inv(16)(p13q22) breakpoints are tightly clustered. Oncogene 1999; 18:543-50. [PMID: 9927211 DOI: 10.1038/sj.onc.1202321] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
The inv(16) and related t(16;16) are found in 10% of all cases with de novo acute myeloid leukemia. In these rearrangements the core binding factor beta (CBFB) gene on 16q22 is fused to the smooth muscle myosin heavy chain gene (MYH11) on 16p13. To gain insight into the mechanisms causing the inv(16) we have analysed 24 genomic CBFB-MYH11 breakpoints. All breakpoints in CBFB are located in a 15-Kb intron. More than 50% of the sequenced 6.2 Kb of this intron consists of human repetitive elements. Twenty-one of the 24 breakpoints in MYH11 are located in a 370-bp intron. The remaining three breakpoints in MYH11 are located more upstream. The localization of three breakpoints adjacent to a V(D)J recombinase signal sequence in MYH11 suggests a V(D)J recombinase-mediated rearrangement in these cases. V(D)J recombinase-associated characteristics (small nucleotide deletions and insertions of random nucleotides) were detected in six other cases. CBFB and MYH11 duplications were detected in four of six cases tested.
Affiliation(s)
- B A van der Reijden
- Department of Human Genetics, Leiden University, Sylvius Laboratories, Leiden, The Netherlands
|
29
|
Stephen Lasky LR, Hood L. Deciphering Genomes Through Automated Large-scale Sequencing. J Microbiol Methods 1999. [DOI: 10.1016/s0580-9517(08)70204-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
|
30
|
Jiang JC, Kirchman PA, Zagulski M, Hunt J, Jazwinski SM. Homologs of the yeast longevity gene LAG1 in Caenorhabditis elegans and human. Genome Res 1998; 8:1259-72. [PMID: 9872981 DOI: 10.1101/gr.8.12.1259] [Citation(s) in RCA: 111] [Impact Index Per Article: 4.3] [Indexed: 01/24/2023]
Abstract
LAG1 is a longevity gene, the first such gene to be identified and cloned from the yeast Saccharomyces cerevisiae. A close homolog of this gene, which we call LAC1, has been found in the yeast genome. We have cloned the human homolog of LAG1 with the ultimate goal of examining its possible function in human aging. In the process, we have also cloned a homolog from the nematode worm Caenorhabditis elegans. Both of these homologs, LAG1Hs and LAG1Ce-1, functionally complemented the lethality of a lag1delta lac1delta double deletion, despite low overall sequence similarity to the yeast proteins. The proteins shared a short sequence, the Lag1 motif, and a similar transmembrane domain profile. Another, more distant human homolog, TRAM, which lacks this motif, did not complement. LAG1Hs also restored the life span of the double deletion, demonstrating that it functions in establishing the longevity phenotype in yeast. LAG1Hs mapped to 19p12, and it was expressed in only three tissues: brain, skeletal muscle, and testis. This gene possesses a trinucleotide (CTG) repeat within exon 1. This and its expression profile raise the possibility that it may be involved in neurodegenerative disease. This possibility suggests at least one way in which LAG1Hs might be involved in human aging.
Affiliation(s)
- J C Jiang
- Department of Biochemistry and Molecular Biology, Louisiana State University Medical Center, New Orleans, Louisiana 70112, USA
|
31
|
Elisaphenko EA, Nesterova TB, Duthie SM, Ruldugina OV, Rogozin IB, Brockdorff N, Zakian SM. Repetitive DNA sequences in the common vole: cloning, characterization and chromosome localization of two novel complex repeats MS3 and MS4 from the genome of the East European vole Microtus rossiaemeridionalis. Chromosome Res 1998; 6:351-60. [PMID: 9872664 DOI: 10.1023/a:1009284031287] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
We have characterized two novel, complex, heterochromatic repeat sequences, MS3 and MS4, isolated from Microtus rossiaemeridionalis genomic DNA. Sequence analysis indicates that both repeats consist of unique sequences interrupted by repeat elements of different origin and can be classified as long complex repeat units (LCRUs). A unique feature of both repeat units is the presence of short interspersed repeat elements (SINEs), which are usually characteristic of the euchromatic part of the genome. Comparative analysis revealed no significant stretches of homology in the nucleotide sequences between the two repeats, suggesting that the repeats originated independently during the course of vole genome evolution. Fluorescence in situ hybridization analysis demonstrates that MS3 and MS4 occupy distinct domains in the heterochromatic regions of the sex chromosomes in M. transcaspicus and M. arvalis but colocalize in M. rossiaemeridionalis and M. kirgisorum heterochromatic blocks. The localization pattern of the repeats on the vole chromosomes confirms the independent origin of the two repeats and suggests that expansion of the heterochromatic blocks has occurred subsequent to speciation.
Affiliation(s)
- E A Elisaphenko
- Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk
|
32
|
Haila S, Höglund P, Scherer SW, Lee JR, Kristo P, Coyle B, Trembath R, Holmberg C, de la Chapelle A, Kere J. Genomic structure of the human congenital chloride diarrhea (CLD) gene. Gene 1998; 214:87-93. [PMID: 9729124 DOI: 10.1016/s0378-1119(98)00261-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022] Open
Abstract
Congenital chloride diarrhea (CLD) is caused by mutations in a gene which encodes an intestinal anion transporter. We report here the complete genomic organization of the human CLD gene, which spans approximately 39 kb and comprises 21 exons. All exon/intron boundaries conform to the GT/AG rule. An analysis of the putative promoter region sequence shows a putative TATA box and predicts multiple transcription factor binding sites. The genomic structure was determined using DNA from several sources including multiple large-insert libraries and genomic DNA from Finnish CLD patients and controls. Exon-specific primers developed in this study will facilitate mutation screening studies of patients with the disease. Genomic sequencing of a BAC clone H_RG364P16 revealed the presence of another, highly homologous gene 3' of the CLD gene, with a similar genomic structure, recently identified as the Pendred syndrome gene (PDS).
Affiliation(s)
- S Haila
- Department of Medical Genetics, Haartman Institute, University of Helsinki, 00014 Helsinki, Finland.
|
33
|
Mohamed MK, Taylor RE, Feinstein DS, Huang X, Pittler SJ. Structure and upstream region characterization of the human gene encoding rod photoreceptor cGMP phosphodiesterase alpha-subunit. J Mol Neurosci 1998; 10:235-50. [PMID: 9770645 DOI: 10.1007/bf02761777] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
Rod photoreceptor cGMP phosphodiesterase (PDE6) is a three-subunit (α, β, γ2) enzyme that functions to reduce intracellular cytoplasmic cGMP levels, an integral feature of the phototransduction cascade of vision. To allow assessment of the potential for defects in the gene encoding the alpha-subunit (PDE6A) to cause visual dysfunction, and to begin to dissect the basis for photoreceptor-specific expression of this gene, we have characterized the structural gene and upstream region. The human PDE6A gene consists of 22 exons spanning about 60 kb with the intron/exon junctions highly conserved in comparison to the mouse and human PDE6B genes. Using ribonuclease protection and primer extension assays, a predominant transcription start point (tsp) was identified 120 bp upstream of the initiator ATG. To begin functional analysis of the PDE6A promoter, approximately 4 kb of sequence were determined upstream of the tsp. Comparison of this upstream sequence with an approximately 500 bp sequence upstream of the mouse Pde6a gene revealed five distinct segments of identity all within 100 bp upstream of the human PDE6A tsp. A TATA box adjacent to a photoreceptor-specific RET1-like binding site, an SP1 site, and two novel putative cis-element sequences were found. A consensus initiator element sequence is present at the tsp. Additionally, within a 2.5-kb segment beginning 900 bp upstream of the tsp, two Alu, a MIR, an L1, and two MER repetitive elements were found. Electrophoretic mobility shift assays generate a retina-specific bandshift using a 322-bp fragment containing the putative promoter region or a multimer of the RET1-like site. DNA footprinting assays revealed footprints over the primary transcription start point and the RET1-like and TATA box regions. These results indicate that a 220-bp segment of the PDE6A gene upstream region is important for tissue-specific expression.
Affiliation(s)
- M K Mohamed
- Department of Biochemistry & Molecular Biology, University of South Alabama College of Medicine, Mobile 36688-0002, USA
|
34
|
Biunno I, Rogozin IB, Appierto V, Milanesi L, Mostardini M, Mumm S, Pergolizzi R, Zucchi I, De Bellis G. Sequence and gene content in 35 kb genomic clone mapping in the human Xq27.1 region. DNA Sequence: The Journal of DNA Sequencing and Mapping 1998; 8:1-15. [PMID: 9522116 DOI: 10.3109/10425179709020880] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
Abstract
This paper presents a detailed analysis of the entire sequence of a cosmid clone, 26H7, containing 35 kb of human DNA. This cosmid resides in the q27.1 region of the human X chromosome between the DXS1232 and DXS119 loci. Novel potential small exons were detected for which conventional gene identification strategies (Northern blot analysis and extensive cDNA library screening) proved to be inefficient. Of the standard repetitive elements we found: 8 Alus making up 6.2% of the sequence; 10 MIR segments (4.1%); 5 LINE1 elements (4.8%); 3 MIR2 (1.0%); 2 MLT (2.9%); and 1 MSTA (0.7%), together representing about 20% of the total sequence. The overall GC content was rather low, only 42%, and no CpG island was detected using rare restriction enzymes. However, a CpG-rich region was identified. Computer-aided analysis of the sequence inferred the presence of three possible genes: one of them was found to be homologous to the U7 RNA family elements; a second is reported in this paper, although at the moment no significant homology has been found in the data bank. The third predicted gene has not as yet been found to be detectable by RT-PCR. We also report in this paper the identification of X-chromosome specific repeated sequences.
Affiliation(s)
- I Biunno
- Consiglio Nazionale delle Ricerche, Istituto Tecnologie Biomediche Avanzate, Milano, Italy
|
35
|
Bailey LC, Searls DB, Overton GC. Analysis of EST-driven gene annotation in human genomic sequence. Genome Res 1998; 8:362-76. [PMID: 9548972 DOI: 10.1101/gr.8.4.362] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
We have performed a systematic analysis of gene identification in genomic sequence by similarity search against expressed sequence tags (ESTs) to assess the suitability of this method for automated annotation of the human genome. A BLAST-based strategy was constructed to examine the potential of this approach, and was applied to test sets containing all human genomic sequences longer than 5 kb in public databases, plus 300 kb of exhaustively characterized benchmark sequence. At high stringency, 70%-90% of all annotated genes are detected by near-identity to EST sequence; >95% of ESTs aligning with well-annotated sequences overlap a gene. These ESTs provide immediate access to the corresponding cDNA clones for follow-up laboratory verification and subsequent biologic analysis. At lower stringency, up to 97% of annotated genes were identified by similarity to ESTs. The apparent false-positive rate rose to 55% of ESTs among all sequences and 20% among benchmark sequences at the lowest stringency, indicating that many genes in public database entries are unannotated. Approximately half of the alignments span multiple exons, and thus aid in the construction of gene predictions and elucidation of alternative splicing. In addition, ESTs from multiple cDNA libraries frequently cluster over genes, providing a starting point for crude expression profiles. Clone IDs may be used to form EST pairs, and particularly to extend models by associating alignments of lower stringency with high-quality alignments. These results demonstrate that EST similarity search is a practical general-purpose annotation technique that complements pattern recognition methods as a tool for gene characterization.
Affiliation(s)
- L C Bailey
- Computational Biology and Informatics Laboratory, Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.
|
36
|
Abstract
Molecular genetic studies of the human major histocompatibility complex (MHC) have led to the identification of more than 200 genes. Besides the large number of genes in the MHC, densely clustered areas of retroelements have been identified. These include short and long interspersed elements (SINEs and LINEs), and human endogenous retroviruses (HERVs). The presence of retroelements in the MHC provides a clear example of how these elements affect the genome plasticity of the host. Comparative analyses of these retroelements have proven highly useful in evolutionary studies of the MHC. Recently, HERV-encoded superantigens have been implicated as candidate autoimmune genes in type I diabetes and multiple sclerosis. In addition, genetic analyses have revealed that autoimmune diseases show strong associations with MHC class II genes. The intriguing correlations between retroviral encoded antigens, MHC class II genes and the development of autoimmune disease merit intense future investigations of retroelements, in particular those endogenous retroviruses located in the MHC class II region proper.
Affiliation(s)
- G Andersson
- Department of Cell Research, Uppsala Genetic Center, Swedish University of Agricultural Sciences, Sweden.
|
37
|
Bailey LC, Fischer S, Schug J, Crabtree J, Gibson M, Overton GC. GAIA: framework annotation of genomic sequence. Genome Res 1998; 8:234-50. [PMID: 9521927 DOI: 10.1101/gr.8.3.234] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023]
Abstract
As increasing amounts of genomic sequence from many organisms become available, and as DNA sequences become a primary reagent in biologic investigations, the role of annotation as a prospective guide for laboratory experiments will expand rapidly. Here we describe a process of high-throughput, reliable annotation, called framework annotation, which is designed to provide a foundation for initial biologic characterization of previously unexamined sequence. To examine this concept in practice, we have constructed Genome Annotation and Information Analysis (GAIA), a prototype software architecture that implements several elements important for framework annotation. The center of GAIA consists of an annotation database and the associated data management subsystem that forms the software bus along which other components communicate. The schema for this database defines three principal concepts: (1) Entries, consisting of sequence and associated historical data; (2) Features, comprising information of biologic interest; and (3) Experiments, describing the evidence that supports Features. The database permits tracking of annotation results over time, as well as assessment of the reliability of particular results. New framework annotation is produced by CARTA, a set of autonomous sensors that perform automatic analyses and assert results into the annotation database. These results are available via a Web-based query interface that uses graphical Java applets as well as text-based HTML pages to display data at different levels of resolution and permit interactive exploration of annotation. We present results for initial application of framework annotation to a set of test sequences, demonstrating its effectiveness in providing a starting point for biologic investigation, and discuss ways in which the current prototype can be improved. The prototype is available for public use and comment at http://www.cbil.upenn.edu/gaia.
Affiliation(s)
- L C Bailey
- Computational Biology and Informatics Laboratory, Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6021, USA.
|
38
|
Abstract
There are now five reported examples in which the 3' ends of tRNA-derived SINEs are derived from the 3' ends of LINEs. These examples include representative sequences from turtles, fish, mammals and plants (Ohshima et al., 1996, Mol. Cell. Biol., 16, 3756-3764; Okada and Hamada, 1997, J. Mol. Evol. 44, Suppl 1:S52-S56). In this review, we discuss the generality of this architecture of SINEs, adding new examples of pairs of SINEs and LINEs, which include one complete and two probable examples from this laboratory and one complete example from the laboratory of Arian Smit. This organization of SINEs and LINEs provides the basis for a simple general scheme by which SINEs might acquire retropositional activity.
Affiliation(s)
- N Okada
- Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan.
|
39
|
Chissoe SL, Marra MA, Hillier L, Brinkman R, Wilson RK, Waterston RH. Representation of cloned genomic sequences in two sequencing vectors: correlation of DNA sequence and subclone distribution. Nucleic Acids Res 1997; 25:2960-6. [PMID: 9224593 PMCID: PMC146865 DOI: 10.1093/nar/25.15.2960] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
Representation of subcloned Caenorhabditis elegans and human DNA sequences in both M13 and pUC sequencing vectors was determined in the context of large scale genomic sequencing. In many cases, regions of subclone under-representation correlated with the occurrence of repeat sequences, and in some cases the under-representation was orientation specific. Factors which affected subclone representation included the nature and complexity of the repeat sequence, as well as the length of the repeat region. In some but not all cases, notable differences between the M13 and pUC subclone distributions existed. However, in all regions lacking one type of subclone (either M13 or pUC), an alternate subclone was identified in at least one orientation. This suggests that complementary use of M13 and pUC subclones would provide the most comprehensive subclone coverage of a given genomic sequence.
Affiliation(s)
- S L Chissoe
- Department of Genetics and Genome Sequencing Center, Washington University School of Medicine, St Louis, MO 63108, USA.
|
40
|
Granadino B, Beltrán-Valero de Bernabé D, Fernández-Cañón JM, Peñalva MA, Rodríguez de Córdoba S. The human homogentisate 1,2-dioxygenase (HGO) gene. Genomics 1997; 43:115-22. [PMID: 9244427 DOI: 10.1006/geno.1997.4805] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.2] [Indexed: 02/04/2023]
Abstract
Alkaptonuria (AKU; McKusick No. 203500), a rare hereditary disorder of the phenylalanine catabolism, was the first disease to be interpreted as an inborn error of metabolism (A. E. Garrod, 1902, Lancet 2: 1616-1620). AKU patients are deficient for homogentisate 1,2-dioxygenase (HGO; EC 1.13.11.5). This enzymatic deficiency causes homogentisic aciduria, ochronosis, and arthritis. Recently we cloned the human HGO gene and showed that AKU patients carry two copies of a loss-of-function HGO allele. Here we describe the complete nucleotide sequence of the human HGO gene and the identification of its promoter region. The human HGO gene spans 54,363 bp and codes for a 1715-nt-long transcript that is split into 14 exons ranging from 35 to 360 bp. The HGO introns, 605 to 17,687 bp in length, contain representatives of the major classes of repetitive elements, including several simple sequence repeats (SSR). Two of these SSRs, a (CT)n repeat in intron 4 and a (CA)n repeat in intron 13, were found to be polymorphic in a Spanish population sample. The HGO transcription start site was determined by primer extension. We report that sequences from -1074 to +89 bp (relative to the HGO transcription start site) are sufficient to promote transcription of a CAT reporter gene in human liver cells and that this fragment contains putative binding sites for liver-enriched transcription factors that might be involved in the regulation of HGO expression in liver.
Affiliation(s)
- B Granadino
- Departamento de Immunología, Centro de Investigaciones Biológicas Consejo Superior de Investigaciones Científicas, Madrid, Spain
|
41
|
Adolph KW, Liska DJ, Bornstein P. Analysis of the promoter and transcription start sites of the human thrombospondin 2 gene (THBS2). Gene 1997; 193:5-11. [PMID: 9249061 DOI: 10.1016/s0378-1119(97)00070-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.1] [Indexed: 02/05/2023]
Abstract
To identify features of the human thrombospondin 2 gene (THBS2) important for regulation of expression, the sequences of 5 kb of the promoter/5' flank and 3 kb of transcribed and intronic DNA were determined. Two repetitive sequences were found: an MLT1c element located 2.2 kb 5' of exon 1 and, further 5', 1.8 kb of a Tigger1 element. Putative transcription factor binding sites that might be significant for THBS2 regulation included p53, NF-kappaB, Sp1, Myc-CF1, NF-Y, CF1, AP1, and GATA sites. Alignment of the promoter/5' flank sequence with the mouse Thbs2 promoter revealed 78% identity for a 450 bp region immediately upstream from the mouse transcription start site. No significant homology was detected between the human thrombospondin 2 and thrombospondin 1 promoters. Comparison of the THBS2 genomic and cDNA sequences revealed that, in contrast to Thbs2, exon 1 is divided into exons 1A and 1B by a small (93 bp) intron. The transcription start site was investigated by a PCR procedure and by 5' RACE, and yielded a size for exon 1A of at least 186 bp. Tissue-specific differences in transcription start sites were found, with transcript lengths in the order: fetal lung > adult lung > fetal brain. These results suggest that tissue-specific differences in expression of the THBS2 gene may be determined, in part, by selection of the transcription start site and resulting differences in the 5' untranslated region.
Affiliation(s)
- K W Adolph
- Department of Biochemistry, University of Minnesota Medical School, Minneapolis 55455, USA
|
42
|
Abstract
We have sequenced and analyzed 8.3 kb of sequence adjacent and distal to the human ribosomal DNA (rDNA); this distal sequence connects to the rDNA cluster just 4 kb upstream of the first promoter and is shared among the acrocentric chromosomes and, at least in part, it is also present in other primates. The sequence differs in character from that of the rDNA intergenic spacer (IGS) in that it does not contain long stretches of either polypyrimidine or polypurine. However, just like the IGS, it contains numerous repetitive elements, including retroposed fragments of 28S rRNA and large pieces of the IGS. In addition, we show that the rDNA clusters are not interrupted by other sequences and do not recombine with this distal segment.
Affiliation(s)
- I L Gonzalez
- MCP Hahnemann School of Medicine, Allegheny University of the Health Sciences, Department of Pathology, Broad and Vine, Philadelphia, PA 19102, USA.
|
43
|
Bock JH, Shuck ME, Benjamin CW, Chee M, Bienkowski MJ, Slightom JL. Nucleotide sequence analysis of the human KCNJ1 potassium channel locus. Gene 1997; 188:9-16. [PMID: 9099852 DOI: 10.1016/s0378-1119(96)00759-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
Abstract
Detailed analyses of transcripts encoding various isoforms of the human potassium (K+, inward rectifying) channel ROM-K (also referred to as K(ir)1.1) revealed the existence of at least five distinct transcripts [Shuck et al., J. Biol. Chem. 269 (1994) 24261-24270]. These five hROM-K transcripts appear to be the result of alternative splicing of five exons. The nucleotide sequence of the genomic DNA including and spanning these exons (the KCNJ1 locus) was obtained directly from lambda and P1 clones (a total of 40 kb). The organization of the hKCNJ1 gene was determined by combining this sequence information with data obtained from primer extension and RT-PCR experiments. It appears that the hKCNJ1 gene utilizes multiple promoters, with promoter-like elements found 5' of exons 1, 4, or 5. The promoter 5' of exon 5 was unexpected; thus, it appears that the hKCNJ1 gene is capable of producing six distinct hROM-K transcripts via the use of three promoters and alternative splicing of five exons. Comparisons of the rat and human ROM-K cDNA sequences find human homologs (orthologs) for two of the three distinct rROM-K transcripts. A search of the complete human KCNJ1 sequence with the exon sequence that defines the other rROM-K transcript located a region of shared nucleotides, a putative sixth exon, in the hKCNJ1 gene. This finding suggests that the rKCNJ1 gene may contain an exon that is no longer or infrequently used in transcripts derived from the hKCNJ1 gene.
Affiliation(s)
- J H Bock
- Molecular Biology Unit, Pharmacia and Upjohn Company, Kalamazoo, MI 49007, USA
|
44
|
Flint J, Thomas K, Micklem G, Raynham H, Clark K, Doggett NA, King A, Higgs DR. The relationship between chromosome structure and function at a human telomeric region. Nat Genet 1997; 15:252-7. [PMID: 9054936 DOI: 10.1038/ng0397-252] [Citation(s) in RCA: 117] [Impact Index Per Article: 4.3] [Indexed: 02/03/2023]
Abstract
We have sequenced a contiguous 284,495-bp segment of DNA extending from the terminal (TTAGGG)n repeats of the short arm of chromosome 16, providing a full description of the transition from telomeric through subtelomeric DNA to sequences that are unique to the chromosome. To complement and extend analysis of the primary sequence, we have characterized mRNA transcripts, patterns of DNA methylation and DNase I sensitivity. Together with previous data these studies describe in detail the structural and functional organization of a human telomeric region.
Affiliation(s)
- J Flint
- MRC Molecular Haematology Unit, John Radcliffe Hospital, Headington, Oxford, UK
|
45
|
Touchman JW, Bouffard GG, Weintraub LA, Idol JR, Wang L, Robbins CM, Nussbaum JC, Lovett M, Green ED. 2006 expressed-sequence tags derived from human chromosome 7-enriched cDNA libraries. Genome Res 1997; 7:281-92. [PMID: 9074931 DOI: 10.1101/gr.7.3.281] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 02/04/2023]
Abstract
The establishment and mapping of gene-specific DNA sequences greatly complement the ongoing efforts to map and sequence all human chromosomes. To facilitate our studies of human chromosome 7, we have generated and analyzed 2006 expressed-sequence tags (ESTs) derived from a collection of direct selection cDNA libraries that are highly enriched for human chromosome 7 gene sequences. Similarity searches indicate that approximately two-thirds of the ESTs are not represented by sequences in the public databases, including those in dbEST. In addition, a large fraction (68%) of the ESTs do not have redundant or overlapping sequences within our collection. Human DNA-specific sequence-tagged sites (STSs) have been developed from 190 of the ESTs. Remarkably, 180 (96%) of these STSs map to chromosome 7, demonstrating the robustness of chromosome enrichment in constructing the direct selection cDNA libraries. Thus far, 140 of these EST-specific STSs have been assigned unequivocally to YAC contigs that are distributed across the chromosome. Together, these studies provide > 2000 ESTs highly enriched for chromosome 7 gene sequences, 180 new chromosome 7 STSs corresponding to ESTs, and a definitive demonstration of the ability to enrich for chromosome-specific cDNAs by direct selection. Furthermore, the libraries, sequence data, and mapping information will contribute to the construction of a chromosome 7 transcript map.
Affiliation(s)
- J W Touchman
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
|
46
|
McNaughton JC, Hughes G, Jones WA, Stockwell PA, Klamut HJ, Petersen GB. The evolution of an intron: analysis of a long, deletion-prone intron in the human dystrophin gene. Genomics 1997; 40:294-304. [PMID: 9119397 DOI: 10.1006/geno.1996.4543] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.7] [Indexed: 02/04/2023]
Abstract
The sequence of a 112-kb region of the human dystrophin (DMD/BMD) gene encompassing the deletion-prone intron 7 (110 kb) and the much shorter intron 8 (1.1 kb) has been determined. Recognizable insertion sequences account for approximately 40% of intron 7. LINE-1 and THE-1/LTR sequences occur in intron 7 with significantly higher frequency than would be expected statistically while Alu sequences are underrepresented. Intron 7 also contains numerous mammalian-wide interspersed repeats, a diverse range of medium reiteration repeats of unknown origin, and a sequence derived from a mariner transposon. By contrast, the shorter intron 8 contains no detectable insertion sequences. Dating of the L1 and Alu sequences suggests that intron 7 has approximately doubled in size within the past 130 million years, and comparison with the corresponding intron from the pufferfish (Fugu rubripes) suggests that the intron has expanded some 44-fold over a period of 400 million years. The possible contribution of the insertion elements to the instability of intron 7 is discussed.
Affiliation(s)
- J C McNaughton
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
|
47
|
Mitchison HM, Munroe PB, O'Rawe AM, Taschner PE, de Vos N, Kremmidiotis G, Lensink I, Munk AC, D'Arigo KL, Anderson JW, Lerner TJ, Moyzis RK, Callen DF, Breuning MH, Doggett NA, Gardiner RM, Mole SE. Genomic structure and complete nucleotide sequence of the Batten disease gene, CLN3. Genomics 1997; 40:346-50. [PMID: 9119403 DOI: 10.1006/geno.1996.4576] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
We recently cloned a cDNA for CLN3, the gene for juvenile-onset neuronal ceroid lipofuscinosis or Batten disease. To resolve the genomic organization we used a cosmid clone containing CLN3 to sequence the entire gene in addition to 1.1 kb 5' of the start of the published CLN3 cDNA and 0.3 kb 3' to the polyadenylation site. CLN3 is organized into at least 15 exons spanning 15 kb and ranging from 47 to 356 bp. The 14 introns vary from 80 to 4227 bp, and all exon/intron junction sequences conform to the GT/AG rule. Numerous repetitive Alu elements are present within the introns and 5'- and 3'-untranslated regions. The 5' region of the CLN3 gene contains several potential transcription regulatory elements but no consensus TATA-1 box was identified. CLN3 is homologous to 27 deposited human ESTs, and sequence comparisons suggest alternative splicing of the gene and the existence of transcribed sequences upstream to the start of the published CLN3 cDNA.
Affiliation(s)
- H M Mitchison
- Department of Paediatrics, University College London Medical School, Rayne Institute, United Kingdom
|
48
|
Viswanathan GM, Buldyrev SV, Havlin S, Stanley HE. Quantification of DNA patchiness using long-range correlation measures. Biophys J 1997; 72:866-75. [PMID: 9017212 PMCID: PMC1185610 DOI: 10.1016/s0006-3495(97)78721-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.4] [Indexed: 02/03/2023]
Abstract
We introduce and develop new techniques to quantify DNA patchiness, and to quantify characteristics of its mosaic structure. These techniques, which involve calculating two functions, alpha(l) and beta(l), measure correlations at length scale l and detect distinct characteristic patch sizes embedded in scale-invariant patch size distributions. Using these new methods, we address a number of issues relating to the mosaic structure of genomic DNA. We find several distinct characteristic patch sizes in certain genomic sequences, and compare, contrast, and quantify the correlation properties of different sequences, including a number of yeast, human, and prokaryotic sequences. We exclude the possibility that the correlation properties and the known mosaic structure of DNA can be explained either by simple Markov processes or by tandem repeats of dinucleotides. We find that the distinct patch sizes in all 16 yeast chromosomes are similar. Furthermore, we test the hypothesis that, for yeast, patchiness is caused by the alternation of coding and noncoding regions, and the hypothesis that in human sequences patchiness is related to repetitive sequences. We find that, by themselves, neither the alternation of coding and noncoding regions, nor repetitive sequences, can fully explain the long-range correlation properties of DNA.
Affiliation(s)
- G M Viswanathan
- Center for Polymer Studies, Boston University, Massachusetts 02215, USA.
|
49
|
Stoesser G, Sterk P, Tuli MA, Stoehr PJ, Cameron GN. The EMBL Nucleotide Sequence Database. Nucleic Acids Res 1997; 25:7-14. [PMID: 9016493 PMCID: PMC146376 DOI: 10.1093/nar/25.1.7] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.0] [Indexed: 02/03/2023]
Abstract
The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI) and constitutes Europe's primary nucleotide sequence resource. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interface, providing database searching and sequence similarity facilities plus access to a large number of additional databases.
Affiliation(s)
- G Stoesser
- EMBL Outstation, the EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
50
|
Heximer SP, Cristillo AD, Russell L, Forsdyke DR. Sequence analysis and expression in cultured lymphocytes of the human FOSB gene (G0S3). DNA Cell Biol 1996; 15:1025-38. [PMID: 8985116 DOI: 10.1089/dna.1996.15.1025] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.5] [Indexed: 02/03/2023]
Abstract
G0S3 is a member of a set of putative G0/G1 switch regulatory genes (G0S genes) selected by screening cDNA libraries prepared from human blood mononuclear cells cultured for 2 hr with lectin and cycloheximide. The sequence shows high homology with the murine FOSB gene, which encodes a component of the AP1 transcriptional regulator. Comparison of cDNA and genomic sequences reveals a 4-exon structure characteristic of the FOS family of genes. Freshly isolated cells show high levels of FOSB/G0S3 and FOS/G0S7 mRNAs, which decline rapidly during incubation in culture medium. The kinetics of expression suggest that the high initial levels are caused by the isolation procedure, and do not reflect constitutive expression. In cells preincubated for a day, levels of FOS mRNA reach a maximum 20 min after the addition of lectin and decline to control levels over the next 3 hr. Levels of FOSB mRNA reach a maximum 40 min after the addition of lectin and decline to control levels over the next 6 hr. In freshly isolated cells, both FOS and FOSB mRNAs increase dramatically in response to the protein synthesis inhibitor cycloheximide. In preincubated cells, the cycloheximide response is decreased, especially in the case of FOSB. These differences in expression of FOS and FOSB suggest different roles and regulation. Regions of low base order-dependent stem-loop potential in the region of the gene are defined. These indicate where base order has been adapted for purposes other than stem-loop stability (e.g., encoding proteins or gene regulation). Regions of low potential in a 68.5-kb genomic segment containing the FOSB gene suggest that the potential may help locate genes in uncharted DNA sequences.
|