Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 15 (from Reference Citation Analysis)

Article PDFs: 8

Cited by > 0: 15

Searched Name: James G R Gilbert

Number	Citation Analysis
1	Current status and new features of the Consensus Coding Sequence database. Nucleic Acids Res 2013;42:D865-72. [PMID: 24217909 PMCID: PMC3965069 DOI: 10.1093/nar/gkt1059] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Indexed: 12/31/2022] Abstract: The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
2	The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013;2013:bat032. [PMID: 23729657 PMCID: PMC3668384 DOI: 10.1093/database/bat032] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/24/2022] Abstract: Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D having been bred to develop the disease spontaneously in a process that is similar to humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes, that are major contributors to T1D susceptibility or resistance, were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 mega base-pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes sensitive NOD mouse and 765 genes in homologous regions of the diabetes resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next generation sequencing and assembly techniques.
Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility. Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega-previous.sanger.ac.uk/info/data/mouse_regions.html MeSH Headings: Animals; Base Pairing/genetics; Base Sequence; Diabetes Mellitus, Type 1/genetics; Genetic Loci/genetics; Genetic Variation; Genome/genetics; Humans; Mice; Mice, Inbred C57BL; Mice, Inbred NOD; Molecular Sequence Annotation; Polymorphism, Single Nucleotide/genetics; Sequence Alignment; Sequence Analysis, DNA. Grants: 091157 Wellcome Trust; 100140 Wellcome Trust; 096388 Wellcome Trust; AI 15416 NIAID NIH HHS.
3	Sequencing and comparative analysis of the gorilla MHC genomic sequence. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013;2013:bat011. [PMID: 23589541 PMCID: PMC3626023 DOI: 10.1093/database/bat011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023] Abstract: Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.
4	Analyses of pig genomes provide insight into porcine demography and evolution. Nature 2012;491:393-8. [PMID: 23151582 PMCID: PMC3566564 DOI: 10.1038/nature11622] [Citation(s) in RCA: 947] [Impact Index Per Article: 78.9] [Received: 06/16/2012] [Accepted: 09/27/2012] [Indexed: 01/03/2023] Abstract: For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
Key Words: sequencing genome evolution. MeSH Headings: Animals; Demography; Genome/genetics; Models, Animal; Molecular Sequence Data; Phylogeny; Population Dynamics; Sus scrofa/classification; Sus scrofa/genetics. Grants: T32 AI083196 NIAID NIH HHS; 5 P41 LM006252 NLM NIH HHS; R13 RR032267 NCRR NIH HHS; R21 DA027548 NIDA NIH HHS; BBS/E/D/20211550 Biotechnology and Biological Sciences Research Council; BB/E010520/1 Biotechnology and Biological Sciences Research Council; 095908 Wellcome Trust; 249894 European Research Council; R13 RR020283A NCRR NIH HHS; BB/E010768/1 Biotechnology and Biological Sciences Research Council; P41 LM006252 NLM NIH HHS; BB/E010520/2 Biotechnology and Biological Sciences Research Council; G0900950 Medical Research Council; P20 RR017686 NCRR NIH HHS; BB/H005935/1 Biotechnology and Biological Sciences Research Council; BB/I025328/1 Biotechnology and Biological Sciences Research Council; R21 HG006464 NHGRI NIH HHS; P30 DA018310 NIDA NIH HHS; R13 RR032267A NCRR NIH HHS; R13 RR020283 NCRR NIH HHS; 5 P41LM006252 NLM NIH HHS; ETM/32 Chief Scientist Office; BB/G004013/1 Biotechnology and Biological Sciences Research Council; BBS/E/D/05191130 Biotechnology and Biological Sciences Research Council; Wellcome Trust; P20-RR017686 NCRR NIH HHS; BB/E011640/1 Biotechnology and Biological Sciences Research Council.
5	Community gene annotation in practice. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2012;2012:bas009. [PMID: 22434843 PMCID: PMC3308165 DOI: 10.1093/database/bas009] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/04/2022] Abstract: Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/.) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the ‘Blessed’ annotator and ‘Gatekeeper’ approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. Database URL: http://vega.sanger.ac.uk/index.html
6	The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 2009;324:522-8. [PMID: 19390049 DOI: 10.1126/science.1169588] [Citation(s) in RCA: 806] [Impact Index Per Article: 53.7] [Indexed: 01/12/2023] Abstract: To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
MeSH Headings: Alternative Splicing; Animals; Animals, Domestic; Biological Evolution; Cattle; Evolution, Molecular; Female; Genetic Variation; Genome; Humans; Male; MicroRNAs/genetics; Molecular Sequence Data; Proteins/genetics; Sequence Analysis, DNA; Species Specificity; Synteny. Grants: U54 HG003273-04S1 NHGRI NIH HHS; U54 HG003273-05 NHGRI NIH HHS; U54 HG003273 NHGRI NIH HHS; U54 HG003273-05S2 NHGRI NIH HHS; U54 HG003273-08 NHGRI NIH HHS; U54 HG003273-06S1 NHGRI NIH HHS; 077198 Wellcome Trust; BBS/B/13438 Biotechnology and Biological Sciences Research Council; U54 HG003273-07 NHGRI NIH HHS; 062023 Wellcome Trust; U54 HG003273-06 NHGRI NIH HHS; U54 HG003273-06S2 NHGRI NIH HHS; P30 DA018310 NIDA NIH HHS; BBS/B/13446 Biotechnology and Biological Sciences Research Council; U54 HG003273-05S1 NHGRI NIH HHS; BB/D524040/2 Biotechnology and Biological Sciences Research Council; U54 HG003273-04 NHGRI NIH HHS.
7	The vertebrate genome annotation (Vega) database. Nucleic Acids Res 2007;36:D753-60. [PMID: 18003653 PMCID: PMC2238886 DOI: 10.1093/nar/gkm987] [Citation(s) in RCA: 183] [Impact Index Per Article: 10.8] [Indexed: 11/13/2022] Abstract: The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) was first made public in 2004 and has been designed to view manual annotation of human, mouse and zebrafish genomic sequences produced at the Wellcome Trust Sanger Institute. Since its initial release, the number of human annotated loci has more than doubled to close to 33 000 and now contains comprehensive annotation on 20 of the 24 human chromosomes, four whole mouse chromosomes and around 40% of the zebrafish Danio rerio genome. In addition, we offer manual annotation of a number of haplotype regions in mouse and human and regions of comparative interest in pig and dog that are unique to Vega.
8	GENCODE: producing a reference annotation for ENCODE. Genome Biol 2006;7 Suppl 1:S4.1-9. [PMID: 16925838 PMCID: PMC1810553 DOI: 10.1186/gb-2006-7-s1-s4] [Citation(s) in RCA: 440] [Impact Index Per Article: 24.4] [Indexed: 12/29/2022] Abstract: BACKGROUND: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. RESULTS: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. CONCLUSION: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of GENCODE annotation with RefSeq and ENSEMBL show only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated.
Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP and the GENCODE collaboration is continuing to refine its annotation of 1% human genome with the aid of experimental validation. MeSH Headings: Chromosome Mapping; Computational Biology/methods; Computational Biology/standards; Expressed Sequence Tags; Genes; Genome, Human; Genomics/methods; Genomics/standards; Humans; Proteins/genetics; Pseudogenes; RNA, Messenger/analysis; Reference Standards; Sequence Analysis, DNA; Sequence Analysis, RNA.
9	The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res 2005;33:D459-65. [PMID: 15608237 PMCID: PMC540089 DOI: 10.1093/nar/gki135] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.3] [Indexed: 01/11/2023] Abstract: The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions. MeSH Headings: Animals; Chromosomes, Human/chemistry; Database Management Systems; Databases, Genetic/standards; Genome; Genomics; Humans; Mice; User-Computer Interface; Vertebrates/genetics; Zebrafish/genetics.
10	Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse. Genome Res 2004;14:1888-901. [PMID: 15364904 PMCID: PMC524412 DOI: 10.1101/gr.2478604] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022] Abstract: Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.
MeSH Headings: Animals; Evolution, Molecular; Genome; Mice; Multigene Family; Sequence Deletion. Grants: MC_U142684171 Medical Research Council.
11	DNA sequence and analysis of human chromosome 9. Nature 2004;429:369-74. [PMID: 15164053 PMCID: PMC2734081 DOI: 10.1038/nature02465] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.5] [Received: 12/22/2003] [Accepted: 03/08/2004] [Indexed: 11/09/2022] Abstract: Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection. MeSH Headings: Base Composition; Chromosomes, Human, Pair 9/genetics; Euchromatin/genetics; Evolution, Molecular; Female; Gene Duplication; Genes; Genes, Duplicate/genetics; Genetic Variation/genetics; Genetics, Medical; Genomics; Heterochromatin/genetics; Humans; Male; Neoplasms/genetics; Neurodegenerative Diseases/genetics; Physical Chromosome Mapping; Pseudogenes/genetics; Sequence Analysis, DNA; Sex Determination Processes.
12	The DNA sequence and comparative analysis of human chromosome 10. Nature 2004;429:375-81. [PMID: 15164054 DOI: 10.1038/nature02462] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Received: 12/17/2003] [Accepted: 03/09/2004] [Indexed: 11/08/2022] Abstract: The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence. MeSH Headings: Animals; Base Composition; Chromosomes, Human, Pair 10/genetics; Contig Mapping; CpG Islands/genetics; Evolution, Molecular; Exons/genetics; Gene Duplication; Genes; Genetic Variation/genetics; Genetics, Medical; Genomics; Humans; Pan troglodytes/genetics; Physical Chromosome Mapping; Proteins/genetics; Pseudogenes/genetics; Sequence Analysis, DNA.
13	The DNA sequence and analysis of human chromosome 13. Nature 2004;428:522-8. [PMID: 15057823 PMCID: PMC2665288 DOI: 10.1038/nature02379] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Received: 12/05/2003] [Accepted: 01/27/2004] [Indexed: 12/14/2022] Abstract: Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb. MeSH Headings: Chromosome Mapping; Chromosomes, Human, Pair 13/genetics; Genes/genetics; Genetics, Medical; Humans; Physical Chromosome Mapping; Pseudogenes/genetics; RNA, Untranslated/genetics; Sequence Analysis, DNA.
14	The DNA sequence and analysis of human chromosome 6. Nature 2003;425:805-11. [PMID: 14574404 DOI: 10.1038/nature02055] [Citation(s) in RCA: 235] [Impact Index Per Article: 11.2] [Received: 06/08/2003] [Accepted: 09/11/2003] [Indexed: 01/17/2023] Abstract: Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Here we report that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provide evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we show that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we find HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome. MeSH Headings: Animals; Chromosomes, Human, Pair 6/genetics; Exons/genetics; Genes/genetics; Genetic Diseases, Inborn/genetics; HLA-B Antigens/genetics; Humans; Physical Chromosome Mapping; Pseudogenes/genetics; RNA, Transfer/genetics; Sequence Analysis, DNA.
15	Transcriptional regulation of the stem cell leukemia gene (SCL)--comparative analysis of five vertebrate SCL loci. Genome Res 2002;12:749-59. [PMID: 11997341 PMCID: PMC186570 DOI: 10.1101/gr.45502] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.5] [Received: 01/04/2001] [Accepted: 03/19/2002] [Indexed: 12/25/2022] Abstract: The stem cell leukemia (SCL) gene encodes a bHLH transcription factor with a pivotal role in hematopoiesis and vasculogenesis and a pattern of expression that is highly conserved between mammals and zebrafish. Here we report the isolation and characterization of the zebrafish SCL locus together with the identification of three neighboring genes, IER5, MAP17, and MUPP1. This region spans 68 kb and comprises the longest zebrafish genomic sequence currently available for comparison with mammalian, chicken, and pufferfish sequences. Our data show conserved synteny between zebrafish and mammalian SCL and MAP17 loci, thus suggesting the likely genomic domain necessary for the conserved pattern of SCL expression. Long-range comparative sequence analysis/phylogenetic footprinting was used to identify noncoding conserved sequences representing candidate transcriptional regulatory elements. The SCL promoter/enhancer, exon 1, and the poly(A) region were highly conserved, but no homology to other known mouse SCL enhancers was detected in the zebrafish sequence. A combined homology/structure analysis of the poly(A) region predicted consistent structural features, suggesting a conserved functional role in mRNA regulation. Analysis of the SCL promoter/enhancer revealed five motifs, which were conserved from zebrafish to mammals, and each of which is essential for the appropriate pattern or level of SCL transcription.
MeSH Headings: 5' Untranslated Regions/genetics; Amino Acid Sequence; Animals; Basic Helix-Loop-Helix Transcription Factors; Cell Line; Chickens; Chromosomes, Artificial, P1 Bacteriophage/genetics; Cloning, Molecular; Conserved Sequence; DNA-Binding Proteins/biosynthesis; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Exons/genetics; Gene Expression Regulation, Neoplastic/genetics; Genetic Markers/genetics; Genetic Markers/physiology; Humans; Leukemia-Lymphoma, Adult T-Cell/genetics; Mice; Mice, Transgenic; Molecular Sequence Data; Poly A/metabolism; Promoter Regions, Genetic/genetics; Proto-Oncogene Proteins; Rats; Sequence Homology, Nucleic Acid; T-Cell Acute Lymphocytic Leukemia Protein 1; Tetraodontiformes; Transcription Factors/biosynthesis; Transcription Factors/chemistry; Transcription Factors/genetics; Zebrafish/genetics; Zebrafish Proteins. Grants: Wellcome Trust.