151
Florea L, Di Francesco V, Miller J, Turner R, Yao A, Harris M, Walenz B, Mobarry C, Merkulov GV, Charlab R, Dew I, Deng Z, Istrail S, Li P, Sutton G. Gene and alternative splicing annotation with AIR. Genome Res 2005; 15:54-66. [PMID: 15632090 PMCID: PMC540277 DOI: 10.1101/gr.2889405] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 06/14/2004] [Accepted: 10/14/2004] [Indexed: 11/24/2022]
Abstract
Designing effective and accurate tools for identifying the functional and structural elements in a genome remains at the frontier of genome annotation owing to incompleteness and inaccuracy of the data, limitations in the computational models, and shifting paradigms in genomics, such as alternative splicing. We present a methodology for the automated annotation of genes and their alternatively spliced mRNA transcripts based on existing cDNA and protein sequence evidence from the same species or projected from a related species using syntenic mapping information. At the core of the method is the splice graph, a compact representation of a gene, its exons, introns, and alternatively spliced isoforms. The putative transcripts are enumerated from the graph and assigned confidence scores based on the strength of sequence evidence, and a subset of the high-scoring candidates are selected and promoted into the annotation. The method is highly selective, eliminating the unlikely candidates while retaining 98% of the high-quality mRNA evidence in well-formed transcripts, and produces annotation that is measurably more accurate than some evidence-based gene sets. The process is fast, accurate, and fully automated, and combines the traditionally distinct gene annotation and alternative splicing detection processes in a comprehensive and systematic way, thus considerably aiding in the ensuing manual curation efforts.
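The splice-graph idea in this abstract can be illustrated with a minimal sketch. It assumes a toy acyclic graph whose nodes are exons and whose edges are introns, and it uses a hypothetical confidence score (the weakest intron support along a path); the abstract does not specify the actual scoring used by AIR, so the names `enumerate_transcripts`, `path_score`, and `select_transcripts` and the `support` counts are illustrative only.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end) genomic coordinates


def enumerate_transcripts(edges: Dict[Exon, List[Exon]],
                          sources: List[Exon]) -> List[List[Exon]]:
    """Enumerate every source-to-sink path in an acyclic splice graph."""
    paths: List[List[Exon]] = []

    def walk(node: Exon, path: List[Exon]) -> None:
        path = path + [node]
        successors = edges.get(node, [])
        if not successors:          # terminal exon: one candidate transcript
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return paths


def path_score(path: List[Exon], support: Dict[Tuple[Exon, Exon], int]) -> int:
    """Toy confidence score (an assumption): weakest intron support on the path."""
    if len(path) < 2:
        return 0
    return min(support.get((a, b), 0) for a, b in zip(path, path[1:]))


def select_transcripts(paths, support, min_score):
    """Keep only the candidates whose score reaches the threshold."""
    return [p for p in paths if path_score(p, support) >= min_score]


# Three exons; exon B is either included (A-B-C) or skipped (A-C).
A, B, C = (100, 200), (300, 400), (500, 600)
edges = {A: [B, C], B: [C]}
support = {(A, B): 3, (B, C): 3, (A, C): 1}   # hypothetical evidence counts

candidates = enumerate_transcripts(edges, [A])
kept = select_transcripts(candidates, support, min_score=2)
```

With these made-up counts, both isoforms are enumerated but only the exon-inclusion isoform passes the threshold, which mirrors the "enumerate, score, select" flow the abstract describes.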
Affiliation(s)
- Liliana Florea
- Informatics Research, Applied Biosystems, Rockville, Maryland 20850, USA.
152
Olson LE, Roper RJ, Baxter LL, Carlson EJ, Epstein CJ, Reeves RH. Down syndrome mouse models Ts65Dn, Ts1Cje, and Ms1Cje/Ts65Dn exhibit variable severity of cerebellar phenotypes. Dev Dyn 2004; 230:581-9. [PMID: 15188443 DOI: 10.1002/dvdy.20079] [Citation(s) in RCA: 118] [Impact Index Per Article: 5.6] [Indexed: 11/11/2022]
Abstract
Two mouse models are widely used for Down syndrome (DS) research. The Ts65Dn mouse carries a small chromosome derived primarily from mouse chromosome 16, causing dosage imbalance for approximately half of human chromosome 21 orthologs. These mice have cerebellar pathology with direct parallels to DS. The Ts1Cje mouse, containing a translocated chromosome 16, is at dosage imbalance for 67% of the genes triplicated in Ts65Dn. We quantified cerebellar volume and granule cell and Purkinje cell density in Ts1Cje. Cerebellar volume was significantly affected to the same degree in Ts1Cje and Ts65Dn, even though Ts1Cje has fewer triplicated genes. However, dosage imbalance in Ts1Cje had little effect on granule cell and Purkinje cell density. Several mice with dosage imbalance for the segment of the Ts65Dn chromosome not triplicated in Ts1Cje had phenotypes that contrasted with those in Ts1Cje. These observations do not readily differentiate between two prevalent hypotheses for gene action in DS.
Affiliation(s)
- L E Olson
- Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
153
Antonarakis SE, Lyle R, Dermitzakis ET, Reymond A, Deutsch S. Chromosome 21 and Down syndrome: from genomics to pathophysiology. Nat Rev Genet 2004; 5:725-38. [PMID: 15510164 DOI: 10.1038/nrg1448] [Citation(s) in RCA: 457] [Impact Index Per Article: 21.8] [Indexed: 12/12/2022]
Abstract
The sequence of chromosome 21 was a turning point for the understanding of Down syndrome. Comparative genomics is beginning to identify the functional components of the chromosome and that in turn will set the stage for the functional characterization of the sequences. Animal models combined with genome-wide analytical methods have proved indispensable for unravelling the mysteries of gene dosage imbalance.
Affiliation(s)
- Stylianos E Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, 1 rue Michel-Servet, 1211 Geneva, Switzerland.
154
Gianfrancesco F, Esposito T, Casu G, Maninchedda G, Roberto R, Pirastu M. Emergence of Talanin protein associated with human uric acid nephrolithiasis in the Hominidae lineage. Gene 2004; 339:131-8. [PMID: 15363853 DOI: 10.1016/j.gene.2004.06.030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 03/29/2004] [Revised: 06/06/2004] [Accepted: 06/17/2004] [Indexed: 02/04/2023]
Abstract
Recently, we identified a susceptibility locus for human uric acid nephrolithiasis (UAN) on 10q21-q22 and demonstrated that a novel gene (ZNF365) included in this region produces, through alternative splicing, several transcripts coding for four protein isoforms. Mutation analysis showed that one of them (Talanin) is associated with UAN. We examined the evolutionary conservation of the ZNF365 gene through a comparative genomic approach. Searching for mouse homologs of ZNF365 transcripts, we identified a highly conserved mouse ortholog of the ZNF365A transcript, expressed specifically in brain. We did not find a mouse homolog for the ZNF365D transcript encoding the Talanin protein, although we were able to identify the corresponding genomic region in mouse and rat, not yet organized in a canonical gene structure, suggesting that ZNF365D originated after the branching of the hominoid from the rodent lineage. In mouse and in most mammals, a functional uricase degrades uric acid to allantoin, but uricase activity was lost during the Miocene epoch in hominoids. Searching for the presence of Talanin in primates, we found a canonical intron-exon structure with several stop codons preventing protein production in Old World and New World monkeys. In humans, we observe expression, and we have evidence that the ZNF365D transcript produces a functional protein. It therefore seems that the ZNF365D transcript emerged during primate evolution from a noncoding genomic sequence that evolved into a standard gene structure and assumed its role in parallel with the disappearance of uricase, probably as protection against disadvantageous excessive hyperuricemia.
MESH Headings
- Alternative Splicing
- Amino Acid Sequence
- Animals
- Base Sequence
- Chromosomes, Human, Pair 10/genetics
- Chromosomes, Mammalian/genetics
- Cloning, Molecular
- DNA, Complementary/chemistry
- DNA, Complementary/genetics
- DNA, Complementary/isolation & purification
- DNA-Binding Proteins/genetics
- Evolution, Molecular
- Humans
- Kidney Diseases/blood
- Kidney Diseases/genetics
- Kidney Diseases/pathology
- Mice
- Molecular Sequence Data
- Phylogeny
- Primates/genetics
- Protein Isoforms/genetics
- Sequence Alignment
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
- Sequence Homology, Nucleic Acid
- Synteny
- Transcription Factors/genetics
- Uric Acid/blood
- Zinc Fingers/genetics
155
Olson LE, Richtsmeier JT, Leszl J, Reeves RH. A chromosome 21 critical region does not cause specific Down syndrome phenotypes. Science 2004; 306:687-90. [PMID: 15499018 PMCID: PMC4019810 DOI: 10.1126/science.1098992] [Citation(s) in RCA: 229] [Impact Index Per Article: 10.9] [Indexed: 11/02/2022]
Abstract
The "Down syndrome critical region" (DSCR) is a chromosome 21 segment purported to contain genes responsible for many features of Down syndrome (DS), including craniofacial dysmorphology. We used chromosome engineering to create mice that were trisomic or monosomic for only the mouse chromosome segment orthologous to the DSCR and assessed dysmorphologies of the craniofacial skeleton that show direct parallels with DS in mice with a larger segmental trisomy. The DSCR genes were not sufficient and were largely not necessary to produce the facial phenotype. These results refute specific predictions of the prevailing hypothesis of gene action in DS.
Affiliation(s)
- L. E. Olson
- Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- J. T. Richtsmeier
- Department of Anthropology and Program in Genetics, Pennsylvania State University, University Park, PA 16802, USA
- J. Leszl
- Department of Anthropology and Program in Genetics, Pennsylvania State University, University Park, PA 16802, USA
- R. H. Reeves
- Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- To whom correspondence should be addressed.
156
Pletcher MT, McClurg P, Batalov S, Su AI, Barnes SW, Lagler E, Korstanje R, Wang X, Nusskern D, Bogue MA, Mural RJ, Paigen B, Wiltshire T. Use of a dense single nucleotide polymorphism map for in silico mapping in the mouse. PLoS Biol 2004; 2:e393. [PMID: 15534693 PMCID: PMC526179 DOI: 10.1371/journal.pbio.0020393] [Citation(s) in RCA: 175] [Impact Index Per Article: 8.3] [Received: 06/30/2004] [Accepted: 09/15/2004] [Indexed: 01/08/2023]
Abstract
Rapid expansion of available data, both phenotypic and genotypic, for multiple strains of mice has enabled the development of new methods to interrogate the mouse genome for functional genetic perturbations. In silico mapping provides an expedient way to associate the natural diversity of phenotypic traits with ancestrally inherited polymorphisms for the purpose of dissecting genetic traits. In mouse, the current single nucleotide polymorphism (SNP) data have lacked the density across the genome and coverage of enough strains to properly achieve this goal. To remedy this, 470,407 allele calls were produced for 10,990 evenly spaced SNP loci across 48 inbred mouse strains. Use of the SNP set with statistical models that considered unique patterns within blocks of three SNPs as an inferred haplotype could successfully map known single gene traits and a cloned quantitative trait gene. Application of this method to high-density lipoprotein and gallstone phenotypes reproduced previously characterized quantitative trait loci (QTL). The inferred haplotype data also facilitates the refinement of QTL regions such that candidate genes can be more easily identified and characterized as shown for adenylate cyclase 7.
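As a rough illustration of the "blocks of three SNPs as an inferred haplotype" idea, the sketch below groups strains by their allele pattern in each window of three consecutive SNPs. The sliding (overlapping) window, the function name `infer_haplotypes`, and the toy genotype strings are assumptions made for illustration; the abstract does not say how windows were placed, and the statistical association step is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def infer_haplotypes(genotypes: Dict[str, str],
                     window: int = 3) -> List[Dict[str, List[str]]]:
    """Group strains by allele pattern within each window of consecutive SNPs.

    genotypes maps a strain name to a string of allele calls at ordered loci.
    Returns one dict per window: allele pattern -> strains sharing it.
    """
    if not genotypes:
        return []
    n_loci = len(next(iter(genotypes.values())))
    blocks = []
    for start in range(n_loci - window + 1):   # sliding window (an assumption)
        groups = defaultdict(list)
        for strain, calls in genotypes.items():
            groups[calls[start:start + window]].append(strain)
        blocks.append(dict(groups))
    return blocks


# Hypothetical allele calls at four ordered SNP loci for three strains.
genotypes = {"B6": "AACG", "DBA": "AACT", "AJ": "GACT"}
blocks = infer_haplotypes(genotypes)
```

Each window's grouping could then serve as a categorical predictor when testing for association with a strain-level phenotype.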
Affiliation(s)
- Mathew T Pletcher
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- The Scripps Research Institute, San Diego, California, United States of America
- Philip McClurg
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Serge Batalov
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Andrew I Su
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- S. Whitney Barnes
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Erica Lagler
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Ron Korstanje
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Xiaosong Wang
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Molly A Bogue
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Beverly Paigen
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Tim Wiltshire
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
157
Paterson AH, Bowers JE, Chapman BA, Peterson DG, Rong J, Wicker TM. Comparative genome analysis of monocots and dicots, toward characterization of angiosperm diversity. Curr Opin Biotechnol 2004; 15:120-5. [PMID: 15081049 DOI: 10.1016/j.copbio.2004.03.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 10/26/2022]
Abstract
The importance of angiosperms to sustaining humanity by providing a wide range of 'ecosystem services' warrants increased exploration of their genomic diversity. The nearly completed sequences for two species representing the major angiosperm subclasses, specifically the dicot Arabidopsis thaliana and the monocot Oryza sativa, provide a foundation for comparative analysis across the angiosperms. The angiosperms also exemplify some challenges to be faced as genomics makes new inroads into describing biotic diversity, in particular polyploidy (genome-wide chromatin duplication), and much larger genome sizes than have been studied to date.
Affiliation(s)
- Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens GA 30602, USA.
158
Choi HK, Mun JH, Kim DJ, Zhu H, Baek JM, Mudge J, Roe B, Ellis N, Doyle J, Kiss GB, Young ND, Cook DR. Estimating genome conservation between crop and model legume species. Proc Natl Acad Sci U S A 2004; 101:15289-94. [PMID: 15489274 PMCID: PMC524433 DOI: 10.1073/pnas.0402251101] [Citation(s) in RCA: 252] [Impact Index Per Article: 12.0] [Received: 03/30/2004] [Accepted: 08/13/2004] [Indexed: 11/18/2022]
Abstract
Legumes are simultaneously one of the largest families of crop plants and a cornerstone in the biological nitrogen cycle. We combined molecular and phylogenetic analyses to evaluate genome conservation both within and between the two major clades of crop legumes. Genetic mapping of orthologous genes identifies broad conservation of genome macrostructure, especially within the galegoid legumes, while also highlighting inferred chromosomal rearrangements that may underlie the variation in chromosome number between these species. As a complement to comparative genetic mapping, we compared sequenced regions of the model legume Medicago truncatula with those of the diploid Lotus japonicus and the polyploid Glycine max. High conservation was observed between the genomes of M. truncatula and L. japonicus, whereas lower levels of conservation were evident between M. truncatula and G. max. In all cases, conserved genome microstructure was punctuated by significant structural divergence, including frequent insertion/deletion of individual genes or groups of genes and lineage-specific expansion/contraction of gene families. These results suggest that comparative mapping may have considerable utility for basic and applied research in the legumes, although its predictive value is likely to be tempered by phylogenetic distance and genome duplication.
Affiliation(s)
- Hong-Kyu Choi
- Department of Plant Pathology and College of Agricultural and Environmental Sciences Genomics Facility, University of California, One Shields Avenue, Davis, CA 95616, USA
159
Antonarakis SE, Reymond A, Lyle R, Deutsch S, Dermitzakis ET. Chromosome 21 and Down syndrome: the post-sequence era. Cold Spring Harb Symp Quant Biol 2004; 68:425-30. [PMID: 15338645 DOI: 10.1101/sqb.2003.68.425] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/25/2022]
Affiliation(s)
- S E Antonarakis
- Division of Medical Genetics, NCCR Frontiers in Genetics, University of Geneva Medical School and University Hospitals, Geneva, Switzerland
160
Bernot A, Weissenbach J. Estimation of the Extent of Synteny Between Tetraodon nigroviridis and Homo sapiens Genomes. J Mol Evol 2004; 59:556-69. [PMID: 15638467 DOI: 10.1007/s00239-004-2649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
Abstract
This paper presents a genomic comparison between 20 sequenced BACs (or fragments of BACs) from Tetraodon nigroviridis and the human genome. A total of 199 fish genes were identified by informatics resources, together with their putative human orthologues. Comparisons of the localizations in both species led to the identification of 32 syntenic regions and a minimum of 131 rearrangements in these regions that occurred during independent evolution of these species. This made it possible to estimate the rate of genomic rearrangements that occurred per million years (and per megabase). This rate is comparable to that obtained by comparison of the Fugu rubripes shotgun sequence data to human data but is significantly higher than those obtained by comparing the human genome to mammalian genomes. Overall, it suggests that genomic evolution by rearrangement is not uniform within the vertebrate group.
Affiliation(s)
- Alain Bernot
- Genethon, 1 rue de l'Internationale, BP60, 90002 Evry Cedex, France.
161
Zheng XH, Lu F, Wang ZY, Zhong F, Hoover J, Mural R. Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics 2004; 21:703-10. [PMID: 15458983 DOI: 10.1093/bioinformatics/bti045] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The identification of orthologous gene pairs is generally based on sequence similarity. Gene pairs that are mutually 'best hits' between the genomes being compared are asserted to be orthologs. Although this method identifies most orthologous gene pairs with high confidence, it will miss a fraction of them, especially genes in duplicated gene families. In addition, the approach depends heavily on the completeness and quality of gene annotation. When the gene sequences are not correctly represented, the approach is unlikely to find the correct ortholog. To overcome these limitations, we have developed an approach to identify orthologous gene pairs using shared chromosomal synteny and the annotation of protein function. RESULTS: Assembled mouse and human genomes were used to identify the regions of conserved synteny between these genomes. 'Syntenic anchors' are conserved non-repetitive locations between mouse and human genomes. Using these anchors, we identified blocks of sequences that contain consistently ordered anchors between the two genomes (syntenic blocks). The synteny information has been used to help us identify orthologous gene pairs between mouse and human genomes. The approach combines the mutual selection of the best tBlastX hits between human and mouse transcripts, and inferring gene orthologous relationships based on sharing syntenic anchors, collocating in the same syntenic blocks and sharing the same annotated protein function. Using this approach, we were able to find 19,357 orthologous gene pairs between human and mouse genomes, a 20% increase in the number of orthologs identified by conventional approaches.
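The combination of mutual best hits with synteny described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the similarity scores, gene names, block identifiers, and the rescue rule (take the best-scoring unpaired hit that lies in the same syntenic block) are all hypothetical, and the shared-anchor and protein-function criteria from the abstract are omitted.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]   # (query gene, target gene) -> score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring target for every query gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(h2m: Scores, m2h: Scores) -> Set[Tuple[str, str]]:
    """Pairs (human, mouse) that are each other's best hit."""
    fwd, rev = best_hits(h2m), best_hits(m2h)
    return {(h, m) for h, m in fwd.items() if rev.get(m) == h}


def orthologs(h2m: Scores, m2h: Scores,
              human_block: Dict[str, int],
              mouse_block: Dict[str, int]) -> Set[Tuple[str, str]]:
    """Reciprocal best hits, plus unpaired hits rescued by shared synteny."""
    pairs = reciprocal_best_hits(h2m, m2h)
    paired_h = {h for h, _ in pairs}
    paired_m = {m for _, m in pairs}
    rescued: Dict[str, Tuple[str, float]] = {}
    for (h, m), score in h2m.items():
        if h in paired_h or m in paired_m:
            continue
        block = human_block.get(h)
        if block is None or block != mouse_block.get(m):
            continue                      # not in the same syntenic block
        if h not in rescued or score > rescued[h][1]:
            rescued[h] = (m, score)
    return pairs | {(h, m) for h, (m, _) in rescued.items()}


# Hypothetical scores: hB's best hit (mB2) prefers another human gene (hC).
h2m = {("hA", "mA"): 90, ("hB", "mB1"): 80, ("hB", "mB2"): 85}
m2h = {("mA", "hA"): 90, ("mB1", "hB"): 80, ("mB2", "hB"): 70, ("mB2", "hC"): 88}
human_block = {"hA": 1, "hB": 2}
mouse_block = {"mA": 1, "mB1": 2, "mB2": 7}

result = orthologs(h2m, m2h, human_block, mouse_block)
```

In this toy case the mutual-best-hit rule alone finds only the hA/mA pair, while the synteny check additionally recovers hB/mB1, the kind of duplicated-family case the abstract says plain best-hit methods miss. The sketch does not prevent two human genes from rescuing the same mouse gene; a fuller version would need to resolve such conflicts.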
Affiliation(s)
- Xiangqun H Zheng
- Assays and Bioinformatics, Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850, USA
162
Pruett ND, Tkatchenko TV, Jave-Suarez L, Jacobs DF, Potter CS, Tkatchenko AV, Schweizer J, Awgulewitsch A. Krtap16, characterization of a new hair keratin-associated protein (KAP) gene complex on mouse chromosome 16 and evidence for regulation by Hoxc13. J Biol Chem 2004; 279:51524-33. [PMID: 15385554 DOI: 10.1074/jbc.m404331200] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 01/07/2023]
Abstract
Intermediate filament (IF) keratins and keratin-associated proteins (KAPs) are principal structural components of hair and encoded by members of multiple gene families. The severe hair growth defects observed upon aberrant expression of certain keratin and KAP genes in both mouse and man suggest that proper hair growth requires their spatio-temporally coordinated activation. An essential prerequisite for studying these cis-regulatory mechanisms is to define corresponding gene families, their genomic organization, and expression patterns. This work characterizes eight recently identified high glycine/tyrosine (HGT)-type KAP genes collectively designated Krtap16-n. These genes are shown to be integrated into a larger KAP gene domain on mouse chromosome 16 (MMU16) that is orthologous to a recently described HGT- and high sulfur (HS)-type KAP gene complex on human chromosome 21q22.11. All Krtap16 genes exhibit strong expression in a narrowly defined pattern restricted to the lower and middle cortical region of the hair shaft in both developing and cycling hair. During hair follicle regression (catagen), expression levels decrease until expression is no longer detectable in follicles at resting stage (telogen). Since isolation of the Krtap16 genes was based on their differential expression in transgenic mice overexpressing the Hoxc13 transcriptional regulator in hair, we examined whether bona fide Hoxc13 binding sites associated with these genes might be functionally relevant by performing electrophoretic mobility shift assays (EMSAs). The data provide evidence for sequence-specific interaction between Hoxc13 and Krtap16 genes, thus supporting the concept of a regulatory relationship between Hoxc13 and these KAP genes.
Affiliation(s)
- Nathanael D Pruett
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina 29425, USA
163
Zhao S, Shetty J, Hou L, Delcher A, Zhu B, Osoegawa K, de Jong P, Nierman WC, Strausberg RL, Fraser CM. Human, mouse, and rat genome large-scale rearrangements: stability versus speciation. Genome Res 2004; 14:1851-60. [PMID: 15364903 PMCID: PMC524408 DOI: 10.1101/gr.2663304] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.1] [Indexed: 11/24/2022]
Abstract
Using paired-end sequences from bacterial artificial chromosomes, we have constructed high-resolution synteny and rearrangement breakpoint maps among human, mouse, and rat genomes. Among the >300 syntenic blocks identified are segments of over 40 Mb without any detected interspecies rearrangements, as well as regions with frequently broken synteny and extensive rearrangements. As closely related species, mouse and rat share the majority of the breakpoints and often have the same types of rearrangements when compared with the human genome. However, the breakpoints not shared between them indicate that mouse rearrangements are more often interchromosomal, whereas intrachromosomal rearrangements are more prominent in rat. Centromeres may have played a significant role in reorganizing a number of chromosomes in all three species. The comparison of the three species indicates that genome rearrangements follow a path that accommodates a delicate balance between maintaining a basic structure underlying all mammalian species and permitting variations that are necessary for speciation.
Affiliation(s)
- Shaying Zhao
- Institute for Genomic Research, Rockville, Maryland 20850, USA.
164
Mallon AM, Wilming L, Weekes J, Gilbert JGR, Ashurst J, Peyrefitte S, Matthews L, Cadman M, McKeone R, Sellick CA, Arkell R, Botcherby MRM, Strivens MA, Campbell RD, Gregory S, Denny P, Hancock JM, Rogers J, Brown SDM. Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse. Genome Res 2004; 14:1888-901. [PMID: 15364904 PMCID: PMC524412 DOI: 10.1101/gr.2478604] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.
Affiliation(s)
- Ann-Marie Mallon
- Medical Research Council Mammalian Genetics Unit, Harwell, Oxfordshire, United Kingdom
165
Istrail S, Florea L, Halldórsson BV, Kohlbacher O, Schwartz RS, Yap VB, Yewdell JW, Hoffman SL. Comparative immunopeptidomics of humans and their pathogens. Proc Natl Acad Sci U S A 2004; 101:13268-72. [PMID: 15326311 PMCID: PMC516558 DOI: 10.1073/pnas.0404740101] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
Major histocompatibility complex class I molecules present peptides of 8-10 residues to CD8+ T cells. We used 19 predicted proteomes to determine the influence of CD8+ T cell immune surveillance on protein evolution in humans and microbial pathogens by predicting immunopeptidomes, i.e., sets of class I binding peptides present in proteomes. We find that class I peptide binding specificities (i) have had little, if any, influence on the evolution of immunopeptidomes and (ii) do not take advantage of biases in amino acid distribution in proteins other than the concentration of hydrophobic residues in NH2-terminal leader sequences.
166
Markstein M, Zinzen R, Markstein P, Yee KP, Erives A, Stathopoulos A, Levine M. A regulatory code for neurogenic gene expression in the Drosophila embryo. Development 2004; 131:2387-94. [PMID: 15128669 DOI: 10.1242/dev.01124] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.4] [Indexed: 11/20/2022]
Abstract
Bioinformatics methods have identified enhancers that mediate restricted expression in the Drosophila embryo. However, only a small fraction of the predicted enhancers actually work when tested in vivo. In the present study, co-regulated neurogenic enhancers that are activated by intermediate levels of the Dorsal regulatory gradient are shown to contain several shared sequence motifs. These motifs permitted the identification of new neurogenic enhancers with high precision: five out of seven predicted enhancers direct restricted expression within ventral regions of the neurogenic ectoderm. Mutations in some of the shared motifs disrupt enhancer function, and evidence is presented that the Twist and Su(H) regulatory proteins are essential for the specification of the ventral neurogenic ectoderm prior to gastrulation. The regulatory model of neurogenic gene expression defined in this study permitted the identification of a neurogenic enhancer in the distant Anopheles genome. We discuss the prospects for deciphering regulatory codes that link primary DNA sequence information with predicted patterns of gene expression.
Affiliation(s)
- Michele Markstein
- Department of Molecular and Cellular Biology, Division of Genetics and Development, 401 Barker Hall, University of California, Berkeley, CA 94720, USA
167
Bailey JA, Church DM, Ventura M, Rocchi M, Eichler EE. Analysis of segmental duplications and genome assembly in the mouse. Genome Res 2004; 14:789-801. [PMID: 15123579 PMCID: PMC479105 DOI: 10.1101/gr.2238404] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.1] [Indexed: 11/25/2022]
Abstract
Limited comparative studies suggest that the human genome is particularly enriched for recent segmental duplications. The extent of segmental duplications in other mammalian genomes is unknown and confounded by methodological differences in genome assembly. Here, we present a detailed analysis of recent duplication content within the mouse genome using a whole-genome assembly comparison method and a novel assembly-independent method, designed to take advantage of the reduced allelic variation of the C57BL/6J strain. We conservatively estimate that approximately 57% of all highly identical segmental duplications (≥90%) were misassembled or collapsed within the working draft WGS assembly. The WGS approach often leaves duplications fragmented and unassigned to a chromosome when compared with the clone-ordered-based approach. Our preliminary analysis suggests that 1.7%-2.0% of the mouse genome is part of recent large segmental duplications (about half of what is observed for the human genome). We have constructed a mouse segmental duplication database to aid in the characterization of these regions and their integration into the final mouse genome assembly. This work suggests significant biological differences in the architecture of recent segmental duplications between human and mouse. In addition, our unique method provides the means for improving whole-genome shotgun sequence assembly of mouse and future mammalian genomes.
Affiliation(s)
- Jeffrey A Bailey
- Department of Genetics, Center for Computational Genomics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 4410, USA
|
168
|
Castillo-Davis CI, Kondrashov FA, Hartl DL, Kulathinal RJ. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res 2004; 14:802-11. [PMID: 15123580 PMCID: PMC479106 DOI: 10.1101/gr.2195604] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
We compare the functional spectrum of protein evolution in two separate animal lineages with respect to two hypotheses: (1) rates of divergence are distributed similarly among functional classes within both lineages, indicating that selective pressure on the proteome is largely independent of organismic-level biological requirements; and (2) rates of divergence are distributed differently among functional classes within each lineage, indicating species-specific selective regimes impact genome-wide substitutional patterns. Integrating comparative genome sequence with data from tissue-specific expressed-sequence-tag (EST) libraries and detailed database annotations, we find a functional genomic signature of rapid evolution and selective constraint shared between mammalian and nematode lineages despite their extensive morphological and ecological differences and distant common ancestry. In both phyla, we find evidence of accelerated evolution among components of molecular systems involved in coevolutionary change. In mammals, lineage-specific fast evolving genes include those involved in reproduction, immunity, and possibly, maternal-fetal conflict. Likelihood ratio tests provide evidence for positive selection in these rapidly evolving functional categories in mammals. In contrast, slowly evolving genes, in terms of amino acid or insertion/deletion (indel) change, in both phyla are involved in core molecular processes such as transcription, translation, and protein transport. Thus, strong purifying selection appears to act on the same core cellular processes in both mammalian and nematode lineages, whereas positive and/or relaxed selection acts on different biological processes in each lineage.
Affiliation(s)
- Cristian I Castillo-Davis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
169
|
Schattner P, Decatur WA, Davis CA, Ares M, Fournier MJ, Lowe TM. Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome. Nucleic Acids Res 2004; 32:4281-96. [PMID: 15306656 PMCID: PMC514388 DOI: 10.1093/nar/gkh768] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2004] [Revised: 07/15/2004] [Accepted: 07/26/2004] [Indexed: 12/21/2022] Open
Abstract
One of the largest families of small RNAs in eukaryotes is the H/ACA small nucleolar RNAs (snoRNAs), most of which guide RNA pseudouridine formation. So far, an effective computational method specifically for identifying H/ACA snoRNA gene sequences has not been established. We have developed snoGPS, a program for computationally screening genomic sequences for H/ACA guide snoRNAs. The program implements a deterministic screening algorithm combined with a probabilistic model to score gene candidates. We report here the results of testing snoGPS on the budding yeast Saccharomyces cerevisiae. Six candidate snoRNAs were verified as novel RNA transcripts, and five of these were verified as guides for pseudouridine formation at specific sites in ribosomal RNA. We also predicted 14 new base-pairings between snoRNAs and known pseudouridine sites in S.cerevisiae rRNA, 12 of which were verified by gene disruption and loss of the cognate pseudouridine site. Our findings include the first prediction and verification of snoRNAs that guide pseudouridine modification at more than two sites. With this work, 41 of the 44 known pseudouridine modifications in S.cerevisiae rRNA have been linked with a verified snoRNA, providing the most complete accounting of the H/ACA snoRNAs that guide pseudouridylation in any species.
MESH Headings
- Algorithms
- Base Sequence
- Computational Biology/methods
- Genome, Fungal
- Genomics/methods
- Molecular Sequence Data
- Phylogeny
- Pseudouridine/chemistry
- Pseudouridine/metabolism
- RNA, Fungal/chemistry
- RNA, Fungal/genetics
- RNA, Fungal/metabolism
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/metabolism
- RNA, Small Nucleolar/chemistry
- RNA, Small Nucleolar/genetics
- RNA, Small Nucleolar/physiology
- Saccharomyces cerevisiae/genetics
- Saccharomyces cerevisiae/metabolism
- Software
- RNA, Small Untranslated
Affiliation(s)
- Peter Schattner
- Department of Biomolecular Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
|
170
|
Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G. The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc Natl Acad Sci U S A 2004; 101:11707-12. [PMID: 15289607 PMCID: PMC511041 DOI: 10.1073/pnas.0306880101] [Citation(s) in RCA: 231] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We have determined the full protein kinase (PK) complement (kinome) of mouse. This set of 540 genes includes many novel kinases and corrections or extensions to >150 published sequences. The mouse has orthologs for 510 of the 518 human PKs. Nonorthologous kinases arise only by retrotransposition and gene decay. Orthologous kinase pairs vary in sequence conservation along their length, creating a map of functionally important regions for every kinase pair. Many species-specific sequence inserts exist and are frequently alternatively spliced, allowing for the creation of evolutionary lineage-specific functions. Ninety-seven kinase pseudogenes were found, all distinct from the 107 human kinase pseudogenes. Chromosomal mapping links 163 kinases to mutant phenotypes and unlocks the use of mouse genetics to determine functions of orthologous human kinases.
Affiliation(s)
- Sean Caenepeel
- SUGEN, Incorporated, 230 East Grand Avenue, South San Francisco, CA 94025, USA
|
171
|
Bergstrom DE, Bergstrom RA, Munroe RJ, Lee BK, Browning VL, You Y, Eicher EM, Schimenti JC. Overlapping deletions spanning the proximal two-thirds of the mouse t complex. Mamm Genome 2004; 14:817-29. [PMID: 14724736 PMCID: PMC2583125 DOI: 10.1007/s00335-003-2298-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2003] [Accepted: 07/17/2003] [Indexed: 11/25/2022]
Abstract
Chromosome deletion complexes in model organisms serve as valuable genetic tools for the functional and physical annotation of complex genomes. Among their many roles, deletions can serve as mapping tools for simple or quantitative trait loci (QTLs), genetic reagents for regional mutagenesis experiments, and, in the case of mice, models of human contiguous gene deletion syndromes. Deletions also are uniquely suited for identifying regions of the genome containing haploinsufficient or imprinted loci. Here we describe the creation of new deletions at the proximal end of mouse Chromosome (Chr) 17 by using the technique of ES cell irradiation and the extensive molecular characterization of these and previously isolated deletions that, in total, cover much of the mouse t complex. The deletions are arranged in five overlapping complexes that collectively span about 25 Mbp. Furthermore, we have integrated each of the deletion complexes with physical data from public and private mouse genome sequences, and our own genetic data, to resolve some discrepancies. These deletions will be useful for characterizing several phenomena related to the t complex and t haplotypes, including transmission ratio distortion, male infertility, and the collection of t haplotype embryonic lethal mutations. The deletions will also be useful for mapping other loci of interest on proximal Chr 17, including T-associated sex reversal (Tas) and head-tilt (het). The new deletions have thus far been used to localize the recently identified t haplolethal (Thl1) locus to an approximately 1.3-Mbp interval.
Affiliation(s)
- David E Bergstrom
- The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA
|
172
|
Kosak ST, Groudine M. Form follows function: The genomic organization of cellular differentiation. Genes Dev 2004; 18:1371-84. [PMID: 15198979 DOI: 10.1101/gad.1209304] [Citation(s) in RCA: 178] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The extent to which the nucleus is functionally organized has broad biological implications. Evidence supports the idea that basic nuclear functions, such as transcription, are structurally integrated within the nucleus. Moreover, recent studies indicate that the linear arrangement of genes within eukaryotic genomes is nonrandom. We suggest that determining the relationship between nuclear organization and the linear arrangement of genes will lead to a greater understanding of how transcriptomes, dedicated to a particular cellular function or fate, are coordinately regulated. Current network theories may provide a useful framework for modeling the inherent complexity of the functional organization of the nucleus.
Affiliation(s)
- Steven T Kosak
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
|
173
|
Emes RD, Riley MC, Laukaitis CM, Goodstadt L, Karn RC, Ponting CP. Comparative evolutionary genomics of androgen-binding protein genes. Genome Res 2004; 14:1516-29. [PMID: 15256509 PMCID: PMC509260 DOI: 10.1101/gr.2540304] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Allelic variation within the mouse androgen-binding protein (ABP) alpha subunit gene (Abpa) has been suggested to promote assortative mating and thus prezygotic isolation. This is consistent with the elevated evolutionary rates observed for the Abpa gene, and the Abpb and Abpg genes whose products (ABPbeta and ABPgamma) form heterodimers with ABPalpha. We have investigated the mouse sequence that contains the three Abpa/b/g genes, and orthologous regions in rat, human, and chimpanzee genomes. Our studies reveal extensive "remodeling" of this region: Duplication rates of Abpa-like and Abpbg-like genes in mouse are >2 orders of magnitude higher than the average rate for all mouse genes; synonymous nucleotide substitution rates are twofold higher; and the Abpabg genomic region has expanded nearly threefold since divergence of the rodents. During this time, one in six amino acid sites in ABPbetagamma-like proteins appear to have been subject to positive selection; these may constitute a site of interaction with receptors or ligands. Greater adaptive variation among Abpbg-like sequences than among Abpa-like sequences suggests that assortative mating preferences are more influenced by variation in Abpbg-like genes. We propose a role for ABPalpha/beta/gamma proteins as pheromones, or in modulating odorant detection. This would account for the extraordinary adaptive evolution of these genes, and surrounding genomic regions, in murid rodents.
Affiliation(s)
- Richard D Emes
- MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom
|
174
|
Cooper GM, Sidow A. Genomic regulatory regions: insights from comparative sequence analysis. Curr Opin Genet Dev 2004; 13:604-10. [PMID: 14638322 DOI: 10.1016/j.gde.2003.10.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Comparative sequence analysis is contributing to the identification and characterization of genomic regulatory regions with functional roles. It is effective because functionally important regions tend to evolve at a slower rate than do less important regions. The choice of species for comparative analysis is crucial: shared ancestry of a clade of species facilitates the discovery of genomic features important to that clade, whereas increased sequence divergence improves the resolution at which features can be discovered. Recent studies suggest that comparative analyses are useful for all branches of life and that, in the near future, large-scale mammalian comparative sequence analysis will provide the best approach for the comprehensive discovery of human regulatory elements.
Affiliation(s)
- Gregory M Cooper
- Department of Genetics, Stanford University, Stanford, CA 94305-9010, USA
|
175
|
Yalcin B, Fullerton J, Miller S, Keays DA, Brady S, Bhomra A, Jefferson A, Volpi E, Copley RR, Flint J, Mott R. Unexpected complexity in the haplotypes of commonly used inbred strains of laboratory mice. Proc Natl Acad Sci U S A 2004; 101:9734-9. [PMID: 15210992 PMCID: PMC470780 DOI: 10.1073/pnas.0401189101] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2004] [Indexed: 01/21/2023] Open
Abstract
Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.
Affiliation(s)
- B Yalcin
- Wellcome Trust Centre for Human Genetics, Oxford University, Roosevelt Drive, Oxford OX3 7BN, United Kingdom
|
176
|
Abstract
Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein-coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop codons is to investigate its homologue from closely related species. Predicting regulatory elements is even more difficult than exon prediction, but again, comparisons pinpointing conserved sequence motifs upstream of translation start sites are helping to unravel gene regulatory networks. In addition to interspecific studies, intraspecific sequence comparison yields insights into the evolutionary forces that have acted on a species in the past. Of particular interest here is the identification of selection events such as selective sweeps. Both intra- and interspecific sequence comparisons are based on a variety of computational methods, including alignment, phylogenetic reconstruction, and coalescent theory. This article surveys the biology and the central computational ideas applied in recent comparative genomics projects. We argue that the most fruitful method of understanding the functional content of genomes is to study them in the context of related genomic sequences. In particular, such a study may reveal selection, a fundamental pointer to biological relevance.
Affiliation(s)
- Bernhard Haubold
- Fachbereich Biotechnologie & Bioinformatik, Fachhochschule Weihenstephan, 85350 Freising, Germany.
|
177
|
Beisel KW, Shiraki T, Morris KA, Pompeia C, Kachar B, Arakawa T, Bono H, Kawai J, Hayashizaki Y, Carninci P. Identification of unique transcripts from a mouse full-length, subtracted inner ear cDNA library. Genomics 2004; 83:1012-23. [PMID: 15177555 DOI: 10.1016/j.ygeno.2004.01.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2003] [Revised: 12/15/2003] [Accepted: 01/25/2004] [Indexed: 11/20/2022]
Abstract
A small-scale full-length library construction approach was developed to facilitate production of a mouse full-length cDNA encyclopedia representing approximately 250 enriched, normalized, and/or subtracted cDNA libraries. One library produced using this approach was a subtracted adult mouse inner ear cDNA library (sIEa). The average size of the inserts was approximately 2.5 kb, with the majority ranging from 0.5 to 7.0 kb. From this library 22,574 sequence reads were obtained from 15,958 independent clones. Sequencing and chromosomal localization established 5240 clusters, with 1302 clusters being unique and 359 representing new ESTs. Our sIEa library contributed 56.1% of the 7773 nonredundant UniGene clusters associated with the four mouse inner ear libraries in the NCBI dbEST. Based on homologous chromosomal regions between human and mouse, we identified 1018 UniGene clusters associated with the deafness locus critical regions. Of these, 59 clusters were found only in our sIEa library and represented approximately 50% of the identified critical regions.
Affiliation(s)
- Kirk W Beisel
- Department of Biomedical Sciences, Creighton University, 2500 California, Omaha, NE 68178, USA.
|
178
|
Conant GC, Wagner A. A fast algorithm for determining the best combination of local alignments to a query sequence. BMC Bioinformatics 2004; 5:62. [PMID: 15149555 PMCID: PMC436051 DOI: 10.1186/1471-2105-5-62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2003] [Accepted: 05/18/2004] [Indexed: 11/24/2022] Open
Abstract
Background: Existing sequence alignment algorithms assume that similarities between DNA or amino acid sequences are linearly ordered. That is, stretches of similar nucleotides or amino acids are in the same order in both sequences. Recombination perturbs this order. An algorithm that can reconstruct sequence similarity despite rearrangement would be helpful for reconstructing the evolutionary history of recombined sequences. Results: We propose a graph-based algorithm for combining multiple local alignments to a query sequence into the single combination of alignments that either covers the maximal portion of the query or results in the single highest alignment score to the query. This algorithm can help study the process of genome rearrangement, improve functional gene annotation, and reconstruct the evolutionary history of recombined proteins. The algorithm takes O(n²) time, where n is the number of local alignments considered. Conclusions: We discuss two example applications of the algorithm. The algorithm is able to provide useful reconstructions of the metazoan mitochondrial genome. It is also able to increase the percentage of a query sequence's amino acid residues for which similar stretches of amino acids can be found in sequence databases.
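The idea of combining local alignments into a single best-scoring set can be illustrated with a small dynamic-programming sketch. This is not the authors' graph-based algorithm, whose details the abstract does not give; it is a simplified stand-in that assumes each alignment is reduced to a hypothetical `(query_start, query_end, score)` triple, forbids overlap on the query, and ignores the order of the matches on the subject sequence (so rearranged matches can still be combined). The function name `best_combination` is likewise an assumption made for illustration. The double loop gives quadratic running time in the number of alignments.

```python
from typing import List, Tuple

# Hypothetical representation: (query_start, query_end, score), end exclusive.
Alignment = Tuple[int, int, float]


def best_combination(alignments: List[Alignment]) -> Tuple[float, List[Alignment]]:
    """Return the highest total score and the non-overlapping alignments achieving it."""
    if not alignments:
        return 0.0, []

    # Sort by query end so every compatible predecessor comes earlier in the list.
    ordered = sorted(alignments, key=lambda a: a[1])
    n = len(ordered)
    best = [0.0] * n   # best[i]: best total score of a set whose last member is ordered[i]
    prev = [-1] * n    # back-pointer to the previous alignment in that set

    for i in range(n):
        best[i] = ordered[i][2]
        for j in range(i):
            # ordered[j] is compatible if it ends at or before ordered[i] starts.
            compatible = ordered[j][1] <= ordered[i][0]
            if compatible and best[j] + ordered[i][2] > best[i]:
                best[i] = best[j] + ordered[i][2]
                prev[i] = j

    # Trace back from the best-scoring endpoint.
    i = max(range(n), key=lambda k: best[k])
    total = best[i]
    chosen: List[Alignment] = []
    while i != -1:
        chosen.append(ordered[i])
        i = prev[i]
    chosen.reverse()
    return total, chosen


if __name__ == "__main__":
    hits = [(0, 50, 40.0), (40, 100, 60.0), (50, 120, 55.0), (120, 150, 20.0)]
    print(best_combination(hits))
```

Replacing the score in each triple with the alignment's length on the query would instead favor the set covering the largest portion of the query, the other objective mentioned in the abstract.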
Affiliation(s)
- Gavin C Conant
- Department of Biology, The University of New Mexico, Albuquerque, NM, USA
- Andreas Wagner
- Department of Biology, The University of New Mexico, Albuquerque, NM, USA
|
179
|
Richard F, Lombard M, Dutrillaux B. Reconstruction of the ancestral karyotype of eutherian mammals. Chromosome Res 2004; 11:605-18. [PMID: 14516069 DOI: 10.1023/a:1024957002755] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
Applying the parsimony principle, i.e. that chromosomes identical in species belonging to different taxa were likely to be present in their common ancestor, the ancestral karyotype of eutherian mammals (about 100 million years old) was tentatively reconstructed. Comparing chromosome banding with all ZOO-FISH data from literature or studied by us, this reconstruction can be proposed with only limited uncertainties. This karyotype comprised 50 chromosomes of which 40-42 were acrocentrics. Ten ancestral pairs of chromosomes were homologous to a single human chromosome: 5, 6, 9, 11, 13, 17, 18, 20, X and Y (human nomenclature). Nine others were homologous to a part of a human chromosome: 1p + q (proximal), 1q, 2p + q (proximal), 2q, part of 7, 8q, 10p, 10q and 19p (human nomenclature). Finally, seven pairs of chromosomes, homologs to human chromosomes 3 + 21, 4 + 8p, part of 7 + 16p, part of 12 + part of 22 (twice), 14 + 15, 16q + 19q, formed syntenies disrupted in man.
Affiliation(s)
- F Richard
- UMR 147 CNRS, Institut Curie, Section Recherche, 26 rue d'Ulm, 75248 Paris Cedex 05, France.
|
180
|
Branchi I, Bichler Z, Minghetti L, Delabar JM, Malchiodi-Albedi F, Gonzalez MC, Chettouh Z, Nicolini A, Chabert C, Smith DJ, Rubin EM, Migliore-Samour D, Alleva E. Transgenic mouse in vivo library of human Down syndrome critical region 1: association between DYRK1A overexpression, brain development abnormalities, and cell cycle protein alteration. J Neuropathol Exp Neurol 2004; 63:429-40. [PMID: 15198122 DOI: 10.1093/jnen/63.5.429] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Down syndrome is the most frequent genetic cause of mental retardation, having an incidence of 1 in 700 live births. In the present study we used a transgenic mouse in vivo library consisting of 4 yeast artificial chromosome (YAC) transgenic mouse lines, each bearing a different fragment of the Down syndrome critical region 1 (DCR-1), implicated in brain abnormalities characterizing this pathology. The 152F7 fragment, in addition to genes also located on the other DCR-1 fragments, bears the DYRK1A gene, encoding a serine-threonine kinase. The neurobehavioral analysis of these mouse lines showed that DYRK1A overexpressing 152F7 mice but not the other lines display learning impairment and hyperactivity during development. Additionally, 152F7 mice display increased brain weight and neuronal size. At a biochemical level we found DYRK1A overexpression associated with a development-dependent increase in phosphorylation of the transcription factor FKHR and with high levels of cyclin B1, suggesting for the first time in vivo a correlation between DYRK1A overexpression and cell cycle protein alteration. In addition, we found an altered phosphorylation of transcription factors of the CREB family. Our findings support a role of DYRK1A overexpression in the neuronal abnormalities seen in Down syndrome and suggest that this pathology is linked to altered levels of proteins involved in the regulation of cell cycle.
Affiliation(s)
- Igor Branchi
- Department of Cell Biology and Neuroscience, Istituto Superiore di Sanità, Rome, Italy.
|
181
|
Lange AW, Molkentin JD, Yutzey KE. DSCR1 gene expression is dependent on NFATc1 during cardiac valve formation and colocalizes with anomalous organ development in trisomy 16 mice. Dev Biol 2004; 266:346-60. [PMID: 14738882 DOI: 10.1016/j.ydbio.2003.10.036] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
The Down syndrome critical region 1 (DSCR1) gene is present in the region of human chromosome 21 and the syntenic region of mouse chromosome 16, trisomy of which is associated with congenital heart defects observed in Down syndrome. DSCR1 encodes a regulatory protein in the calcineurin/NFAT signal transduction pathway. During valvuloseptal development in the heart, DSCR1 is expressed in the endocardium of the developing atrioventricular and semilunar valves, the muscular interventricular septum, and the ventricular myocardium. Human DSCR1 contains an NFAT-rich calcineurin-responsive element adjacent to exon 4. Transgenic mice generated with a homologous regulatory region of the mouse DSCR1 gene linked to lacZ (DSCR1(e4)/lacZ) show gene activation in the endocardium of the developing valves and aorticopulmonary septum of the heart, recapitulating a specific subdomain of endogenous DSCR1 cardiac expression. DSCR1(e4)/lacZ expression in the developing valve endocardium colocalizes with NFATc1, and endocardial DSCR1(e4)/lacZ is notably reduced or absent in NFATc1(-/-) embryos. Furthermore, expression of the endogenous DSCR1(e4) isoform is decreased in the outflow tract of NFATc1(-/-) hearts, and the DSCR1(e4) intragenic element is trans-activated by NFATc1 in cell culture. In trisomy 16 (Ts16) mice, expression of endogenous DSCR1 and DSCR1(e4)/lacZ colocalizes with anomalous valvuloseptal development, and transgenic Ts16 hearts have increased beta-galactosidase activity. DSCR1 and DSCR1(e4)/lacZ also are expressed in other organ systems affected by trisomy 16 in mice or trisomy 21 in humans including the brain, eye, ear, face, and limbs. Together, these results show that DSCR1(e4) expression in the developing valve endocardium is dependent on NFATc1 and support a role for DSCR1 in normal cardiac valvuloseptal formation as well as the abnormal development of several organ systems affected in individuals with Down syndrome.
Affiliation(s)
- Alexander W Lange
- Division of Molecular Cardiovascular Biology, Children's Medical Center Cincinnati ML 7020, Cincinnati, OH 45229, USA
|
182
|
Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Simons R, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Worley KC, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar Albà M, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hübner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou S, Brudno M, Sidow A, Stone EA, Venter JC, Payseur BA, Bourque G, López-Otín C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004; 428:493-521. [PMID: 15057822 DOI: 10.1038/nature02426] [Citation(s) in RCA: 1557] [Impact Index Per Article: 74.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2003] [Accepted: 02/20/2004] [Indexed: 01/16/2023]
Abstract
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
Affiliation(s)
- Richard A Gibbs
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, MS BCM226, One Baylor Plaza, Houston, Texas 77030, USA. http://www.hgsc.bcm.tmc.edu
|
183
|
Ianzano L, Young EJ, Zhao XC, Chan EM, Rodriguez MT, Torrado MV, Scherer SW, Minassian BA. Loss of function of the cytoplasmic isoform of the protein laforin (EPM2A) causes Lafora progressive myoclonus epilepsy. Hum Mutat 2004; 23:170-176. [PMID: 14722920 DOI: 10.1002/humu.10306] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Lafora disease is the most severe teenage-onset progressive epilepsy, a unique form of glycogenosis with perikaryal accumulation of an abnormal form of glycogen, and a neurodegenerative disorder exhibiting an unusual generalized organellar disintegration. The disease is caused by mutations of the EPM2A gene, which encodes two isoforms of the laforin protein tyrosine phosphatase, having alternate carboxyl termini, one localized in the cytoplasm (endoplasmic reticulum) and the other in the nucleus. To date, all documented disease mutations, including the knockout mouse model deletion, have been in the segment of the protein common to both isoforms. It is therefore not known whether dysfunction of the cytoplasmic, nuclear, or both isoforms leads to the disease. In the present work, we identify six novel mutations, one of which, c.950insT (Q319fs), is the first mutation specific to the cytoplasmic laforin isoform, implicating this isoform in disease pathogenesis. To confirm this mutation's deleterious effect on laforin, we studied the resultant protein's subcellular localization and function and show a drastic reduction in its phosphatase activity, despite maintenance of its location at the endoplasmic reticulum.
Affiliation(s)
- Leonarda Ianzano
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada
- Edwin J Young
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada
- Xiao C Zhao
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada
- Elayne M Chan
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Canada
- M T Rodriguez
- Pediatric Department, Narciso Lopez Hospital Lanus, Buenos Aires, Argentina
- Maria V Torrado
- M.V. Department of Genetics, National Pediatric Hospital Dr. Juan P. Garrahan, Buenos Aires, Argentina
- Stephen W Scherer
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Canada
- Berge A Minassian
- Division of Neurology, Department of Paediatrics, The Hospital for Sick Children, Toronto, Canada
|
184
|
Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, Flanigan MJ, Edwards NJ, Bolanos R, Fasulo D, Halldorsson BV, Hannenhalli S, Turner R, Yooseph S, Lu F, Nusskern DR, Shue BC, Zheng XH, Zhong F, Delcher AL, Huson DH, Kravitz SA, Mouchard L, Reinert K, Remington KA, Clark AG, Waterman MS, Eichler EE, Adams MD, Hunkapiller MW, Myers EW, Venter JC. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A 2004; 101:1916-21. [PMID: 14769938 PMCID: PMC357027 DOI: 10.1073/pnas.0307971100] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.7] [Indexed: 01/08/2023] Open
Abstract
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
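The read-coverage figure in this abstract follows the usual relation: coverage = (number of reads × mean read length) / genome size. A minimal Python sketch of that arithmetic is given below. It assumes a genome size of about 2.9 Gbp, a round figure that the abstract does not state, and backs out the mean quality-trimmed read length implied by 27 million reads at 5.3-fold coverage. The function and constant names are illustrative only.

```python
def coverage(num_reads: int, mean_read_len: float, genome_size: float) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_len / genome_size


def implied_read_length(num_reads: int, fold: float, genome_size: float) -> float:
    """Mean read length needed for `num_reads` reads to reach `fold` coverage."""
    return fold * genome_size / num_reads


# ASSUMPTION: genome size of ~2.9e9 bp; this value is not given in the abstract.
GENOME_SIZE_BP = 2.9e9
NUM_READS = 27_000_000   # from the abstract
READ_COVERAGE = 5.3      # from the abstract

mean_len = implied_read_length(NUM_READS, READ_COVERAGE, GENOME_SIZE_BP)
print(f"implied mean trimmed read length: {mean_len:.0f} bp")

# Round trip: plugging the implied length back in recovers the stated coverage.
print(f"coverage check: {coverage(NUM_READS, mean_len, GENOME_SIZE_BP):.1f}x")
```

Under the assumed genome size the implied read length is a few hundred base pairs; a different genome-size assumption scales the result proportionally.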
Affiliation(s)
- Sorin Istrail
- Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850, USA
|
185
|
Frazer KA, Tao H, Osoegawa K, de Jong PJ, Chen X, Doherty MF, Cox DR. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res 2004; 14:367-72. [PMID: 14962988 PMCID: PMC353216 DOI: 10.1101/gr.1961204] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
Abstract
Cross-species DNA sequence comparison is a fundamental method for identifying biologically important elements, because functional sequences are evolutionarily conserved, whereas nonfunctional sequences drift. A recent genome-wide comparison of human and mouse DNA discovered over 200,000 conserved noncoding sequences with unknown function. Multispecies DNA comparison has been proposed as a method to prioritize these conserved noncoding sequences for functional analysis based on the hypothesis that elements present in many species are more likely to be functional than elements present in limited numbers of species. Here, we perform a comparative analysis of the single-minded 2 (SIM2) gene interval on human chromosome 21 with horse, cow, pig, dog, cat, and mouse DNA. We classify conserved sequences based on the number of mammals in which they are present, and experimentally test sequences in each class for function. As hypothesized, conserved sequences present in many mammals are frequently functional. Additionally, we demonstrate that sequences conserved in a limited number of mammals are also frequently functional. Examination of genomic deletions in chimpanzee and rhesus macaque DNA showed that several putatively functional conserved noncoding human sequences were absent in these primates. These findings suggest that functional conserved noncoding human sequences can be missing in other mammals, even closely related primate species.
MESH Headings
- Animals
- Basic Helix-Loop-Helix Transcription Factors
- Cats
- Cattle
- Chromosome Deletion
- Chromosomes, Artificial, Bacterial/genetics
- Chromosomes, Human, Pair 21/genetics
- Cloning, Molecular
- Computational Biology/methods
- Conserved Sequence/genetics
- Conserved Sequence/physiology
- DNA, Intergenic/classification
- DNA, Intergenic/genetics
- DNA, Intergenic/physiology
- Dogs
- Evolution, Molecular
- Horses/genetics
- Humans
- Macaca mulatta/genetics
- Mice
- Pan troglodytes/genetics
- Regulatory Sequences, Nucleic Acid
- Sequence Homology, Nucleic Acid
- Swine/genetics
- Transcription Factors/classification
- Transcription Factors/genetics
- Transcription Factors/physiology
Affiliation(s)
- Kelly A Frazer
- Perlegen Sciences, Mountain View, California 95051, USA.
|
186
|
Cavener JD, Cull P, Holloway JL, Hsu TC. Walking tree heuristics for comparative genomic alignments. Math Biosci 2004; 188:207-19. [PMID: 14766103 DOI: 10.1016/j.mbs.2003.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2002] [Revised: 07/02/2003] [Accepted: 07/22/2003] [Indexed: 11/24/2022]
Abstract
Genomic sequence data is available for an ever-increasing number of organisms, but the full meaning of this data remains an enigma. String alignment is one approach for deciphering the information contained in genetic strings. Sequences which are conserved across species will help identify genes and other important structures. Similarity between species can be scored by measuring how well their sequences align. The walking tree method is an approximate string alignment method that can handle insertions, deletions, substitutions, translocations, and more than one level of inversion. We will describe this method and recent improvements which allow fast alignment of megabase strings. We will show examples in which the method located or discovered genes. We show how the method can be used to construct phylogenetic trees. We also show that the method can be used to identify essential regions for protein function.
Affiliation(s)
- Jeffrey D Cavener
- Computer Science, Oregon State University, Corvallis, OR 97339, USA.
|
187
|
Okon EB, Szado T, Laher I, McManus B, van Breemen C. Augmented Contractile Response of Vascular Smooth Muscle in a Diabetic Mouse Model. J Vasc Res 2004; 40:520-30. [PMID: 14646372 DOI: 10.1159/000075238] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.0] [Received: 04/09/2003] [Accepted: 08/25/2003] [Indexed: 11/19/2022] Open
Abstract
The vasomotor properties of isolated aortae and mesenteric arteries of insulin-resistant ob/ob and C57BL/6J mice were compared in organ bath studies. Vessels from ob/ob mice were more sensitive to phenylephrine. Pretreatment with L-NAME caused similar leftward shifts of the phenylephrine concentration response curves in diabetic and non-diabetic vessels. The ob/ob aortae contracted in response to phenylephrine with roughly twice the force while they were not stiffer than control aortae. L-NAME caused a greater percentage increase in maximal force in the control than in the ob/ob tissue. Denudation potentiated force in the control aortae, but not in the ob/ob aortae. Endothelium-dependent relaxation in the ob/ob aortae and mesenteric arteries was impaired as manifested by a decreased sensitivity and maximal relaxation to acetylcholine, while the aortic basal eNOS mRNA levels did not differ between the two strains. In addition, ob/ob aortae were less sensitive to the nitric oxide donor sodium nitroprusside. Inhibition of endogenous prostaglandin synthesis with indomethacin (10 microM) partly normalized the contractile response of the ob/ob aortae and enhanced their endothelium-dependent relaxation. Neither blockade of endothelin-1 receptors (bosentan, 10 microM) nor PKC inhibition (calphostin, 1 microM) affected the contractile response to phenylephrine in the mouse aortae of either strain. In conclusion, vascular dysfunction in the aorta and mesenteric artery of ob/ob mice is due to increased smooth muscle contractility and impaired dilation but not to changes in elasticity of the vascular wall. Endothelium-produced prostaglandins contribute to the increased vasoconstriction.
MESH Headings
- Animals
- Aorta/cytology
- Aorta/physiology
- Cell Count
- Diabetes Mellitus, Type 2/metabolism
- Diabetes Mellitus, Type 2/physiopathology
- Disease Models, Animal
- Elasticity
- Glucose/metabolism
- Lipid Metabolism
- Mesenteric Arteries/physiology
- Mice
- Mice, Inbred C57BL
- Mice, Obese
- Muscle Contraction/physiology
- Muscle, Smooth, Vascular/cytology
- Muscle, Smooth, Vascular/physiology
- Myocytes, Smooth Muscle/cytology
- Myocytes, Smooth Muscle/metabolism
- Nitric Oxide/metabolism
- Nitric Oxide Synthase/genetics
- Nitric Oxide Synthase Type II
- Nitric Oxide Synthase Type III
- Protein Kinase C/metabolism
- Proteoglycans/metabolism
- RNA, Messenger/analysis
- Receptors, Endothelin/metabolism
Affiliation(s)
- Elena B Okon
- iCAPTUR(4)E Centre and Department of Pathology, University of British Columbia, Vancouver, Canada.
|
188
|
Karnovsky AM, Gotow LF, McKinley DD, Piechan JL, Ruble CL, Mills CJ, Schellin KAB, Slightom JL, Fitzgerald LR, Benjamin CW, Roberds SL. A cluster of novel serotonin receptor 3-like genes on human chromosome 3. Gene 2004; 319:137-48. [PMID: 14597179 DOI: 10.1016/s0378-1119(03)00803-5] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Indexed: 12/21/2022]
Abstract
The ligand-gated ion channel family includes receptors for serotonin (5-hydroxytryptamine, 5-HT), acetylcholine, GABA, and glutamate. Drugs targeting subtypes of these receptors have proven useful for the treatment of various neuropsychiatric and neurological disorders. To identify new ligand-gated ion channels as potential therapeutic targets, drafts of human genome sequence were interrogated. Portions of four novel genes homologous to 5-HT(3A) and 5-HT(3B) receptors were identified within human sequence databases. We named the genes 5-HT(3C1)-5-HT(3C4). Radiation hybrid (RH) mapping localized these genes to chromosome 3q27-28. All four genes shared similar intron-exon organizations and predicted protein secondary structure with 5-HT(3A) and 5-HT(3B). Orthologous genes were detected by Southern blotting in several species including dog, cow, and chicken, but not in rodents, suggesting that these novel genes are not present in rodents or are very poorly conserved. Two of the novel genes are predicted to be pseudogenes, but two other genes are transcribed and spliced to form appropriate open reading frames. The 5-HT(3C1) transcript is expressed almost exclusively in small intestine and colon, suggesting a possible role in the serotonin-responsiveness of the gut.
Affiliation(s)
- Alla M Karnovsky
- Department of Bioinformatics, Pharmacia Corporation, Kalamazoo, MI 49007, USA
|
189
|
Abstract
The accurate prediction of higher eukaryotic gene structures and regulatory elements directly from genomic sequences is an important early step in the understanding of newly assembled contigs and finished genomes. As more new genomes are sequenced, comparative approaches are becoming increasingly practical and valuable for predicting genes and regulatory elements. We demonstrate the effectiveness of a comparative method called pattern filtering; it utilizes synteny between two or more genomic segments for the annotation of genomic sequences. Pattern filtering optimally detects the signatures of conserved functional elements despite the stochastic noise inherent in evolutionary processes, allowing more accurate annotation of gene models. We anticipate that pattern filtering will facilitate sequence annotation and the discovery of new functional elements by the genetics and genomics communities.
Affiliation(s)
- Jonathan E Moore
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095, USA
|
190
|
Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 2003; 302:1960-3. [PMID: 14671302 DOI: 10.1126/science.1088821] [Citation(s) in RCA: 471] [Impact Index Per Article: 21.4] [Indexed: 11/03/2022]
Abstract
Even though human and chimpanzee gene sequences are nearly 99% identical, sequence comparisons can nevertheless be highly informative in identifying biologically important changes that have occurred since our ancestral lineages diverged. We analyzed alignments of 7645 chimpanzee gene sequences to their human and mouse orthologs. These three-species sequence alignments allowed us to identify genes undergoing natural selection along the human and chimp lineage by fitting models that include parameters specifying rates of synonymous and nonsynonymous nucleotide substitution. This evolutionary approach revealed an informative set of genes with significantly different patterns of substitution on the human lineage compared with the chimpanzee and mouse lineages. Partitions of genes into inferred biological classes identified accelerated evolution in several functional classes, including olfaction and nuclear transport. In addition to suggesting adaptive physiological differences between chimps and humans, human-accelerated genes are significantly more likely to underlie major known Mendelian disorders.
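The distinction between synonymous and nonsynonymous substitutions that this abstract relies on can be illustrated with a short Python sketch. It is only a crude per-codon count over two aligned coding sequences, not the likelihood models fitted in the paper. The function name and the toy sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs must be gap-free, in-frame, equal-length coding sequences.
    A differing codon pair counts once, however many bases differ in it.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: silent change
        else:
            nonsyn += 1   # amino acid replaced
    return syn, nonsyn


# Hypothetical toy alignment: AAA->AAG keeps Lys, CTG->CCG changes Leu to Pro.
print(count_codon_differences("ATGAAACTG", "ATGAAGCCG"))
```

A real analysis would normalize these counts by the numbers of synonymous and nonsynonymous sites and correct for multiple hits; the sketch stops at raw counts to keep the idea visible.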
Affiliation(s)
- Andrew G Clark
- Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
|
191
|
A combinatorial network of evolutionarily conserved myelin basic protein regulatory sequences confers distinct glial-specific phenotypes. J Neurosci 2003. [PMID: 14614079 DOI: 10.1523/jneurosci.23-32-10214.2003] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022] Open
Abstract
Myelin basic protein (MBP) is required for normal myelin compaction and is implicated in both experimental and human demyelinating diseases. In this study, as an initial step in defining the regulatory network controlling MBP transcription, we located and characterized the function of evolutionarily conserved regulatory sequences. Long-range human-mouse sequence comparison revealed over 1 kb of conserved noncoding MBP 5' flanking sequence distributed into four widely spaced modules ranging from 0.1 to 0.4 kb. We demonstrate first that a controlled strategy of transgenesis provides an effective means to assign and compare qualitative and quantitative in vivo regulatory programs. Using this strategy, single-copy reporter constructs, designed to evaluate the regulatory significance of modular and intermodular sequences, were introduced by homologous recombination into the mouse hprt (hypoxanthine-guanine phosphoribosyltransferase) locus. The proximal modules M1 and M2 confer comparatively low-level oligodendrocyte expression primarily limited to early postnatal development, whereas the upstream M3 confers high-level oligodendrocyte expression extending throughout maturity. Furthermore, constructs devoid of M3 fail to target expression to newly myelinating oligodendrocytes in the mature CNS. Mutation of putative Nkx6.2/Gtx sites within M3, although not eliminating oligodendrocyte targeting, significantly decreases transgene expression levels. High-level and continuous expression is conferred to myelinating or remyelinating Schwann cells by M4. In addition, when isolated from surrounding MBP sequences, M3 confers transient expression to Schwann cells elaborating myelin. These observations define the in vivo regulatory roles played by conserved noncoding MBP sequences and lead to a combinatorial model in which different regulatory modules are engaged during primary myelination, myelin maintenance, and remyelination.
|
192
|
Abstract
The sequencing of eukaryotic genomes has lagged behind sequencing of organisms in the other domains of life, archaea and bacteria, primarily due to their greater size and complexity. With recent advances in high-throughput technologies such as robotics and improved computational resources, the number of eukaryotic genome sequencing projects has increased significantly. Among these are a number of sequencing projects of tropical pathogens of medical and veterinary importance, many of which are responsible for causing widespread morbidity and mortality in peoples of developing countries. Uncovering the complete gene complement of these organisms is proving to be of immense value in the development of novel methods of parasite control, such as antiparasitic drugs and vaccines, as well as the development of new diagnostic tools. Combining pathogen genome sequences with the host and vector genome sequences is promising to be a robust method for the identification of host-pathogen interactions. Finally, comparative sequencing of related species, especially of organisms used as model systems in the study of the disease, is beginning to realize its potential in the identification of genes, and the evolutionary forces that shape the genes, that are involved in evasion of the host immune response.
Affiliation(s)
- Jane M Carlton
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
|
193
|
Dermitzakis ET, Reymond A, Scamuffa N, Ucla C, Kirkness E, Rossier C, Antonarakis SE. Evolutionary Discrimination of Mammalian Conserved Non-Genic Sequences (CNGs). Science 2003; 302:1033-5. [PMID: 14526086 DOI: 10.1126/science.1087047] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.2] [Indexed: 11/02/2022]
Abstract
Analysis of the human and mouse genomes identified an abundance of conserved non-genic sequences (CNGs). The significance and evolutionary depth of their conservation remain unanswered. We have quantified levels and patterns of conservation of 191 CNGs of human chromosome 21 in 14 mammalian species. We found that CNGs are significantly more conserved than protein-coding genes and noncoding RNAs (ncRNAs) within the mammalian class from primates to monotremes to marsupials. The pattern of substitutions in CNGs differed from that seen in protein-coding and ncRNA genes and resembled that of protein-binding regions. About 0.3% to 1% of the human genome corresponds to a previously unknown class of extremely constrained CNGs shared among mammals.
Affiliation(s)
- Emmanouil T Dermitzakis
- Division of Medical Genetics and National Center of Competence in Research (NCCR) Frontiers in Genetics, University of Geneva Medical School and University Hospitals, 1211 Geneva, Switzerland.
|
194
|
Murphy WJ, Bourque G, Tesler G, Pevzner P, O'Brien SJ. Reconstructing the genomic architecture of mammalian ancestors using multispecies comparative maps. Hum Genomics 2003; 1:30-40. [PMID: 15601531 PMCID: PMC3525001 DOI: 10.1186/1479-7364-1-1-30] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Received: 08/19/2003] [Accepted: 08/19/2003] [Indexed: 11/10/2022] Open
Abstract
Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
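Rearrangement analyses of ordered marker maps often start from breakpoints, the places where two markers adjacent in one genome are not adjacent in the other. The Python sketch below counts breakpoints for a single unsigned marker order relative to the identity order. It illustrates that basic quantity only and is not the MGR algorithm used in the paper. The function name and the example orders are hypothetical.

```python
def count_breakpoints(order: list[int]) -> int:
    """Count breakpoints of an unsigned marker order versus 1, 2, ..., n.

    The order is framed with 0 and n + 1 so that displaced ends are counted.
    A breakpoint is any neighbouring pair whose labels are not consecutive.
    """
    n = len(order)
    if sorted(order) != list(range(1, n + 1)):
        raise ValueError("order must be a permutation of 1..n")
    framed = [0, *order, n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


# Hypothetical example: markers 2..4 inverted relative to the reference order.
reference = [1, 2, 3, 4, 5]
inverted = [1, 4, 3, 2, 5]
print(count_breakpoints(reference), count_breakpoints(inverted))
```

A single inversion creates at most two breakpoints, which is why breakpoint counts give a simple lower bound on the number of inversions separating two marker orders.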
Affiliation(s)
- William J Murphy
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, MD 21702, USA.
|
195
|
Abstract
The genomes of human, mouse, and rat have been sequenced. Now, as O'Brien and Murphy announce in their Perspective, the genome sequence derby is heating up with the addition of dog to the list (Kirkness et al.). As they explain, even though the coverage of the dog genome (1.5x) is lower than that of mouse (8x), there are many valuable insights to be gained from comparing the sequence of dog with those of mouse and human.
Affiliation(s)
- Stephen J O'Brien
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, MD 21702, USA
|
196
|
van Driel R, Fransz PF, Verschure PJ. The eukaryotic genome: a system regulated at different hierarchical levels. J Cell Sci 2003; 116:4067-75. [PMID: 12972500 DOI: 10.1242/jcs.00779] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 01/21/2023] Open
Abstract
Eukaryotic gene expression can be viewed within a conceptual framework in which regulatory mechanisms are integrated at three hierarchical levels. The first is the sequence level, i.e. the linear organization of transcription units and regulatory sequences. Here, developmentally co-regulated genes seem to be organized in clusters in the genome, which constitute individual functional units. The second is the chromatin level, which allows switching between different functional states. Switching between a state that suppresses transcription and one that is permissive for gene activity probably occurs at the level of the gene cluster, involving changes in chromatin structure that are controlled by the interplay between histone modification, DNA methylation, and a variety of repressive and activating mechanisms. This regulatory level is combined with control mechanisms that switch individual genes in the cluster on and off, depending on the properties of the promoter. The third level is the nuclear level, which includes the dynamic 3D spatial organization of the genome inside the cell nucleus. The nucleus is structurally and functionally compartmentalized and epigenetic regulation of gene expression may involve repositioning of loci in the nucleus through changes in large-scale chromatin structure.
Affiliation(s)
- Roel van Driel
- Swammerdam Institute for Life Sciences, BioCentrum Amsterdam, University of Amsterdam, Kruislaan 318, 1098 SM Amsterdam, The Netherlands.
|
197
|
Gardiner K, Fortna A, Bechtel L, Davisson MT. Mouse models of Down syndrome: how useful can they be? Comparison of the gene content of human chromosome 21 with orthologous mouse genomic regions. Gene 2003; 318:137-47. [PMID: 14585506 DOI: 10.1016/s0378-1119(03)00769-8] [Citation(s) in RCA: 157] [Impact Index Per Article: 7.1] [Indexed: 11/28/2022]
Abstract
With an incidence of approximately 1 in 700 live births, Down syndrome (DS) remains the most common genetic cause of mental retardation. The phenotype is assumed to be due to overexpression of some number of the >300 genes encoded by human chromosome 21. Mouse models, in particular the chromosome 16 segmental trisomies, Ts65Dn and Ts1Cje, are indispensable for DS-related studies of gene-phenotype correlations. Here we compare the updated gene content of the finished sequence of human chromosome 21 (364 genes and putative genes) with the gene content of the homologous mouse genomic regions (291 genes and putative genes) obtained from annotation of the public sector C57Bl/6 draft sequence. Annotated genes fall into one of three classes. First, there are 170 highly conserved, human/mouse orthologues. Second, there are 83 minimally conserved, possible orthologues. Included among the conserved and minimally conserved genes are 31 antisense transcripts. Third, there are species-specific genes: 111 spliced human transcripts show no orthologues in the syntenic mouse regions although 13 have homologous sequences elsewhere in the mouse genomic sequence, and 38 spliced mouse transcripts show no identifiable human orthologues. While these species-specific genes are largely based solely on spliced EST data, a majority can be verified in RNA expression experiments. In addition, preliminary data suggest that many human-specific transcripts may represent a novel class of primate-specific genes. Lastly, updated functional annotation of orthologous genes indicates genes encoding components of several cellular pathways are dispersed throughout the orthologous mouse chromosomal regions and are not completely represented in the Down syndrome segmental mouse models. Together, these data point out the potential for existing mouse models to produce extraneous phenotypes and to fail to produce DS-relevant phenotypes.
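The gene counts quoted in this abstract can be cross-checked against the stated totals. The Python sketch below assumes that the three classes partition each species' gene list, with the 31 antisense transcripts already included in the conserved and minimally conserved classes rather than added on top. Variable names are illustrative only.

```python
# Counts taken from the abstract.
highly_conserved = 170      # human/mouse orthologues
minimally_conserved = 83    # possible orthologues
human_specific = 111        # spliced human transcripts without syntenic orthologues
mouse_specific = 38         # spliced mouse transcripts without human orthologues

stated_human_total = 364
stated_mouse_total = 291

# ASSUMPTION: classes are disjoint and together cover every annotated gene.
shared = highly_conserved + minimally_conserved
human_total = shared + human_specific
mouse_total = shared + mouse_specific

print("human total matches:", human_total == stated_human_total)
print("mouse total matches:", mouse_total == stated_mouse_total)
print(f"human-specific share: {human_specific / stated_human_total:.1%}")
```

Under that partition assumption the class counts reproduce both stated totals, and the last line reports the fraction of the human chromosome 21 gene list that lacks an orthologue in the syntenic mouse regions.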
Affiliation(s)
- Katheleen Gardiner
- Eleanor Roosevelt Institute at the University of Denver, 1899 Gaylord Street, Denver, CO 80206-1210, USA.
|
198
|
Liquori CL, Ikeda Y, Weatherspoon M, Ricker K, Schoser BGH, Dalton JC, Day JW, Ranum LPW. Myotonic dystrophy type 2: human founder haplotype and evolutionary conservation of the repeat tract. Am J Hum Genet 2003; 73:849-62. [PMID: 14505273 PMCID: PMC1180607 DOI: 10.1086/378720] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.0] [Received: 05/29/2003] [Accepted: 07/18/2003] [Indexed: 01/22/2023] Open
Abstract
Myotonic dystrophy (DM), the most common form of muscular dystrophy in adults, can be caused by a mutation on either chromosome 19 (DM1) or 3 (DM2). In 2001, we demonstrated that DM2 is caused by a CCTG expansion in intron 1 of the zinc finger protein 9 (ZNF9) gene. To investigate the ancestral origins of the DM2 expansion, we compared haplotypes for 71 families with genetically confirmed DM2, using 19 short tandem repeat markers that we developed that flank the repeat tract. All of the families are white, with the majority of Northern European/German descent and a single family from Afghanistan. Several conserved haplotypes spanning >700 kb appear to converge into a single haplotype near the repeat tract. The common interval that is shared by all families with DM2 immediately flanks the repeat, extending up to 216 kb telomeric and 119 kb centromeric of the CCTG expansion. The DM2 repeat tract contains the complex repeat motif (TG)(n)(TCTG)(n)(CCTG)(n). The CCTG portion of the repeat tract is interrupted on normal alleles, but, as in other expansion disorders, these interruptions are lost on affected alleles. We examined haplotypes of 228 control chromosomes and identified a potential premutation allele with an uninterrupted (CCTG)(20) on a haplotype that was identical to the most common affected haplotype. Our data suggest that the predominant Northern European ancestry of families with DM2 resulted from a common founder and that the loss of interruptions within the CCTG portion of the repeat tract may predispose alleles to further expansion. To gain insight into possible function of the repeat tract, we looked for evolutionary conservation. The complex repeat motif and flanking sequences within intron 1 are conserved among human, chimpanzee, gorilla, mouse, and rat, suggesting a conserved biological function.
Affiliation(s)
- Christina L. Liquori, Yoshio Ikeda, Marcy Weatherspoon, Kenneth Ricker, Benedikt G. H. Schoser, Joline C. Dalton, John W. Day, Laura P. W. Ranum
- Institute of Human Genetics, Departments of Genetics, Cell Biology, and Development, and Neurology, University of Minnesota, Minneapolis; Department of Neurology, University of Würzburg, Würzburg, Germany; and Friedrich-Baur-Institute, Department of Neurology, Ludwig-Maximilians-University, Munich
|
199
|
Kirkness EF, Bafna V, Halpern AL, Levy S, Remington K, Rusch DB, Delcher AL, Pop M, Wang W, Fraser CM, Venter JC. The Dog Genome: Survey Sequencing and Comparative Analysis. Science 2003; 301:1898-903. [PMID: 14512627 DOI: 10.1126/science.1086432] [Citation(s) in RCA: 349] [Impact Index Per Article: 15.9] [Indexed: 11/02/2022]
Abstract
A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
Affiliation(s)
- Ewen F Kirkness
- The Institute for Genomic Research, Rockville, MD 20850, USA
|
200
|
Mi H, Vandergriff J, Campbell M, Narechania A, Majoros W, Lewis S, Thomas PD, Ashburner M. Assessment of genome-wide protein function classification for Drosophila melanogaster. Genome Res 2003; 13:2118-28. [PMID: 12952880 PMCID: PMC403707 DOI: 10.1101/gr.771603] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 11/24/2022]
Abstract
The functional classification of genes on a genome-wide scale is now in its infancy, and we make a first attempt to assess existing methods and identify sources of error. To this end, we compared two independent efforts for associating proteins with functions, one implemented by FlyBase and the other by PANTHER at Celera Genomics. Both methods make inferences based on sequence similarity and the available experimental evidence. However, they differ considerably in methodology and process. Overall, assuming that the systematic error across the two methods is relatively small, we find the protein-to-function association error rate of both the FlyBase and PANTHER methods to be <2%. The primary source of error for both methods appears to be simple human error. Although homology-based inference can certainly cause errors in annotation, our analysis indicates that the frequency of such errors is relatively small compared with the number of correct inferences. Moreover, these homology errors can be minimized by careful tree-based inference, such as that implemented in PANTHER. Often, functional associations are made by one method and not the other, indicating that one of the greatest challenges lies in improving the completeness of available ontology associations.
Affiliation(s)
- Huaiyu Mi
- Protein Informatics, Celera Genomics, Foster City, California 94404, USA
|