1201
Yeo GW, Van Nostrand E, Holste D, Poggio T, Burge CB. Identification and analysis of alternative splicing events conserved in human and mouse. Proc Natl Acad Sci U S A 2005; 102:2850-5. [PMID: 15708978 PMCID: PMC548664 DOI: 10.1073/pnas.0409742102] [Citation(s) in RCA: 215] [Impact Index Per Article: 10.8] [Received: 11/23/2004] [Indexed: 12/29/2022]
Abstract
Alternative pre-mRNA splicing affects a majority of human genes and plays important roles in development and disease. Alternative splicing (AS) events conserved since the divergence of human and mouse are likely of primary biological importance, but relatively few of such events are known. Here we describe sequence features that distinguish exons subject to evolutionarily conserved AS, which we call alternative conserved exons (ACEs), from other orthologous human/mouse exons and integrate these features into an exon classification algorithm, acescan. Genome-wide analysis of annotated orthologous human-mouse exon pairs identified approximately 2,000 predicted ACEs. Alternative splicing was verified in both human and mouse tissues by using an RT-PCR-sequencing protocol for 21 of 30 (70%) predicted ACEs tested, supporting the validity of a majority of acescan predictions. By contrast, AS was observed in mouse tissues for only 2 of 15 (13%) tested exons that had EST or cDNA evidence of AS in human but were not predicted ACEs, and AS was never observed for 11 negative control exons in human or mouse tissues. Predicted ACEs were much more likely to preserve the reading frame and less likely to disrupt protein domains than other AS events and were enriched in genes expressed in the brain and in genes involved in transcriptional regulation, RNA processing, and development. Our results also imply that the vast majority of AS events represented in the human EST database are not conserved in mouse.
Affiliation(s)
- Gene W Yeo
- Department of Biology and Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge, MA 02319, USA
1202
Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A 2005; 102:2454-9. [PMID: 15665081 PMCID: PMC548974 DOI: 10.1073/pnas.0409169102] [Citation(s) in RCA: 467] [Impact Index Per Article: 23.4] [Received: 11/02/2004] [Indexed: 01/22/2023]
Abstract
We report an efficient method for detecting functional RNAs. The approach, which combines comparative sequence analysis and structure prediction, already has yielded excellent results for a small number of aligned sequences and is suitable for large-scale genomic screens. It consists of two basic components: (i) a measure for RNA secondary structure conservation based on computing a consensus secondary structure, and (ii) a measure for thermodynamic stability, which, in the spirit of a z score, is normalized with respect to both sequence length and base composition but can be calculated without sampling from shuffled sequences. Functional RNA secondary structures can be identified in multiple sequence alignments with high sensitivity and high specificity. We demonstrate that this approach is not only much more accurate than previous methods but also significantly faster. The method is implemented in the program rnaz, which can be downloaded from www.tbi.univie.ac.at/~wash/RNAz. We screened all alignments of length n ≥ 50 in the Comparative Regulatory Genomics database, which compiles conserved noncoding elements in upstream regions of orthologous genes from human, mouse, rat, Fugu, and zebrafish. We recovered all of the known noncoding RNAs and cis-acting elements with high significance and found compelling evidence for many other conserved RNA secondary structures that, to our knowledge, have not been described before.
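For reference, the conventional z score that this abstract alludes to is sketched below. The symbols are our own labels, not taken from the paper: m is the minimum free energy (MFE) of the native sequence, and μ and σ are the mean and standard deviation of MFEs of random sequences with the same length and base composition. As we understand it, RNAz estimates μ and σ by regression rather than by actually shuffling, so treat this as a summary of the idea rather than the program's exact computation.

```latex
z = \frac{m - \mu}{\sigma}
```

A strongly negative z indicates a structure that is more stable than expected by chance for a sequence of that length and composition.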
Affiliation(s)
- Stefan Washietl
- Department of Theoretical Chemistry and Structural Biology, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria
1203
Dermitzakis ET, Reymond A, Antonarakis SE. Conserved non-genic sequences - an unexpected feature of mammalian genomes. Nat Rev Genet 2005; 6:151-7. [PMID: 15716910 DOI: 10.1038/nrg1527] [Citation(s) in RCA: 192] [Impact Index Per Article: 9.6] [Indexed: 01/27/2023]
Abstract
Mammalian genomes contain highly conserved sequences that are not functionally transcribed. These sequences are single copy and comprise approximately 1-2% of the human genome. Evolutionary analysis strongly supports their functional conservation, although their potentially diverse, functional attributes remain unknown. It is likely that genomic variation in conserved non-genic sequences is associated with phenotypic variability and human disorders. So how might their function and contribution to human disorders be examined?
Affiliation(s)
- Emmanouil T Dermitzakis
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
1204
Mani A, Radhakrishnan J, Farhi A, Carew KS, Warnes CA, Nelson-Williams C, Day RW, Pober B, State MW, Lifton RP. Syndromic patent ductus arteriosus: evidence for haploinsufficient TFAP2B mutations and identification of a linked sleep disorder. Proc Natl Acad Sci U S A 2005; 102:2975-9. [PMID: 15684060 PMCID: PMC549488 DOI: 10.1073/pnas.0409852102] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 02/07/2023]
Abstract
Patent ductus arteriosus (PDA) is a common congenital heart disease that results when the ductus arteriosus, a muscular artery, fails to remodel and close after birth. A syndromic form of this disorder, Char syndrome, is caused by mutation in TFAP2B, the gene encoding a neural crest-derived transcription factor. Established features of the syndrome are PDA, facial dysmorphology, and fifth-finger clinodactyly. Disease-causing mutations are missense and are proposed to be dominant negative. Because only a small number of families have been reported, there is limited information on the spectrum of mutations and resulting phenotypes. We report the characterization of two kindreds (K144 and K145) with Char syndrome containing 22 and 5 affected members, respectively. Genotyping revealed linkage to TFAP2B in both families. Sequencing of TFAP2B demonstrated mutations in both kindreds that were not found among control chromosomes. Both mutations altered highly conserved bases in introns required for normal splicing as demonstrated by biochemical studies in mammalian cells. The abnormal splicing results in mRNAs containing frameshift mutations that are expected to be degraded by nonsense-mediated mRNA decay, resulting in haploinsufficiency; even if produced, the protein in K144 would lack DNA binding and dimerization motifs and would likely result in haploinsufficiency. Examination of these two kindreds for phenotypes that segregate with TFAP2B mutations identified several phenotypes not previously linked to Char syndrome. These include parasomnia and dental and occipital-bone abnormalities. The striking sleep disorder in these kindreds implicates TFAP2B-dependent functions in the normal regulation of sleep.
Affiliation(s)
- Arya Mani
- Department of Medicine, Howard Hughes Medical Institute and Yale University School of Medicine, New Haven, CT 06510, USA
1205
Evidence for widespread degradation of gene control regions in hominid genomes. PLoS Biol 2005; 3:e42. [PMID: 15678168 PMCID: PMC544929 DOI: 10.1371/journal.pbio.0030042] [Citation(s) in RCA: 153] [Impact Index Per Article: 7.7] [Received: 08/30/2004] [Accepted: 12/01/2004] [Indexed: 01/28/2023]
Abstract
Although sequences containing regulatory elements located close to protein-coding genes are often only weakly conserved during evolution, comparisons of rodent genomes have implied that these sequences are subject to some selective constraints. Evolutionary conservation is particularly apparent upstream of coding sequences and in first introns, regions that are enriched for regulatory elements. By comparing the human and chimpanzee genomes, we show here that there is almost no evidence for conservation in these regions in hominids. Furthermore, we show that gene expression is diverging more rapidly in hominids than in murids per unit of neutral sequence divergence. By combining data on polymorphism levels in human noncoding DNA and the corresponding human–chimpanzee divergence, we show that the proportion of adaptive substitutions in these regions in hominids is very low. It therefore seems likely that the lack of conservation and increased rate of gene expression divergence are caused by a reduction in the effectiveness of natural selection against deleterious mutations because of the low effective population sizes of hominids. This has resulted in the accumulation of a large number of deleterious mutations in sequences containing gene control elements and hence a widespread degradation of the genome during the evolution of humans and chimpanzees. A comparison of hominid and rodent lineages reveals that the gene control regions of hominids are not conserved and are accumulating mutations, suggesting widespread degradation of the hominid genome.
1206
Manson FDC, Trump D, Read AP, Black GCM. Inherited eye disease: cause and late effect. Trends Mol Med 2005; 11:449-55. [PMID: 16153893 DOI: 10.1016/j.molmed.2005.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/17/2005] [Revised: 07/12/2005] [Accepted: 08/16/2005] [Indexed: 01/14/2023]
Abstract
Molecular genetics has provided relatively few insights into late-onset eye disorders, but epidemiological data indicate that genetic factors are important in some late-onset eye disorders that cause major health burdens. Much clinical genetic research is based on the belief that developmental and late-onset disorders are not necessarily the result of defects in different genes, but are often caused by different mutations in the same collection of genes. Thus, mutations that either abolish or radically change gene function might cause early-onset disorders, whereas more-subtle changes in gene expression might underlie late-onset diseases. We present arguments and examples that indicate that this principle might be a fruitful guide to investigating the causes of late-onset eye disorders.
Affiliation(s)
- Forbes D C Manson
- Academic Unit of Eye and Vision Science, Manchester Royal Eye Hospital, School of Medicine, University of Manchester, Oxford Road, Manchester M13 9WH, UK
1207
Hazkani-Covo E, Wool D, Graur D. In search of the vertebrate phylotypic stage: A molecular examination of the developmental hourglass model and von Baer's third law. J Exp Zool B Mol Dev Evol 2005; 304:150-8. [PMID: 15779077 DOI: 10.1002/jez.b.21033] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Indexed: 02/01/2023]
Abstract
In 1828, Karl von Baer proposed a set of four evolutionary "laws" pertaining to embryological development. According to von Baer's third law, young embryos from different species are relatively undifferentiated and resemble one another but as development proceeds, distinguishing features of the species begin to appear and embryos of different species progressively diverge from one another. An expansion of this law, called "the hourglass model," was proposed independently by Denis Duboule and Rudolf Raff in the 1990s. According to the hourglass model, ontogeny is characterized by a starting point at which different taxa differ markedly from one another, followed by a stage of reduced intertaxonomic variability (the phylotypic stage), and ending in a von-Baer-like progressive divergence among the taxa. A possible "translation" of the hourglass model into molecular terminology would suggest that orthologs expressed in stages described by the tapered part of the hourglass should resemble one another more than orthologs expressed in the expansive parts that precede or succeed the phylotypic stage. We tested this hypothesis using 1,585 mouse genes expressed during 26 embryonic stages, and their human orthologs. Evolutionary divergence was estimated at different embryonic stages by calculating pairwise distances between corresponding orthologous proteins from mouse and human. Two independent datasets were used. One dataset contained genes that are expressed solely in a single developmental stage; the second consisted of genes expressed at different developmental stages. In the second dataset, the genes were classified according to their earliest stage of expression. We fitted second-order polynomials to the two datasets. The two polynomials displayed minima as expected from the hourglass model. The molecular results suggest, albeit weakly, that a phylotypic stage (or period) indeed exists. Its temporal location, somewhere between the first-somites stage and the formation of the posterior neuropore, was in approximate agreement with the morphologically defined phylotypic stage. The molecular evidence for the later parts of the hourglass model, i.e., for von Baer's third law, was stronger than that for the earlier parts.
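The polynomial test described in this abstract can be illustrated with a small sketch: fit y = a·x² + b·x + c by ordinary least squares to (stage, divergence) pairs, then read off the turning point at x = -b/(2a), which is a minimum only when a > 0. The function names and the example data are hypothetical; the study's actual distance measure, stage coding, and fitting software are not described here and may differ.

```python
from typing import Sequence


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(set(xs)) < 3:
        raise ValueError("need at least three distinct x values")
    # Power sums s[k] = sum(x**k) and moments t[k] = sum(y * x**k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal equations for the unknowns ordered (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= factor * m[col][k]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def vertex(a: float, b: float) -> float:
    """x-coordinate of the parabola's turning point; a minimum only when a > 0."""
    if a <= 0:
        raise ValueError("fitted parabola has no minimum (a <= 0)")
    return -b / (2 * a)


# Hypothetical data: 26 stage indices with divergence values lying exactly on
# y = 0.01 * (x - 12)**2 + 0.05, so the fitted minimum should sit at stage 12.
stages = list(range(1, 27))
divergence = [0.01 * (x - 12) ** 2 + 0.05 for x in stages]
a, b, c = fit_quadratic(stages, divergence)
print(round(vertex(a, b), 6))  # 12.0
```

Solving the 3×3 normal equations directly keeps the sketch dependency-free; with real data, a numerical library's polynomial fitting routine would be the more robust choice.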
Affiliation(s)
- Einat Hazkani-Covo
- Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
1208
García J, Castrillo JL. Identification of two novel human genes, DIPLA1 and DIPAS, expressed in placenta tissue. Gene 2005; 344:241-50. [PMID: 15656990 DOI: 10.1016/j.gene.2004.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 06/07/2004] [Revised: 09/25/2004] [Accepted: 10/05/2004] [Indexed: 11/18/2022]
Abstract
Here we report the identification and expression analysis of two novel human genes, DIPLA1 (Differentially expressed in placenta 1) and DIPAS (DIPLA1 Antisense). These genes are located at chromosomal region 9q33.1, in opposite orientations, and are flanked by the pregnancy-associated plasma protein-A (PAPP-A) and astrotactin 2 (ASTN2) genes. The mRNA sequences of both genes contain several upstream AUGs (uAUG) and various potential open reading frames (ORFs). DIPLA1 mRNA is 1.8 kb long and contains a 285 nt ORF coding for a polypeptide designated as replicative senescence up-regulated (RSU) protein. Antisense DIPAS mRNA is 2.7 kb long and contains a 309 nt ORF coding for a protein with partial similarity to the gamma isoform variant of the human Ca(2+)/calmodulin (CaM)-dependent protein kinase II. Both genes are conserved in placental species and are presumably transcribed from initiator (Inr) promoter elements located on opposite strands. In 20 normal human tissues tested, DIPLA1 mRNA expression was placenta-specific, whereas DIPAS mRNA expression was higher in placenta, brain, kidney and testis. In addition, DIPAS mRNA hybridizes with the 3'UTR region of PAPP-A mRNA, which extends more than 4 kb beyond what was previously reported, forming a potential sense-antisense double-stranded RNA (dsRNA) duplex. Our results are of interest for placenta gene expression regulation and for the identification of novel genes in the human genome.
MESH Headings
- Alternative Splicing
- Blotting, Northern
- Cell Line
- Cell Line, Tumor
- Chromosomes, Human, Pair 9/genetics
- DNA, Complementary/chemistry
- DNA, Complementary/genetics
- Endoplasmic Reticulum/metabolism
- Female
- Gene Expression Profiling
- Genes, Overlapping/genetics
- Green Fluorescent Proteins/genetics
- Green Fluorescent Proteins/metabolism
- HeLa Cells
- Humans
- Male
- Microscopy, Confocal
- Molecular Sequence Data
- Placenta/metabolism
- Pregnancy
- Pregnancy Proteins/genetics
- Pregnancy Proteins/metabolism
- Pregnancy-Associated Plasma Protein-A/genetics
- Pregnancy-Associated Plasma Protein-A/metabolism
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Recombinant Fusion Proteins/genetics
- Recombinant Fusion Proteins/metabolism
- Sequence Analysis, DNA
- Transfection
Affiliation(s)
- Job García
- Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas, Universidad Autónoma de Madrid, Cantoblanco E-28049 Madrid, Spain
1209
Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJK, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 2005; 3:e7. [PMID: 15630479 PMCID: PMC526512 DOI: 10.1371/journal.pbio.0030007] [Citation(s) in RCA: 685] [Impact Index Per Article: 34.3] [Received: 07/30/2004] [Accepted: 10/21/2004] [Indexed: 02/06/2023]
Abstract
In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data are presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
Affiliation(s)
- Adam Woolfe
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Martin Goodson
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Debbie K Goode
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil Snell
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Gayle K McEwen
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Tanya Vavouri
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Sarah F Smith
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil North
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Heather Callaway
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Krys Kelly
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Klaudia Walter
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Irina Abnizova
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Walter Gilks
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrooke's Hospital, Cambridge, United Kingdom
- Yvonne J K Edwards
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Julie E Cooke
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Greg Elgar
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
1210
Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 2004; 5:99. [PMID: 15613238 PMCID: PMC544600 DOI: 10.1186/1471-2164-5-99] [Citation(s) in RCA: 228] [Impact Index Per Article: 10.9] [Received: 12/02/2004] [Accepted: 12/21/2004] [Indexed: 01/29/2023]
Abstract
Background: Evolutionarily conserved sequences within or adjoining orthologous genes often serve as critical cis-regulatory regions. Recent studies have identified long, non-coding genomic regions that are perfectly conserved between human and mouse, termed ultra-conserved regions (UCRs). Here, we focus on UCRs that cluster around genes involved in early vertebrate development, genes that have been conserved over 450 million years of vertebrate evolution. Results: Based on a high-resolution detection procedure, our UCR set enables novel insights into vertebrate genome organization and regulation of developmentally important genes. We find that the genomic positions of deeply conserved UCRs are strongly associated with the locations of genes encoding key regulators of development, with particularly strong positional correlation to transcription factor-encoding genes. Of particular importance is the observation that most UCRs are clustered into arrays that span hundreds of kilobases around their presumptive target genes. Such a hallmark signature is present around several uncharacterized human genes predicted to encode developmentally important DNA-binding proteins. Conclusion: The genomic organization of UCRs, combined with previous findings, suggests that UCRs act as essential long-range modulators of gene expression. The exceptional sequence conservation and clustered structure suggest that UCR-mediated molecular events involve greater complexity than traditional DNA binding by transcription factors. The high-resolution UCR collection presented here provides a wealth of target sequences for future experimental studies to determine the nature of the biochemical mechanisms involved in the preservation of arrays of nearly identical non-coding sequences over the course of vertebrate evolution.
1211
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 2004; 432:695-716. [PMID: 15592404 DOI: 10.1038/nature03154] [Citation(s) in RCA: 1999] [Impact Index Per Article: 95.2] [Received: 07/19/2004] [Accepted: 11/01/2004] [Indexed: 12/28/2022]
Abstract
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
1212
Martin J, Han C, Gordon LA, Terry A, Prabhakar S, She X, Xie G, Hellsten U, Chan YM, Altherr M, Couronne O, Aerts A, Bajorek E, Black S, Blumer H, Branscomb E, Brown NC, Bruno WJ, Buckingham JM, Callen DF, Campbell CS, Campbell ML, Campbell EW, Caoile C, Challacombe JF, Chasteen LA, Chertkov O, Chi HC, Christensen M, Clark LM, Cohn JD, Denys M, Detter JC, Dickson M, Dimitrijevic-Bussod M, Escobar J, Fawcett JJ, Flowers D, Fotopulos D, Glavina T, Gomez M, Gonzales E, Goodstein D, Goodwin LA, Grady DL, Grigoriev I, Groza M, Hammon N, Hawkins T, Haydu L, Hildebrand CE, Huang W, Israni S, Jett J, Jewett PB, Kadner K, Kimball H, Kobayashi A, Krawczyk MC, Leyba T, Longmire JL, Lopez F, Lou Y, Lowry S, Ludeman T, Manohar CF, Mark GA, McMurray KL, Meincke LJ, Morgan J, Moyzis RK, Mundt MO, Munk AC, Nandkeshwar RD, Pitluck S, Pollard M, Predki P, Parson-Quintana B, Ramirez L, Rash S, Retterer J, Ricke DO, Robinson DL, Rodriguez A, Salamov A, Saunders EH, Scott D, Shough T, Stallings RL, Stalvey M, Sutherland RD, Tapia R, Tesmer JG, Thayer N, Thompson LS, Tice H, Torney DC, Tran-Gyamfi M, Tsai M, Ulanovsky LE, Ustaszewska A, Vo N, White PS, Williams AL, Wills PL, Wu JR, Wu K, Yang J, Dejong P, Bruce D, Doggett NA, Deaven L, Schmutz J, Grimwood J, Richardson P, Rokhsar DS, Eichler EE, Gilna P, Lucas SM, Myers RM, Rubin EM, Pennacchio LA. The sequence and analysis of duplication-rich human chromosome 16. Nature 2004; 432:988-94. [PMID: 15616553 DOI: 10.1038/nature03187] [Citation(s) in RCA: 121] [Impact Index Per Article: 5.8] [Received: 09/09/2004] [Accepted: 11/15/2004] [Indexed: 01/30/2023]
Abstract
Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.
Affiliation(s)
- Joel Martin
- DOE Joint Genome Institute, 2800 Mitchell Avenue, Walnut Creek, California 94598, USA
1213
Blanchette M, Green ED, Miller W, Haussler D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res 2004; 14:2412-23. [PMID: 15574820 PMCID: PMC534665 DOI: 10.1101/gr.2800104] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Received: 05/18/2004] [Accepted: 10/05/2004] [Indexed: 11/25/2022]
Abstract
It is believed that most modern mammalian lineages arose from a series of rapid speciation events near the Cretaceous-Tertiary boundary. We show that such a phylogeny makes the common ancestral genome sequence an ideal target for reconstruction. Simulations suggest that with methods currently available, we can expect to get 98% of the bases correct in reconstructing megabase-scale euchromatic regions of a eutherian ancestral genome from the genomes of approximately 20 optimally chosen modern mammals. Using actual genomic sequences from 19 extant mammals, we reconstruct 1.1 Mb of ancient genome sequence around the CFTR locus. Detailed examination suggests the reconstruction is accurate and that it allows us to identify features in modern species, such as remnants of ancient transposon insertions, that were not identified by direct analysis. By tracing the predicted evolutionary history of the bases in the reconstructed region, we estimate the amount of DNA turnover due to insertion, deletion, and substitution in the different placental mammalian lineages since the common eutherian ancestor and find considerable variation between lineages. In coming years, such reconstructions may help in identifying and understanding the genetic features common to eutherian mammals and may shed light on the evolution of human or primate-specific traits.
Affiliation(s)
- Mathieu Blanchette
- Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA.
1214
Nóbrega MA, Zhu Y, Plajzer-Frick I, Afzal V, Rubin EM. Megabase deletions of gene deserts result in viable mice. Nature 2004; 431:988-93. [PMID: 15496924 DOI: 10.1038/nature03022] [Citation(s) in RCA: 133] [Impact Index Per Article: 6.3] [Received: 06/02/2004] [Accepted: 09/08/2004] [Indexed: 12/24/2022]
Abstract
The functional importance of the roughly 98% of mammalian genomes not corresponding to protein coding sequences remains largely undetermined. Here we show that some large-scale deletions of the non-coding DNA referred to as gene deserts can be well tolerated by an organism. We deleted two large non-coding intervals, 1,511 kilobases and 845 kilobases in length, from the mouse genome. Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity and a variety of parameters assaying general homeostasis. Further detailed analysis of the expression of multiple genes bracketing the deletions revealed only minor expression differences in homozygous deletion and wild-type mice. Together, the two deleted segments harbour 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Some of the deleted sequences might encode for functions unidentified in our screen; nonetheless, these studies further support the existence of potentially 'disposable DNA' in the genomes of mammals.
1215
Abstract
Rattus norvegicus is an important experimental organism and interesting to evolutionary biologists. The recently published draft rat genome sequence provides us with insights into both the rat's evolution and its physiology. We learn more about genome evolution and, in particular, the adaptive significance of gene family expansions and the evolution of rodent genomes, which appears to have decelerated since the divergence of mouse and rat. An important observation is that some regions of genomes, many in noncoding regions, show very high sequence conservation, while others show unexpectedly fast evolution. Both of these may be pointers to functional significance.
1216
Sabarinadh C, Subramanian S, Tripathi A, Mishra RK. Extreme conservation of noncoding DNA near HoxD complex of vertebrates. BMC Genomics 2004; 5:75. [PMID: 15462684 PMCID: PMC524357 DOI: 10.1186/1471-2164-5-75] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 04/23/2004] [Accepted: 10/06/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Homeotic gene complexes determine the anterior-posterior body axis in animals. The expression pattern and function of hox genes along this axis are colinear with the order in which the genes are organized in the complex. This 'chromosomal organization and functional correspondence' is conserved in all bilaterians investigated. Genomic sequences covering the HoxD complex from several vertebrate species are now available. This offers a comparative genomics approach to identify conserved regions linked to this complex. Although the molecular basis of 'colinearity' of Hox complexes is not yet understood, it is possible that there are control elements within or in the proximity of these complexes that establish and maintain the expression patterns of hox genes in a coordinated fashion.
RESULTS We have compared DNA sequence flanking the HoxD complex of several primate, rodent and fish species. This analysis revealed an unprecedented conservation of non-coding DNA sequences adjacent to the HoxD complex from fish to human. Stretches of hundreds of base pairs in a 7 kb region upstream of the HoxD complex show 100% conservation across the vertebrate species. Using PCR primers from the human sequence, these conserved regions could be amplified from other vertebrate species, including other mammals, birds, reptiles, amphibians and fish. Our analysis of these sequences also indicates that, starting from the conserved core regions, more sequences have been added on and maintained during evolution from fish to human.
CONCLUSION Such a high degree of conservation in the core regions of this 7 kb region, where no variation occurred during approximately 500 million years of evolution, suggests a critical function for these sequences. We suggest that such sequences are likely to provide a molecular handle to gain insight into the evolution and mechanism of regulation of associated gene complexes.
Collapse
Affiliation(s)
- Chilaka Sabarinadh
- Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
- Subbaya Subramanian
- Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
- Anshuman Tripathi
- Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
- Rakesh K Mishra
- Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
1217
Perry GH, Verrelli BC, Stone AC. Comparative analyses reveal a complex history of molecular evolution for human MYH16. Mol Biol Evol 2004; 22:379-82. [PMID: 15470226 DOI: 10.1093/molbev/msi004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 12/26/2022]
Abstract
We describe the pattern of molecular evolution at a sarcomeric myosin gene, MYH16, using more than 30,000 bp of exon and intron sequence data from the chimpanzee and human genome sequencing projects to evaluate the timing and consequences of a human lineage-specific frameshift deletion. We estimate the age of the deletion at approximately 5.3 MYA. This estimate is consistent with the time of human and chimpanzee divergence and is significantly older than the first appearance of the genus Homo in the fossil record. We also find conflicting estimates of nonsynonymous fixation rates (d(N)) across different regions of this gene, revealing a complex pattern inconsistent with a simple model of pseudogene evolution for human MYH16.
1218
Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A. Identification of mammalian microRNA host genes and transcription units. Genome Res 2004; 14:1902-10. [PMID: 15364901 PMCID: PMC524413 DOI: 10.1101/gr.2722704] [Citation(s) in RCA: 1453] [Impact Index Per Article: 69.2] [Received: 04/28/2004] [Accepted: 07/27/2004] [Indexed: 12/13/2022]
Abstract
To derive a global perspective on the transcription of microRNAs (miRNAs) in mammals, we annotated the genomic position and context of this class of noncoding RNAs (ncRNAs) in the human and mouse genomes. Of the 232 known mammalian miRNAs, we found that 161 overlap with 123 defined transcription units (TUs). We identified miRNAs within introns of 90 protein-coding genes with a broad spectrum of molecular functions, and in both introns and exons of 66 mRNA-like noncoding RNAs (mlncRNAs). In addition, novel families of miRNAs based on host gene identity were identified. The transcription patterns of all miRNA host genes were curated from a variety of sources illustrating spatial, temporal, and physiological regulation of miRNA expression. These findings strongly suggest that miRNAs are transcribed in parallel with their host transcripts, and that the two different transcription classes of miRNAs ('exonic' and 'intronic') identified here may require slightly different mechanisms of biogenesis.
Collapse
Affiliation(s)
- Antony Rodriguez
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
1219
Mallon AM, Wilming L, Weekes J, Gilbert JGR, Ashurst J, Peyrefitte S, Matthews L, Cadman M, McKeone R, Sellick CA, Arkell R, Botcherby MRM, Strivens MA, Campbell RD, Gregory S, Denny P, Hancock JM, Rogers J, Brown SDM. Organization and evolution of a gene-rich region of the mouse genome: a 12.7-Mb region deleted in the Del(13)Svea36H mouse. Genome Res 2004; 14:1888-901. [PMID: 15364904 PMCID: PMC524412 DOI: 10.1101/gr.2478604] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
Del(13)Svea36H (Del36H) is a deletion of approximately 20% of mouse chromosome 13 showing conserved synteny with human chromosome 6p22.1-6p22.3/6p25. The human region is lost in some deletion syndromes and is the site of several disease loci. Heterozygous Del36H mice show numerous phenotypes and may model aspects of human genetic disease. We describe 12.7 Mb of finished, annotated sequence from Del36H. Del36H has a higher gene density than the draft mouse genome, reflecting high local densities of three gene families (vomeronasal receptors, serpins, and prolactins) which are greatly expanded relative to human. Transposable elements are concentrated near these gene families. We therefore suggest that their neighborhoods are gene factories, regions of frequent recombination in which gene duplication is more frequent. The gene families show different proportions of pseudogenes, likely reflecting different strengths of purifying selection and/or gene conversion. They are also associated with relatively low simple sequence concentrations, which vary across the region with a periodicity of approximately 5 Mb. Del36H contains numerous evolutionarily conserved regions (ECRs). Many lie in noncoding regions, are detectable in species as distant as Ciona intestinalis, and therefore are candidate regulatory sequences. This analysis will facilitate functional genomic analysis of Del36H and provides insights into mouse genome evolution.
Collapse
Affiliation(s)
- Ann-Marie Mallon
- Medical Research Council Mammalian Genetics Unit, Harwell, Oxfordshire, United Kingdom
1220
Boffelli D, Nobrega MA, Rubin EM. Comparative genomics at the vertebrate extremes. Nat Rev Genet 2004; 5:456-65. [PMID: 15153998 DOI: 10.1038/nrg1350] [Citation(s) in RCA: 190] [Impact Index Per Article: 9.0] [Indexed: 02/02/2023]
Affiliation(s)
- Dario Boffelli
- DOE Joint Genome Institute, Walnut Creek, California 94598, USA
1221
1222
Pearson H. 'Junk' DNA reveals vital role. Nature 2004. [DOI: 10.1038/news040503-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]