1
Claudio JO, Zhan F, Zhuang L, Khaja R, Zhu YX, Sivananthan K, Trudel S, Masih-Khan E, Fonseca R, Bergsagel PL, Scherer SW, Shaughnessy J, Stewart AK. Expression and mutation status of candidate kinases in multiple myeloma. Leukemia 2007; 21:1124-7. [PMID: 17344920 DOI: 10.1038/sj.leu.2404612]
2
Marques-Bonet T, Sànchez-Ruiz J, Armengol L, Khaja R, Bertranpetit J, Lopez-Bigas N, Rocchi M, Gazave E, Navarro A. On the association between chromosomal rearrangements and genic evolution in humans and chimpanzees. Genome Biol 2007; 8:R230. [PMID: 17971225 PMCID: PMC2246304 DOI: 10.1186/gb-2007-8-10-r230] [Received: 08/12/2006] [Revised: 10/12/2007] [Accepted: 10/30/2007]
Abstract
BACKGROUND: The role that chromosomal rearrangements might have played in the speciation processes that separated the human and chimpanzee lineages has recently come into the spotlight. To date, however, results have been contradictory. Here we revisit this issue, using the available human and chimpanzee genome sequences to study the relationship between chromosomal rearrangements and rates of DNA sequence evolution.
RESULTS: Contrary to previous findings for this pair of species, we show that genes located in the rearranged chromosomes that differentiate the human and chimpanzee genomes, especially genes within the rearrangements themselves, show lower divergence than genes elsewhere in the genome. Still, there are considerable differences between individual chromosomes. Chromosome 4, in particular, shows higher divergence in genes located within its rearrangement.
CONCLUSION: A first conclusion of our analysis is that divergence is lower for genes located in rearranged chromosomes than for those in colinear chromosomes. We also report that non-coding regions within rearranged regions tend to have lower divergence than non-coding regions outside them. These results suggest a previously unreported association between chromosomal rearrangements and lower non-coding divergence, although some chromosomes do not follow this trend and could potentially be associated with a speciation episode. In summary, while not ruling it out, our results suggest that chromosomal speciation has not been common along the human and chimpanzee lineages.
Affiliation(s)
- Tomàs Marques-Bonet
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Jesús Sànchez-Ruiz
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Lluís Armengol
- Genes and Disease Program, Center for Genomic Regulation, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 1, 08003 Barcelona, Catalonia, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Spain
- Razi Khaja
- The Centre for Applied Genomics, The Hospital for Sick Children, MaRS Centre - East Tower, 101 College Street, Room 14-706, Toronto, Ontario, Canada
- Jaume Bertranpetit
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Spain
- Núria Lopez-Bigas
- Research Unit on Biomedical Informatics of IMIM/UPF, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Mariano Rocchi
- Dipartimento di Genetica e Microbiologia, Università di Bari, Bari, Italy
- Elodie Gazave
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Arcadi Navarro
- Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA) and Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Plaça Dr. Aiguader 88, 08003 Barcelona, Catalonia, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Spain
- Population Genomics Node (GNV8), National Institute for Bioinformatics (INB), Spain
3
Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res 2006; 115:205-14. [PMID: 17124402 DOI: 10.1159/000095916] [Received: 03/23/2006] [Accepted: 05/15/2006]
Abstract
The discovery of an abundance of copy number variants (CNVs; gains and losses of DNA sequences >1 kb) and other structural variants in the human genome is influencing the way research and diagnostic analyses are designed and interpreted. As such, comprehensive databases containing the most relevant information will be critical for fully understanding these results, with impact across disciplines ranging from molecular biology to clinical genetics. Here, we describe the development of bioinformatics resources to facilitate these studies. The Database of Genomic Variants (http://projects.tcag.ca/variation/) is a comprehensive catalogue of structural variation in the human genome. The database currently contains 1,267 regions reported to contain copy number variation or inversions in apparently healthy individuals. We describe the current contents of the database and how it can serve as a resource for interpreting array comparative genomic hybridization (array CGH) and other DNA copy imbalance data. We also present the structure of the database, which was built using a new data modeling methodology termed Cross-Referenced Tables (XRT). XRT is a generic, easy-to-use platform that is well suited to handling textual data and complex relationships. Web-based presentation tools have been built that allow XRT data to be published to the web immediately and files to be shared rapidly with other databases and genome browsers. We also describe a novel tool named eFISH (electronic fluorescence in situ hybridization) (http://projects.tcag.ca/efish/), a BLAST-based program developed to facilitate the choice of appropriate clones for FISH and CGH experiments, as well as the interpretation of results in which genomic DNA probes are used in hybridization-based experiments.
Affiliation(s)
- J Zhang
- The Centre for Applied Genomics and the Program in Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
4
Khaja R, Zhang J, MacDonald JR, He Y, Joseph-George AM, Wei J, Rafiq MA, Qian C, Shago M, Pantano L, Aburatani H, Jones K, Redon R, Hurles M, Armengol L, Estivill X, Mural RJ, Lee C, Scherer SW, Feuk L. Genome assembly comparison identifies structural variants in the human genome. Nat Genet 2006; 38:1413-8. [PMID: 17115057 PMCID: PMC2674632 DOI: 10.1038/ng1921] [Received: 07/20/2006] [Accepted: 10/13/2006]
Abstract
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs and intermediate-sized variants (ISVs). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.
Affiliation(s)
- Razi Khaja
- Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, Ontario, M5G 1L7, Canada
5
Khaja R, MacDonald JR, Zhang J, Scherer SW. Methods for identifying and mapping recent segmental and gene duplications in eukaryotic genomes. Methods Mol Biol 2006; 338:9-20. [PMID: 16888347 DOI: 10.1385/1-59745-097-9:9]
Abstract
The aim of this chapter is to provide instruction for analyzing and mapping recent segmental and gene duplications in eukaryotic genomes. We describe a bioinformatics-based approach utilizing computational tools to manage eukaryotic genome sequences to characterize and understand the evolutionary fates and trajectories of duplicated genes. An introduction to bioinformatics tools and programs such as BLAST, Perl, BioPerl, and the GFF specification provides the necessary background to complete this analysis for any eukaryotic genome of interest.
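The chapter pairs BLAST output with the GFF specification for recording duplicated regions. As a minimal sketch only, assuming each duplication copy is already held in a simple record (the `Duplication` type, the `blast` source label and the `identity` attribute are hypothetical, not taken from the chapter), each call can be written as a nine-column, tab-separated GFF3 feature line. The generic Sequence Ontology term `region` is used as the feature type.

```python
from dataclasses import dataclass


@dataclass
class Duplication:
    """Hypothetical record for one copy of a segmental duplication."""
    dup_id: str
    seqid: str
    start: int       # 1-based, inclusive (GFF3 convention)
    end: int         # 1-based, inclusive
    strand: str      # "+" or "-"
    identity: float  # percent identity of the aligned copies


def to_gff3_line(d: Duplication, source: str = "blast") -> str:
    """Format one duplication as a nine-column, tab-separated GFF3 line."""
    if d.start > d.end:
        raise ValueError("GFF3 requires start <= end")
    # Attribute names starting with an uppercase letter (e.g. ID) are reserved
    # by GFF3; lowercase names such as "identity" are free for applications.
    attributes = f"ID={d.dup_id};identity={d.identity:.1f}"
    columns = [
        d.seqid, source, "region",           # seqid, source, type (generic SO term)
        str(d.start), str(d.end),            # coordinates
        f"{d.identity:.1f}", d.strand, ".",  # score, strand, phase
        attributes,
    ]
    return "\t".join(columns)


def write_gff3(dups: list[Duplication]) -> str:
    """Return a GFF3 document: the version pragma, then one line per record."""
    lines = ["##gff-version 3"]
    lines.extend(to_gff3_line(d) for d in dups)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    calls = [Duplication("dup1", "chr7", 1000, 6500, "+", 95.24)]
    print(write_gff3(calls), end="")
```

Keeping one feature line per copy lets standard GFF-aware tools and genome browsers load the calls; a real pipeline would also link the two copies of each pair, which this sketch leaves out.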
Affiliation(s)
- Razi Khaja
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, ON, Canada
6
Armengol L, Marquès-Bonet T, Cheung J, Khaja R, González JR, Scherer SW, Navarro A, Estivill X. Murine segmental duplications are hot spots for chromosome and gene evolution. Genomics 2005; 86:692-700. [PMID: 16256303 DOI: 10.1016/j.ygeno.2005.08.008] [Received: 06/12/2005] [Revised: 08/06/2005] [Accepted: 08/23/2005]
Abstract
Mouse and rat genomic sequences permit us to obtain a global view of evolutionary rearrangements that have occurred between the two species and to define hallmarks that might underlie these events. We present a comparative study of the sequence assemblies of mouse and rat genomes and report an enrichment of rodent-specific segmental duplications in regions where synteny is not preserved. We show that segmental duplications present higher rates of molecular evolution and that genes in rearranged regions have evolved faster than those located elsewhere. Previous studies have shown that synteny breakpoints between the mouse and the human genomes are enriched in human segmental duplications, suggesting a causative connection between such structures and evolutionary rearrangements. Our work provides further evidence to support the role of segmental duplications in chromosomal rearrangements in the evolution of the architecture of mammalian chromosomes and in the speciation processes that separate the mouse and the rat.
Affiliation(s)
- Lluís Armengol
- Genes and Disease Program, Center for Genomic Regulation, Passeig Marítim 37-49, 08003 Barcelona, Catalonia, Spain
7
Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW. Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet 2005; 1:e56. [PMID: 16254605 PMCID: PMC1270012 DOI: 10.1371/journal.pgen.0010056] [Received: 08/01/2005] [Accepted: 09/29/2005]
Abstract
With a draft genome-sequence assembly for the chimpanzee available, it is now possible to perform genome-wide analyses to identify, at a submicroscopic level, structural rearrangements that have occurred between chimpanzees and humans. The goal of this study was to investigate chromosomal regions that are inverted between the chimpanzee and human genomes. Using the net alignments for the builds of the human and chimpanzee genome assemblies, we identified a total of 1,576 putative regions of inverted orientation, covering more than 154 megabases of DNA. The DNA segments are distributed throughout the genome and range from 23 base pairs to 62 megabases in length. Of the 66 inversions longer than 25 kilobases (kb), 75% were flanked on one or both sides by (often unrelated) segmental duplications. Using PCR and fluorescence in situ hybridization, we experimentally validated 23 of 27 (85%) semi-randomly chosen regions; the largest novel inversion confirmed was 4.3 megabases at human Chromosome 7p14. Gorilla was used as an outgroup to assign ancestral status to the variants. All experimentally validated inversion regions were then assayed against a panel of human samples, and three of the 23 (13%) regions were found to be polymorphic in the human genome. These polymorphic inversions comprise 730 kb (at 7p22), 13 kb (at 7q11), and 1 kb (at 16q24) fragments with minor allele frequencies of 5%, 30%, and 48%, respectively. Our results suggest that inversions are an important source of variation in primate genome evolution. The finding of at least three novel inversion polymorphisms in humans indicates that this type of structural variation may be a more common feature of our genome than previously realized.
Chimpanzee is the closest relative of humans, with DNA sequences about 98% identical to ours. Small DNA sequence changes and, probably more importantly, larger structural changes of chromosomes led to the divergence of the two species some 6 million years ago. Until recently, ten structural differences between chimpanzee and human were visible under the microscope, and nine of these were inversions of DNA. Through computational comparisons of genome sequences, the current study identifies another 1,576 putative inversion events. Thirty-three of these are larger than 100,000 base pairs, and 29 intersect genes, prioritizing them for evolutionary studies. Twenty-three of the inversions have been confirmed experimentally, the largest being 4.3 million base pairs in size on human Chromosome 7. Surprisingly, three of the “inverted” regions were found to vary in orientation in the human population (in some cases the inversion was in the ancestral orientation found in chimpanzee). These observations indicate that the human genome is still evolving in structure. Moreover, since such variable inversions have been shown to predispose to other (sometimes deleterious) changes in chromosomes, the new data delineate potential disease-associated genes.
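The abstract does not describe how inverted regions were extracted from the net alignments. As a minimal sketch under stated assumptions, suppose the alignment has already been reduced to ordered blocks on the human sequence, each tagged with the orientation of the aligned chimpanzee sequence (the `Block` type and its fields are hypothetical). Runs of consecutive reverse-orientation blocks on one chromosome can then be merged into candidate inversions, and the subset longer than 25 kb selected.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical alignment block on a human chromosome (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    strand: str  # orientation of the aligned chimpanzee sequence: "+" or "-"


Region = tuple[str, int, int]


def inverted_regions(blocks: list[Block]) -> list[Region]:
    """Merge runs of consecutive reverse-orientation blocks into candidate inversions."""
    regions: list[Region] = []
    run: list[Block] = []

    def flush() -> None:
        # Close the current run (if any) as one region spanning all its blocks.
        if run:
            regions.append((run[0].chrom, run[0].start, run[-1].end))
            run.clear()

    for b in sorted(blocks, key=lambda blk: (blk.chrom, blk.start)):
        if b.strand == "-" and run and run[-1].chrom != b.chrom:
            flush()  # a run never spans two chromosomes
        if b.strand == "-":
            run.append(b)
        else:
            flush()  # a forward-orientation block ends the current run
    flush()
    return regions


def larger_than(regions: list[Region], min_len: int = 25_000) -> list[Region]:
    """Keep candidates longer than min_len bases (cf. the >25 kb subset)."""
    return [r for r in regions if r[2] - r[1] > min_len]
```

This ignores alignment gaps, nested alignment levels and flanking segmental duplications, all of which a real analysis of net alignments would have to handle; the candidates it reports would still need experimental validation, as in the study.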
Affiliation(s)
- Lars Feuk
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular and Medical Genetics, University of Toronto, Ontario, Canada
- Jeffrey R MacDonald
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Terence Tang
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Andrew R Carson
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular and Medical Genetics, University of Toronto, Ontario, Canada
- Martin Li
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Girish Rao
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Razi Khaja
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Stephen W Scherer
- The Centre for Applied Genomics, Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular and Medical Genetics, University of Toronto, Ontario, Canada
8
Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, Herbrick JA, Carson AR, Parker-Katiraee L, Skaug J, Khaja R, Zhang J, Hudek AK, Li M, Haddad M, Duggan GE, Fernandez BA, Kanematsu E, Gentles S, Christopoulos CC, Choufani S, Kwasnicka D, Zheng XH, Lai Z, Nusskern D, Zhang Q, Gu Z, Lu F, Zeesman S, Nowaczyk MJ, Teshima I, Chitayat D, Shuman C, Weksberg R, Zackai EH, Grebe TA, Cox SR, Kirkpatrick SJ, Rahman N, Friedman JM, Heng HHQ, Pelicci PG, Lo-Coco F, Belloni E, Shaffer LG, Pober B, Morton CC, Gusella JF, Bruns GAP, Korf BR, Quade BJ, Ligon AH, Ferguson H, Higgins AW, Leach NT, Herrick SR, Lemyre E, Farra CG, Kim HG, Summers AM, Gripp KW, Roberts W, Szatmari P, Winsor EJT, Grzeschik KH, Teebi A, Minassian BA, Kere J, Armengol L, Pujana MA, Estivill X, Wilson MD, Koop BF, Tosi S, Moore GE, Boright AP, Zlotorynski E, Kerem B, Kroisel PM, Petek E, Oscier DG, Mould SJ, Döhner H, Döhner K, Rommens JM, Vincent JB, Venter JC, Li PW, Mural RJ, Adams MD, Tsui LC. Human chromosome 7: DNA sequence and biology. Science 2003; 300:767-72. [PMID: 12690205 PMCID: PMC2882961 DOI: 10.1126/science.1083423]
Abstract
The DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher-order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate genes for developmental diseases, including autism.
Affiliation(s)
- Stephen W Scherer
- Department of Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8.
9
Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol 2003; 4:R25. [PMID: 12702206 PMCID: PMC154576 DOI: 10.1186/gb-2003-4-4-r25] [Received: 11/28/2002] [Revised: 01/22/2003] [Accepted: 02/21/2003]
Abstract
BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies.
RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size with at least 90% sequence identity. We also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we identified a significant subset (199,965 of 2,327,473, or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but potential paralogous sequence variants.
CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. The potential sequence misassignments detected in this study will require additional effort to resolve.
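The size and identity thresholds above (at least 5 kb, at least 90% identity) can be illustrated as a filter over BLAST alignments. This is a minimal sketch, not the authors' heuristics: it assumes all-against-all results in the BLAST+ tabular format (`-outfmt 6`, default 12 columns), and the helper names are hypothetical. It keeps alignments that pass both thresholds and drops the trivial alignment of each region to itself.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity
    length: int    # alignment length (bp)
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6) with its default 12 columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        q, s, pid, length, _mism, _gapo, qs, qe, ss, se, _evalue, _bits = (
            line.rstrip("\n").split("\t"))
        yield Hit(q, s, float(pid), int(length), int(qs), int(qe), int(ss), int(se))


def is_self_hit(h: Hit) -> bool:
    """The trivial alignment of a region to itself (same sequence, same coordinates)."""
    return h.query == h.subject and (h.qstart, h.qend) == (h.sstart, h.send)


def segmental_duplications(hits: Iterable[Hit], min_len: int = 5000,
                           min_identity: float = 90.0) -> list[Hit]:
    """Keep non-self alignments of at least min_len bp and min_identity percent."""
    return [h for h in hits
            if h.length >= min_len and h.pident >= min_identity
            and not is_self_hit(h)]
```

In an all-against-all search each duplicated pair is usually reported twice (A against B and B against A), so a real pipeline would also collapse reciprocal hits and merge overlapping ones; the misassignment heuristics mentioned in the abstract are not covered here.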
Affiliation(s)
- Joseph Cheung
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Xavier Estivill
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Genes and Disease Program, Genomic Regulation Center, and Facultat Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, E-08003 Barcelona, Catalonia, Spain
- Razi Khaja
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Jeffrey R MacDonald
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Ken Lau
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Lap-Chee Tsui
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, Toronto, ON M5G 1X8, Canada
- Current address: The University of Hong Kong, Pokfulam Road, Hong Kong
- Stephen W Scherer
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, Canada
- Department of Molecular and Medical Genetics, University of Toronto, 555 University Avenue, Toronto, ON M5G 1X8, Canada
10
Cheung J, Wilson MD, Zhang J, Khaja R, MacDonald JR, Heng HHQ, Koop BF, Scherer SW. Recent segmental and gene duplications in the mouse genome. Genome Biol 2003; 4:R47. [PMID: 12914656 PMCID: PMC193640 DOI: 10.1186/gb-2003-4-8-r47] [Received: 02/28/2003] [Revised: 05/22/2003] [Accepted: 06/17/2003]
Abstract
BACKGROUND: The high-quality mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥5 kb) and recent (≥90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the Mouse Genome Sequencing Consortium (MGSC) February 2002 and February 2003 assemblies.
RESULTS: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). Of this duplication content, 8.9 Mb (26%) consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
CONCLUSION: Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified as involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the duplication content identified in this work. These data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.
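The abstract reports both total duplicated sequence (33.6 Mb) and the genes falling in duplicated regions (675), but not how either was computed. A minimal sketch, assuming duplications and genes are given as 0-based, half-open intervals per chromosome (the data layout and function names are hypothetical): merge the duplication intervals so overlapping calls are not double-counted, sum their lengths, and test each gene for overlap with a binary search.

```python
from bisect import bisect_left

Interval = tuple[int, int]  # 0-based, half-open [start, end)


def merge(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or touching intervals on one chromosome."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(merged: list[Interval]) -> int:
    """Total bases covered by disjoint, merged intervals."""
    return sum(end - start for start, end in merged)


def overlaps(merged: list[Interval], start: int, end: int) -> bool:
    """True if [start, end) overlaps any interval in a sorted, merged list."""
    # Index of the last merged interval whose start lies before `end`.
    i = bisect_left(merged, (end,)) - 1
    return i >= 0 and merged[i][1] > start


def genes_in_duplications(genes: dict[str, tuple[str, int, int]],
                          dups: dict[str, list[Interval]]) -> list[str]:
    """Names of genes overlapping at least one duplicated interval."""
    merged = {chrom: merge(ivs) for chrom, ivs in dups.items()}
    return sorted(name for name, (chrom, s, e) in genes.items()
                  if overlaps(merged.get(chrom, []), s, e))
```

Merging first matters because the two copies of a duplication, and neighbouring calls, can overlap; summing raw intervals would inflate the total. For scale, the abstract's figure of 33.6 Mb out of 2,695 Mb corresponds to 100 × 33.6 / 2,695 ≈ 1.25%, reported as 1.2%.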
Affiliation(s)
- Joseph Cheung
- Program in Genetics and Genomic Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada.