1
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty SP, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. American Journal of Biological Anthropology 2023. [PMID: 36794631 DOI: 10.1002/ajpa.24713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene function and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C Soto
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- José M Uribe-Salazar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Colin J Shew
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Aarthi Sekar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Sean P McGinty
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Megan Y Dennis
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
2
Usui H, Nakabayashi K, Maehara K, Hata K, Shozu M. Genome-wide single nucleotide polymorphism array analysis unveils the origin of heterozygous androgenetic complete moles. Sci Rep 2019; 9:12542. [PMID: 31467376 PMCID: PMC6715694 DOI: 10.1038/s41598-019-49047-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/08/2019] [Accepted: 08/19/2019] [Indexed: 11/17/2022] Open
Abstract
Hydatidiform moles are abnormal pregnancies that show trophoblastic hyperplasia. Most often, the nuclear genome in complete hydatidiform moles (CHMs) is composed of only paternal chromosomes. Diploid androgenetic conceptuses can be divided into homozygous and heterozygous CHMs. Heterozygous CHMs originate from two sperm or a diploid sperm, the distinction of which has not been established. Here, we assessed the origin of heterozygous CHMs using a single nucleotide polymorphism (SNP) array. Thirteen heterozygous CHMs were analysed using B allele frequency (BAF) plotting to determine the centromeric zygosity status of all chromosomes. One case was from the duplication of a single sperm with an XY chromosome. In the other twelve cases, centromeric zygosity was random, i.e. mixed status. Thus, the twelve heterozygous CHMs were considered to be of dispermic origin but not diploid sperm origin. BAF plotting of SNP array data can be a powerful tool to estimate the type of hydatidiform moles.
Affiliation(s)
- Hirokazu Usui
- Department of Reproductive Medicine, Graduate School of Medicine, Chiba University, Chiba, Chiba, 260-8670, Japan.
- Kazuhiko Nakabayashi
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, Setagaya, Tokyo, 157-8535, Japan
- Kayoko Maehara
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, Setagaya, Tokyo, 157-8535, Japan; Department of Nutrition, Graduate School of Health Sciences, Kio University, Kitakatsuragi, Nara, 635-0832, Japan
- Kenichiro Hata
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, Setagaya, Tokyo, 157-8535, Japan
- Makio Shozu
- Department of Reproductive Medicine, Graduate School of Medicine, Chiba University, Chiba, Chiba, 260-8670, Japan
3
Yamamoto E, Niimi K, Kiyono T, Yamamoto T, Nishino K, Nakamura K, Kotani T, Kajiyama H, Shibata K, Kikkawa F. Establishment and characterization of cell lines derived from complete hydatidiform mole. Int J Mol Med 2017; 40:614-622. [PMID: 28713902 PMCID: PMC5547987 DOI: 10.3892/ijmm.2017.3067] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/15/2017] [Accepted: 07/05/2017] [Indexed: 12/04/2022] Open
Abstract
Gestational trophoblastic diseases (GTDs) are a group of diseases characterized by abnormal cellular proliferation of atypical trophoblasts. A hydatidiform mole is an abnormal pregnancy caused by genetic fertilization disorders, and it can be classified as a complete hydatidiform mole (CHM) or a partial hydatidiform mole. The aim of this study was to establish cell lines from CHMs and to characterize the cells for future studies concerning GTD. HMol1-2C, HMol1-3B, HMol1-8 and HMol3-1B were established from primary cultures of CHM explants following the introduction of different combinations of genes including human telomerase reverse transcriptase (hTERT), a mutant form of CDK (CDK4R24C), cyclin D1, p53C234, MYC and HRAS. HMol1-2C, HMol1-3B, and HMol3-1B were confirmed to originate from trophoblasts of androgenetic, homozygous CHMs. These three cell lines exhibited low human chorionic gonadotropin secretion, low migration and invasion abilities, and the potential to differentiate into syncytiotrophoblastic cells via forskolin treatment. These results suggest that these cells exhibit characteristics of trophoblastic cells, especially cytotrophoblastic cells. HMol1-8 was found to consist of diploid cells of maternal origin, suggesting that it was derived from decidual cells. In conclusion, we successfully established three cell lines from CHMs by introduction of hTERT and other genes. Analysis revealed that the genetic origin of each cell line was identical to that of the original molar tissue, and the cell lines exhibited characteristics of trophoblastic cells similar to those of undifferentiated cytotrophoblasts.
Affiliation(s)
- Eiko Yamamoto
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Kaoru Niimi
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Tohru Kiyono
- Division of Carcinogenesis and Cancer Prevention, National Cancer Center Research Institute, Tokyo 104-0045, Japan
- Toshimichi Yamamoto
- Department of Legal Medicine and Bioethics, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Kimihiro Nishino
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Kenichi Nakamura
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Tomomi Kotani
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Hiroaki Kajiyama
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Kiyosumi Shibata
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
- Fumitaka Kikkawa
- Department of Obstetrics and Gynecology, Nagoya University Graduate School of Medicine, Nagoya 466-8550, Japan
4
Steinberg KM, Schneider VA, Graves-Lindsay TA, Fulton RS, Agarwala R, Huddleston J, Shiryev SA, Morgulis A, Surti U, Warren WC, Church DM, Eichler EE, Wilson RK. Single haplotype assembly of the human genome from a hydatidiform mole. Genome Res 2014; 24:2066-76. [PMID: 25373144 PMCID: PMC4248323 DOI: 10.1101/gr.180893.114] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Indexed: 12/31/2022]
Abstract
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
Affiliation(s)
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Robert S Fulton
- The Genome Institute at Washington University, St. Louis, Missouri 63108, USA
- Richa Agarwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- John Huddleston
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Sergey A Shiryev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Aleksandr Morgulis
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Urvashi Surti
- Department of Pathology and Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
- Wesley C Warren
- The Genome Institute at Washington University, St. Louis, Missouri 63108, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Richard K Wilson
- The Genome Institute at Washington University, St. Louis, Missouri 63108, USA
5
Higasa K, Kukita Y, Kato K, Wake N, Tahira T, Hayashi K. Evaluation of haplotype inference using definitive haplotype data obtained from complete hydatidiform moles, and its significance for the analyses of positively selected regions. PLoS Genet 2009; 5:e1000468. [PMID: 19424418 PMCID: PMC2670534 DOI: 10.1371/journal.pgen.1000468] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 09/24/2008] [Accepted: 04/06/2009] [Indexed: 11/18/2022] Open
Abstract
The haplotype map constructed by the HapMap Project is a valuable resource in the genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are fairly accurately inferred, based mainly on the rules of Mendelian inheritance using the genotypes of trios. However, the Asian haplotypes are inferred from the genotypes of unrelated individuals based on population genetics, and are less accurate. Thus, the effects of this inaccuracy on downstream analyses need to be assessed. We determined true Japanese haplotypes by genotyping 100 complete hydatidiform moles (CHM), each carrying a genome derived from a single sperm, using Affymetrix 500 K Arrays. We then assessed how inferred haplotypes can differ from true haplotypes, by phasing pseudo-individualized true haplotypes using the programs PHASE, fastPHASE, and Beagle. We found that, at various genomic regions, especially the MHC locus, the expansion of extended haplotype homozygosity (EHH), which is a measure of positive selection, is obscured when inferred Asian haplotype data are used to detect the expansion. We then mapped the genome using a new statistic, XDiHH, which directly detects the difference between the true and inferred haplotypes, in the determination of EHH expansion. We also show that the true haplotype data presented here are useful to assess and improve the accuracy of phasing of Asian genotypes.
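As an illustration of the EHH quantity mentioned in this abstract, the following minimal sketch uses the commonly cited definition of extended haplotype homozygosity: the probability that two haplotypes drawn at random from a sample are identical across a given stretch of markers. It assumes haplotypes are supplied as equal-length strings that all carry the same core allele. The function name `ehh` and the toy haplotypes are hypothetical, and this is not the authors' XDiHH statistic or their pipeline.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, start, end):
    """Extended haplotype homozygosity over marker columns start..end (inclusive).

    Computed as the number of identical haplotype pairs over that stretch
    divided by the total number of haplotype pairs in the sample.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Group haplotypes by their allele string over the requested stretch.
    counts = Counter(h[start:end + 1] for h in haplotypes)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


# Toy definitive haplotypes (hypothetical data), e.g. one per mole sample.
haps = ["ACGT", "ACGT", "ACGA", "ACTA"]
for end in range(1, 4):
    print(end, ehh(haps, 0, end))
```

EHH starts at 1 near the core and decays as the stretch is extended and haplotypes diverge. With inferred rather than definitive haplotypes, phasing errors can break up long shared stretches, which is one way the decay could be distorted.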
Affiliation(s)
- Koichiro Higasa
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Yoji Kukita
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Kiyoko Kato
- Division of Molecular and Cell Therapeutics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Norio Wake
- Division of Molecular and Cell Therapeutics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Tomoko Tahira
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Kenshi Hayashi
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- * E-mail:
6
Higasa K, Miyatake K, Kukita Y, Tahira T, Hayashi K. D-HaploDB: a database of definitive haplotypes determined by genotyping complete hydatidiform mole samples. Nucleic Acids Res 2006; 35:D685-9. [PMID: 17166862 PMCID: PMC1781173 DOI: 10.1093/nar/gkl848] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
The Definitive Haplotype Database (D-HaploDB) is a web-accessible resource of genome-wide definitive haplotypes determined from a collection of Japanese complete hydatidiform moles (CHMs), each of which carries a genome derived from a single sperm. Currently, the database contains genotypes for 281 439 common SNPs from 74 CHMs which were determined by a high-throughput array-based oligonucleotide hybridization technique. The database also presents maps of haplotype blocks and linkage disequilibrium bins together with tagSNPs that might prove useful for association studies of disease genes. Cryptic relatedness among the samples in this study is unlikely, because the formation of a CHM is a maternal event of rare sporadic occurrence, and its genotype is that of the incoming sperm. This is demonstrated by the absence of long extended shared haplotypes (ESHs). The D-HaploDB is freely accessible via the Internet at http://orca.gen.kyushu-u.ac.jp.
Affiliation(s)
- Kenshi Hayashi
- To whom correspondence should be addressed. Tel: +81 92 642 6171; Fax: +81 92 632 2375; E-mail:
7
Kukita Y, Miyatake K, Stokowski R, Hinds D, Higasa K, Wake N, Hirakawa T, Kato H, Matsuda T, Pant K, Cox D, Tahira T, Hayashi K. Genome-wide definitive haplotypes determined using a collection of complete hydatidiform moles. Genome Res 2006; 15:1511-8. [PMID: 16251461 PMCID: PMC1310639 DOI: 10.1101/gr.4371105] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
We present genome-wide definitive haplotypes, determined using a collection of 74 Japanese complete hydatidiform moles, each carrying a genome derived from a single sperm. The haplotypes incorporate 281,439 common SNPs, genotyped with a high throughput array-based oligonucleotide hybridization technique. Comparison of haplotypes inferred from pseudoindividuals (constructed from randomized mole pairs) with those of moles showed some switch errors in resolution of phases by the computational inference method. The effects of these errors on local haplotype structure and selection of tag SNPs are discussed. We also show that definitive haplotypes of moles may be useful for elucidation of long-range haplotype structure, and should be more effective for detecting extended haplotype homozygosity indicative of positive selection.
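The switch-error comparison described in this abstract can be sketched as follows. The example assumes a pseudoindividual built from two known mole haplotypes and one inferred haplotype returned by a phasing program. It counts how often the inferred haplotype changes which true haplotype it follows between consecutive heterozygous sites. The function name and the toy sequences are hypothetical, and the authors' exact scoring procedure may differ.

```python
def count_switch_errors(true_h1, true_h2, inferred_h1):
    """Return (switches, opportunities) for one pseudoindividual.

    true_h1 and true_h2 are the definitive haplotypes of the two moles that
    were combined; inferred_h1 is one of the two haplotypes produced by
    computational phasing of the combined genotypes.
    """
    # Only heterozygous sites carry phase information.
    het_sites = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]

    # At each heterozygous site, record which true haplotype is followed.
    origin = []
    for i in het_sites:
        if inferred_h1[i] == true_h1[i]:
            origin.append(0)
        elif inferred_h1[i] == true_h2[i]:
            origin.append(1)
        else:
            raise ValueError(f"inferred allele at site {i} matches neither haplotype")

    # A switch error is a change of origin between adjacent heterozygous sites.
    switches = sum(1 for x, y in zip(origin, origin[1:]) if x != y)
    opportunities = max(len(het_sites) - 1, 0)
    return switches, opportunities


# Toy example (hypothetical data): sites 0, 2 and 4 are heterozygous.
print(count_switch_errors("AACGT", "GATGC", "AATGC"))
```

Here the inferred haplotype follows the first mole at site 0 and the second mole at sites 2 and 4, so there is one switch out of two opportunities.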
Affiliation(s)
- Yoji Kukita
- Division of Genome Analysis, Research Center for Genetic Information, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Fukuoka 812-8582, Japan
8
Zhang XW, Zhu HB, Wu SY, Gao RL, Han JS, Wang XY, Guo HY, Zhao YY, Zhao WQ, Li R. Distribution of the alleles at loci D16S539, D7S820, and D13S317 in hydatidiform mole genome from Chinese women and its relationship with clinical prognosis. Cancer Genet Cytogenet 2006; 164:133-6. [PMID: 16434316 DOI: 10.1016/j.cancergencyto.2005.07.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/28/2005] [Revised: 07/19/2005] [Accepted: 07/20/2005] [Indexed: 10/25/2022]
Abstract
Using polymerase chain reaction and denaturing polyacrylamide gel electrophoretic techniques, we studied 53 cases of hydatidiform moles. Of these, 41 cases were genetically complete hydatidiform moles (g-CHM) whose genomes were entirely paternally derived. We investigated the distribution of the alleles in the short tandem repeat sequences at loci D16S539, D7S820, and D13S317 in these cases. In particular, we analyzed the allelic distribution and potential significance in cases with traceable benign and invasive moles (i.e., persistent trophoblastic tumor [PTT]). Among 41 g-CHM cases, there were six alleles at D16S539, five alleles at D7S820 (the frequencies of alleles 9 and 10 were respectively lower and higher than those in the Beijing population), and seven alleles at D13S317; the heterozygosity of loci D16S539, D7S820, and D13S317 was 0.0732, 0.0976, and 0.0732, respectively. Among 23 benign cases, there were six alleles at D16S539, four at D7S820, and six at D13S317; among 11 PTT cases, there were five alleles at D7S820 and four alleles each at D16S539 and D13S317. The frequencies of allele 9 at D16S539 and allele 10 at D7S820 were higher in PTT cases than in benign cases (P < 0.05). There were significant differences in frequencies of alleles 9 and 10 at D7S820 between the cases and the Beijing population, and heterozygosity at the three loci was lower in the cases than in the population. In addition, invasiveness of hydatidiform mole correlated with the frequency of allele 9 at locus D16S539 and allele 10 at D7S820.
Affiliation(s)
- Xiao Wei Zhang
- Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100083, China.
9
Fan JB, Surti U, Taillon-Miller P, Hsie L, Kennedy GC, Hoffner L, Ryder T, Mutch DG, Kwok PY. Paternal origins of complete hydatidiform moles proven by whole genome single-nucleotide polymorphism haplotyping. Genomics 2002; 79:58-62. [PMID: 11827458 DOI: 10.1006/geno.2001.6676] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
Complete hydatidiform moles (CHMs) are diploid tumors that result from fertilization of an empty ovum by a haploid 23,X sperm. In most cases, the resulting duplication of the genome gives rise to a 46,XX genotype and is thought to be androgenetic in origin. If this hypothesis is correct, then the genotypes of all polymorphic markers in CHMs should be homozygous. We used a dense set of single-nucleotide polymorphism (SNP) markers, evenly spaced throughout the genome, to definitively test this hypothesis. We genotyped genomic DNA samples from five CHMs and their corresponding maternal samples with 1494 SNP markers using high-density microarrays (HuSNP). As predicted, the maternal samples were heterozygous at >25% of the markers, which is consistent with the expected average heterozygosity of this panel of SNPs. In contrast, the five CHM samples were heterozygous at <0.75% of the SNP markers, which shows that these diploid tumors consist of a duplicated set of chromosomes. Because the CHM genotypes represent the haplotypes of their genomes, our results show that long-range haplotypes can be obtained easily with this resource and that a collection of such samples is a simple way to obtain reference haplotypes for association studies in various populations.
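The heterozygosity comparison in this abstract amounts to counting heterozygous calls among the genotyped markers. The sketch below is a minimal illustration that assumes genotypes are encoded as two-character strings with `None` for no-calls. The function name and the toy genotypes are hypothetical and do not reproduce the HuSNP data.

```python
def heterozygous_fraction(genotypes):
    """Return the fraction of called markers that are heterozygous.

    genotypes is a sequence of two-character strings such as "AA" or "AG";
    no-calls are given as None and are excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)


# Toy genotypes (hypothetical data) for a maternal sample and a mole sample.
maternal = ["AG", "AA", "CT", "GG", None, "CC", "AG", "TT"]
mole = ["AA", "AA", "TT", "GG", "CC", "CC", "GG", None]

print(heterozygous_fraction(maternal))
print(heterozygous_fraction(mole))
```

A sample formed by duplication of a single haploid genome is expected to give a fraction near zero, with any residual heterozygous calls attributable to genotyping error or duplicated regions, whereas an ordinary diploid sample should sit near the panel's average heterozygosity.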
Affiliation(s)
- Jian-Bing Fan
- Affymetrix, 3380 Central Expressway, Santa Clara, CA 95051, USA
10
Miller RD, Taillon-Miller P, Kwok PY. Regions of low single-nucleotide polymorphism incidence in human and orangutan Xq: deserts and recent coalescences. Genomics 2001; 71:78-88. [PMID: 11161800 DOI: 10.1006/geno.2000.6417] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
While scanning for single-nucleotide polymorphisms (SNPs) in the human Xq25-q28 region of CEPH families, we found six long "deserts" of low SNP incidence representing 28% of the investigated genome. One was 1.66 Mb in length. To determine whether these SNP deserts were due to reduced input of mutations or to recent coalescent events such as bottlenecks or selective sweeps, comparative sequence was determined from a female orangutan. The mean divergence was 2.9% and was not reduced in deserts compared with nondesert regions. Thus, the best explanation for the SNP deserts is recent coalescent events in humans. These events are the cause of substantial variation in human noncoding SNP incidence. In addition, the mutational spectrum in humans and orangutans was estimated as 63% AG (and CT), 17% AC (and GT), 8% CG, 4% AT, and 8% insertion/deletions. The average lifetime of a SNP destined to become fixed for a new allele between these species was estimated as 284,000 years.
Affiliation(s)
- R D Miller
- Division of Dermatology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
11
Taillon-Miller P, Bauer-Sardiña I, Saccone NL, Putzel J, Laitinen T, Cao A, Kere J, Pilia G, Rice JP, Kwok PY. Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat Genet 2000; 25:324-8. [PMID: 10888883 DOI: 10.1038/77100] [Citation(s) in RCA: 185] [Impact Index Per Article: 7.7] [Indexed: 01/25/2023]
Abstract
Linkage disequilibrium (LD), or the non-random association of alleles, is poorly understood in the human genome. Population genetic theory suggests that LD is determined by the age of the markers, population history, recombination rate, selection and genetic drift. Despite the uncertainties in determining the relative contributions of these factors, some groups have argued that LD is a simple function of distance between markers. Disease-gene mapping studies and a simulation study gave differing predictions on the degree of LD in isolated and general populations. In view of the discrepancies between theory and experimental observations, we constructed a high-density SNP map of the Xq25-Xq28 region and analysed the male genotypes and haplotypes across this region for LD in three populations. The populations included an outbred European sample (CEPH males) and isolated population samples from Finland and Sardinia. We found two extended regions of strong LD bracketed by regions with no evidence for LD in all three samples. Haplotype analysis showed a paucity of haplotypes in regions of strong LD. Our results suggest that, in this region of the X chromosome, LD is not a monotonic function of the distance between markers, but is more a property of the particular location in the human genome.
Affiliation(s)
- P Taillon-Miller
- Division of Dermatology, Washington University, St. Louis, Missouri, USA
12
Abstract
A high-density single-nucleotide polymorphism (SNP) map was developed for Xq25-q28 using a targeted approach to SNP discovery. This high-density map includes 217 new SNP markers, and 117 are informative in the CEPH parent population with >20% minor allele frequency. The average distance between SNP markers is 100 kb in the targeted regions. This is the densest genetic map of Xq25-q28 to date. The SNP markers are presented in order by their distance in megabases along the X chromosome, and the markers from the current genetic map are placed using the same scale to produce an integrated map of the region.
Affiliation(s)
- P Taillon-Miller
- Division of Dermatology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
13
Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR. A general approach to single-nucleotide polymorphism discovery. Nat Genet 1999; 23:452-6. [PMID: 10581034 DOI: 10.1038/70570] [Citation(s) in RCA: 440] [Impact Index Per Article: 17.6] [Indexed: 11/08/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
Affiliation(s)
- G T Marth
- Washington University Department of Genetics and Genome Sequencing Center, St. Louis, Missouri, USA.
14
Taillon-Miller P, Piernot EE, Kwok PY. Efficient Approach to Unique Single-Nucleotide Polymorphism Discovery. Genome Res 1999. [DOI: 10.1101/gr.9.5.499] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are the most frequently found DNA sequence variations in the human genome. It has been argued that a dense set of SNP markers can be used to identify genetic factors associated with complex disease traits. Because all high-throughput genotyping methods require precise sequence knowledge of the SNPs, any SNP discovery approach must involve both the determination of DNA sequence and allele frequencies. Furthermore, high-throughput genotyping also requires a genomic DNA amplification step, making it necessary to develop sequence-tagged sites (STSs) that amplify only the DNA fragment containing the SNP and nothing else from the rest of the genome. In this report, we demonstrate the utility of a SNP-screening approach that yields the DNA sequence and allele frequency information while screening out duplications with minimal cost and effort. Our approach is based on the use of a homozygous complete hydatidiform mole (CHM) as the reference. With this homozygous reference, one can identify and estimate the allele frequencies of common SNPs with a pooled DNA-sequencing approach (rather than having to sequence numerous individuals as is commonly done). More importantly, the CHM reference is preferable to a single individual reference because it reveals readily any duplicated regions of the genome amplified by the PCR assay before the duplicated sequences are found in GenBank. This approach reduces the cost of SNP discovery by 60% and eliminates the costly development of SNP markers that cannot be amplified uniquely from the genome.[Sequence data for this article were deposited with the NCBI dbSTS and dbSNP data libraries under accession nos. G42862–G42905]
15
Landegren U, Nilsson M, Kwok PY. Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Res 1998; 8:769-76. [PMID: 9724323 DOI: 10.1101/gr.8.8.769] [Citation(s) in RCA: 232] [Impact Index Per Article: 8.9] [Indexed: 01/03/2023]
Affiliation(s)
- U Landegren
- Department of Genetics and Pathology, Uppsala University, Se-751 23 Uppsala, Sweden.
16
Taillon-Miller P, Gu Z, Li Q, Hillier L, Kwok PY. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res 1998; 8:748-54. [PMID: 9685323 PMCID: PMC310751 DOI: 10.1101/gr.8.7.748] [Citation(s) in RCA: 117] [Impact Index Per Article: 4.5] [Indexed: 11/25/2022]
Abstract
An efficient strategy to develop a dense set of single-nucleotide polymorphism (SNP) markers is to take advantage of the human genome sequencing effort currently under way. Our approach is based on the fact that bacterial artificial chromosomes (BACs) and P1-based artificial chromosomes (PACs) used in long-range sequencing projects come from diploid libraries. If the overlapping clones sequenced are from different lineages, one is comparing the sequences from 2 homologous chromosomes in the overlapping region. We have analyzed in detail every SNP identified while sequencing three sets of overlapping clones found on chromosome 5p15.2, 7q21-7q22, and 13q12-13q13. In the 200.6 kb of DNA sequence analyzed in these overlaps, 153 SNPs were identified. Computer analysis for repetitive elements and suitability for STS development yielded 44 STSs containing 68 SNPs for further study. All 68 SNPs were confirmed to be present in at least one of the three (Caucasian, African-American, Hispanic) populations studied. Furthermore, 42 of the SNPs tested (62%) were informative in at least one population, 32 (47%) were informative in two or more populations, and 23 (34%) were informative in all three populations. These results clearly indicate that developing SNP markers from overlapping genomic sequence is highly efficient and cost effective, requiring only the two simple steps of developing STSs around the known SNPs and characterizing them in the appropriate populations.
MESH Headings
- Bacteriophage P1/genetics
- Base Sequence
- Chromosomes, Bacterial/genetics
- Chromosomes, Human, Pair 13/genetics
- Chromosomes, Human, Pair 5/genetics
- Chromosomes, Human, Pair 7/genetics
- Cloning, Molecular/methods
- Genes, Overlapping/genetics
- Genome, Human
- Humans
- Molecular Sequence Data
- Polymorphism, Genetic/genetics
- Sequence Analysis, DNA/methods
Affiliation(s)
- P Taillon-Miller
- Division of Dermatology, Washington University School of Medicine, St. Louis, Missouri 63110 USA