1
Sparse Convolutional Denoising Autoencoders for Genotype Imputation. Genes (Basel) 2019; 10:genes10090652. [PMID: 31466333 PMCID: PMC6769581 DOI: 10.3390/genes10090652] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/26/2019] [Revised: 08/23/2019] [Accepted: 08/24/2019] [Indexed: 12/14/2022] Open
Abstract
Genotype imputation, in which missing genotypes are inferred computationally, is an essential tool in genomic analyses ranging from genome-wide association studies to phenotype prediction. Traditional genotype imputation methods are typically based on haplotype-clustering algorithms, hidden Markov models (HMMs), and statistical inference. Deep learning-based methods have recently been reported to address missing data problems well in various fields. To explore the performance of deep learning for genotype imputation, in this study we propose a deep model called a sparse convolutional denoising autoencoder (SCDA) to impute missing genotypes. We constructed the SCDA model using a convolutional layer that can extract various correlation or linkage patterns in the genotype data, and applied a sparse weight matrix resulting from L1 regularization to handle high-dimensional data. We comprehensively evaluated the performance of the SCDA model in different scenarios for genotype imputation on yeast and human genotype data. Our results showed that SCDA is robust and significantly outperforms popular reference-free imputation methods. This study thus points to another novel application of deep learning models for missing data imputation in genomic studies.
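The evaluation loop that a denoising approach to imputation relies on can be sketched as follows. This is not the SCDA model: it is a minimal illustration, assuming genotypes coded 0/1/2, a hypothetical sentinel `MISSING = -1`, and a simple per-SNP mode imputer standing in for the network. It masks known genotypes at random, imputes them, and scores accuracy only on the masked positions.

```python
import random
from collections import Counter

MISSING = -1  # assumed sentinel for a masked genotype (not from the paper)

def mask_genotypes(matrix, rate, rng):
    """Copy `matrix` and set roughly `rate` of its entries to MISSING.
    Returns the masked copy and the list of masked (row, col) positions."""
    masked = [row[:] for row in matrix]
    positions = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = MISSING
                positions.append((i, j))
    return masked, positions

def impute_by_mode(masked):
    """Baseline imputer (a stand-in, not SCDA): fill each missing entry with
    the most common observed genotype at that SNP (column)."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] != MISSING]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] == MISSING:
                row[j] = mode
    return filled

def masked_accuracy(truth, imputed, positions):
    """Fraction of masked positions where the imputed genotype is correct."""
    if not positions:
        return float("nan")
    hits = sum(truth[i][j] == imputed[i][j] for i, j in positions)
    return hits / len(positions)

# Toy data: 50 individuals x 20 SNPs with random 0/1/2 genotypes.
rng = random.Random(0)
truth = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(50)]
masked, positions = mask_genotypes(truth, rate=0.1, rng=rng)
imputed = impute_by_mode(masked)
accuracy = masked_accuracy(truth, imputed, positions)
```

Any imputer with the same input and output shape, including a trained autoencoder, could replace `impute_by_mode` without changing the scoring code.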
2
Abstract
In the past few years genome-wide association (GWA) studies have uncovered a large number of convincingly replicated associations for many complex human diseases. Genotype imputation has been used widely in the analysis of GWA studies to boost power, fine-map associations and facilitate the combination of results across studies using meta-analysis. This Review describes the details of several different statistical methods for imputing genotypes, illustrates and discusses the factors that influence imputation performance, and reviews methods that can be used to assess imputation performance and test association at imputed SNPs.
3
Meaburn EL, Harlaar N, Craig IW, Schalkwyk LC, Plomin R. Quantitative trait locus association scan of early reading disability and ability using pooled DNA and 100K SNP microarrays in a sample of 5760 children. Mol Psychiatry 2008; 13:729-40. [PMID: 17684495 DOI: 10.1038/sj.mp.4002063] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Received: 03/30/2007] [Revised: 06/14/2007] [Accepted: 06/27/2007] [Indexed: 11/09/2022]
Abstract
Quantitative genetic research suggests that reading disability is the quantitative extreme of the same genetic and environmental factors responsible for normal variation in reading ability. This finding warrants a quantitative trait locus (QTL) strategy that compares low versus high extremes of the normal distribution of reading in the search for QTLs associated with variation throughout the distribution. A low reading ability group (N=755) and a high reading group (N=747) were selected from a representative UK sample of 7-year-olds assessed on two measures of reading that we have shown to be highly heritable and highly genetically correlated. The low and high reading ability groups were each divided into 10 independent DNA pools and the 20 pools were assayed on 100 K single nucleotide polymorphism (SNP) microarrays to screen for the largest allele frequency differences between the low and high reading ability groups. Seventy five of these nominated SNPs were individually genotyped in an independent sample of low (N=452) and high (N=452) reading ability children selected from a second sample of 4258 7-year-olds. Nine of the seventy-five SNPs were nominally significant (P<0.05) in the predicted direction. These 9 SNPs and 14 other SNPs showing low versus high allele frequency differences in the predicted direction were genotyped in the rest of the second sample to test the QTL hypothesis. Ten SNPs yielded nominally significant linear associations in the expected direction across the distribution of reading ability. However, none of these SNP associations accounted for more than 0.5% of the variance of reading ability, despite 99% power to detect them. We conclude that QTL effect sizes, even for highly heritable common disorders and quantitative traits such as early reading disability and ability, might be much smaller than previously considered.
Affiliation(s)
- E L Meaburn
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College, London, UK.
4
Yoediono Z, Snyderman R. Proposal for a new health record to support personalized, predictive, preventative and participatory medicine. Per Med 2008; 5:47-54. [PMID: 29783395 DOI: 10.2217/17410541.5.1.47] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
Today's approach to patient care and the medical record that directs and documents it is largely focused on identifying and treating the patient's disease. This has resulted in a sporadic, reactive healthcare system. Shifting medicine's focus to personalized strategic health planning will require a new approach to the patient 'work-up', a new relationship between the patient and the provider and a new medical record to support it. A prospective health record should be developed to enable personalized and preventative strategies, including assessment of health risks, evaluation of current health status, tracking of disease pathogenesis, prediction of disease events and long-term planning for maintaining wellness and limiting disease. The record should utilize emerging technologies to track predictive clinical risk factors, thereby enabling preventative responses and personalized medicine.
Affiliation(s)
- Ziggy Yoediono
- Duke University, Duke Center for Research on Prospective Health Care, Duke University Medical Center, 3059, Durham, NC 27705, USA.
- Ralph Snyderman
- Duke University, Duke Center for Research on Prospective Health Care, Duke University Medical Center, 3059, Durham, NC 27705, USA.
5
Liang KH, Wu YJ. Prediction of complex traits based on the epistasis of multiple haplotypes. J Hum Genet 2007; 52:456-463. [PMID: 17427028 DOI: 10.1007/s10038-007-0140-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/21/2006] [Accepted: 03/07/2007] [Indexed: 11/28/2022]
Abstract
Analysis of epistasis, or gene-gene interactions, is of particular importance for revealing the molecular mechanisms of complex human diseases. Multiple genes, each of which has a moderate effect, might interact and produce a complex phenotypic trait. In this paper, we present a novel method of epistasis analysis, utilizing multiple phase-resolved haplotypes residing in different genomic regions. Prediction models can then be derived from the epistasis to indicate the susceptibility of a person to a dichotomous phenotypic trait. The simulation results showed that the prediction accuracy of this method depends on the penetrance rate of the underlying model. The computation cost, on the other hand, depends on the number of genomic regions involved in the complex phenotypic trait.
Affiliation(s)
- Kung-Hao Liang
- Vita Genomics, Inc., 7F, No.6, Sec.1, Jungshing Rd., Wugu Shiang, Taipei County, 248, Taiwan.
- Ying-Jye Wu
- Vita Genomics, Inc., 7F, No.6, Sec.1, Jungshing Rd., Wugu Shiang, Taipei County, 248, Taiwan
6
Abstract
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set, e.g. genotype data for a set of SNPs, for estimating parameters that are functions of frequencies in the second data set, e.g. haplotype frequencies, relative to the ideal case of actually observing the complete data, e.g. haplotypes. In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r² if each set consists of one bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings like regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies.
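For the single-marker special case mentioned above, the classical r² between two bi-allelic markers can be computed from phased haplotypes as in the sketch below. It covers only that textbook case, not the paper's general multi-locus measure; the function name `r_squared` and the 0/1 allele coding are illustrative assumptions.

```python
def r_squared(haplotypes):
    """Classical LD measure r^2 for two bi-allelic markers.

    `haplotypes` is a list of (a, b) pairs, one per phased haplotype, with
    alleles coded 0/1 at marker A and marker B (coding is an assumption).
    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom

# Alleles always travel together: perfect LD.
perfect = r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
# All four haplotypes equally frequent: no LD.
independent = r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Under the sample-size interpretation stated in the abstract, a marker pair with a smaller r² would require a proportionally larger sample to match the power of typing the second marker directly.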
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, The University of Chicago, Chicago, Illinois 60637, USA.
7
Carlton VEH, Ireland JS, Useche F, Faham M. Functional single nucleotide polymorphism-based association studies. Hum Genomics 2006; 2:391-402. [PMID: 16848977 PMCID: PMC3525158 DOI: 10.1186/1479-7364-2-6-391] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
Association studies hold great promise for the elucidation of the genetic basis of diseases. Studies based on functional single nucleotide polymorphisms (SNPs) or on linkage disequilibrium (LD) represent two main types of designs. LD-based association studies can be comprehensive for common causative variants, but they perform poorly for rare alleles. Conversely, functional SNP-based studies are efficient because they focus on the SNPs with the highest a priori chance of being associated. Our poor ability to predict the functional effect of SNPs, however, hampers attempts to make these studies comprehensive. Recent progress in comparative genomics, and evidence that functional elements tend to lie in conserved regions, promises to change the landscape, permitting functional SNP association studies to be carried out that comprehensively assess common and rare alleles. SNP genotyping technologies are already sufficient for such studies, but studies will require continued genomic sequencing of multiple species, research on the functional role of conserved sequences and additional SNP discovery and validation efforts (including targeted SNP discovery to identify the rare alleles in functional regions). With these resources, we expect that comprehensive functional SNP association studies will soon be possible.
Affiliation(s)
- Victoria EH Carlton
- ParAllele BioScience (Now Affymetrix, Inc), 7300 Shoreline Boulevard, South San Francisco, CA 94080, USA
- James S Ireland
- ParAllele BioScience (Now Affymetrix, Inc), 7300 Shoreline Boulevard, South San Francisco, CA 94080, USA
- Francisco Useche
- ParAllele BioScience (Now Affymetrix, Inc), 7300 Shoreline Boulevard, South San Francisco, CA 94080, USA
- Malek Faham
- ParAllele BioScience (Now Affymetrix, Inc), 7300 Shoreline Boulevard, South San Francisco, CA 94080, USA
8
Lou XY, Ma JZ, Sun D, Payne TJ, Li MD. Fine mapping of a linkage region on chromosome 17p13 reveals that GABARAP and DLG4 are associated with vulnerability to nicotine dependence in European-Americans. Hum Mol Genet 2006; 16:142-53. [PMID: 17164261 DOI: 10.1093/hmg/ddl450] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022] Open
Abstract
A two-stage association study was conducted targeting a genomic region on chromosome 17p13 that we reported likely to harbor susceptibility gene(s) for nicotine dependence (ND). Participants were 2037 subjects from 602 nuclear families of either African-American (AA) or European-American (EA) origin from our Mid-South Tobacco Family (MSTF) cohort. We first examined 10 single nucleotide polymorphisms (SNPs) in six genes within the targeted region of about 90 kb to determine which SNP/gene was associated with ND, assessed by smoking quantity (SQ), the heaviness of smoking index (HSI) and the Fagerström Test for ND (FTND). Individual SNP analysis revealed that SNPs rs17710 and rs222843 in GABA(A) receptor-associated protein (GABARAP) exhibited a significant association with at least one age- and gender-adjusted ND measure in the EA sample and rs222843 remained significant with the FTND after correction for multiple testing (P = 0.009). Although no SNP in DLG4 was significantly associated with ND, we found a G-G haplotype with a frequency of 14.2% formed by SNPs rs2242449 and rs507506 within the gene that showed significant inverse associations with all three ND measures [P = 0.003, 0.015 and 0.024, for SQ (defined as the number of cigarettes smoked per day), HSI and FTND, respectively]. We also found an A-A haplotype with a frequency of 8.8% formed by SNPs rs17710 and rs222843 in GABARAP, which revealed significant associations with all three ND measures (P = 0.006, 0.019 and 0.024, for SQ, HSI and FTND, respectively). To confirm these findings with a better coverage of GABARAP and DLG4, we conducted a second-stage association analysis by genotyping four more SNPs for GABARAP and nine more for DLG4 on the same set of samples. Our results from the second stage of individual SNP- and/or haplotype-based association analysis supported our finding of significant association of the DLG4 gene with ND. 
No significant association of GABARAP or DLG4 with ND was detected in the AA sample. Further, by comparing the linkage signal before and after adjustment for the SNPs of GABARAP and DLG4, we found that including the SNPs of the two genes as covariates largely reduced the linkage signal in the EA sample but left it nearly unchanged in the AA sample. Taken together, our two-stage association analysis and linkage analysis results indicate that the GABARAP and DLG4 genes are involved in the etiology of ND in EA smokers. Further investigation of the neurobiological mechanisms of the two genes in the etiology of ND is thus warranted.
Affiliation(s)
- Xiang-Yang Lou
- Department of Psychiatry and Neurobehavioral Sciences, University of Virginia, Charlottesville, VA, USA
9
Paschou P, Mahoney MW, Javed A, Kidd JR, Pakstis AJ, Gu S, Kidd KK, Drineas P. Intra- and interpopulation genotype reconstruction from tagging SNPs. Genome Res 2006; 17:96-107. [PMID: 17151345 PMCID: PMC1716273 DOI: 10.1101/gr.5741407] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
Abstract
The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for approximately 2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of "untyped" genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings.
Affiliation(s)
- Peristera Paschou
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06511, USA.
10
Yoo YK, Ke X, Hong S, Jang HY, Park K, Kim S, Ahn T, Lee YD, Song O, Rho NY, Lee MS, Lee YS, Kim J, Kim YJ, Yang JM, Song K, Kimm K, Weir B, Cardon LR, Lee JE, Hwang JJ. Fine-scale map of encyclopedia of DNA elements regions in the Korean population. Genetics 2006; 174:491-7. [PMID: 16702437 PMCID: PMC1569806 DOI: 10.1534/genetics.105.052225] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022] Open
Abstract
The International HapMap Project aims to generate detailed human genome variation maps by densely genotyping single-nucleotide polymorphisms (SNPs) in CEPH, Chinese, Japanese, and Yoruba samples. This will undoubtedly become an important facility for genetic studies of diseases and complex traits in the four populations. To address how the genetic information contained in such variation maps is transferable to other populations, the Korean government, industries, and academics have launched the Korean HapMap project to genotype high-density Encyclopedia of DNA Elements (ENCODE) regions in 90 Korean individuals. Here we show that the LD pattern, block structure, haplotype diversity, and recombination rate are highly concordant between Korean and the two HapMap Asian samples, particularly Japanese. The availability of information from both Chinese and Japanese samples helps to predict more accurately the possible performance of HapMap markers in Korean disease-gene studies. Tagging SNPs selected from the two HapMap Asian maps, especially the Japanese map, were shown to be very effective for Korean samples. These results demonstrate that the HapMap variation maps are robust in related populations and will serve as an important resource for the studies of the Korean population in particular.
11
Abstract
Emerging scientific technologies provide rich sources of predictive biomarkers, which could transform health care. Identification of causal biomarkers will enable the development of tools to quantify risk and anticipate disease. Accurate health risk analysis is rapidly becoming feasible, so health care can become rational, preventive and personalized.
Affiliation(s)
- Ralph Snyderman
- Duke University Medical Center, DUMC 3059, Durham, NC 27710, USA.
12
Gagnon A, Beise J, Vaupel JW. Genome-wide identity-by-descent sharing among CEPH siblings. Genet Epidemiol 2006; 29:215-24. [PMID: 16121355 DOI: 10.1002/gepi.20090] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 01/12/2023]
Abstract
The concept of genetic identity-by-descent (IBD) has markedly advanced our understanding of the genetic similarity among relatives and triggered a number of developments in epidemiological genetics. However, no empirical measure of this relatedness throughout the whole human genome has yet been published. Analyzing highly polymorphic genetic variations from the Centre d'études du polymorphisme humain (CEPH) database, we report the first genome-wide estimation of the mean and variation in IBD sharing among siblings. From 1,522 microsatellite markers spaced at an average of 2.3 cM on 498 sibling pairs, we estimated a mean of 0.4994 and a standard deviation of 0.0395. In order to account for the impact of varying chromosomal lengths and recombination rates, the analysis was also performed at the chromosomal and marker levels and for paternal and maternal DNA separately. Based on the variation, we estimate an "effective number of segregating loci" of around 80 for sibling pairs over the whole genome (i.e., the number of loci that would yield the same standard deviation in IBD sharing if all loci were segregating independently). Finally, we briefly assess the impact of genotyping errors on IBD estimations, compare our results to published theoretical and simulated expectations, and discuss some implications of our findings.
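The "effective number of segregating loci" quoted above can be checked with a short calculation. The sketch assumes the textbook single-locus model for full siblings, in which the proportion shared IBD is 0, 1/2 or 1 with probabilities 1/4, 1/2 and 1/4; whether the authors computed their figure in exactly this way is an assumption. If n such loci segregated independently, the genome-wide mean would have variance (1/8)/n, so n can be recovered from the reported standard deviation.

```python
# Single-locus IBD sharing between full siblings (assumed textbook model):
# the proportion shared is 0, 1/2 or 1 with probabilities 1/4, 1/2, 1/4.
values = [0.0, 0.5, 1.0]
probs = [0.25, 0.5, 0.25]

mean_one_locus = sum(v * p for v, p in zip(values, probs))
var_one_locus = sum(p * (v - mean_one_locus) ** 2 for v, p in zip(values, probs))

def effective_loci(observed_sd):
    """Number of independent loci whose averaged IBD sharing would have the
    given standard deviation: Var(mean) = var_one_locus / n."""
    return var_one_locus / observed_sd ** 2

# Standard deviation reported in the abstract for genome-wide sibling sharing.
n_eff = effective_loci(0.0395)  # about 80, in line with the abstract
```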
Affiliation(s)
- Alain Gagnon
- Department of Sociology, Aging and Health Research Centre, Population Studies Centre, University of Western Ontario, London, Ontario, Canada.
13
Abstract
It has become clear from epidemiological studies in families of affected patients and from twin studies that most psychiatric disorders are in part genetically determined. Genetics has raised great hopes that the complex nature of psychiatric disorders might be unravelled. However, progress in psychiatric genetics has met major difficulties that have kept psychiatry from taking advantage of the new technologies to the extent that other fields, such as neurology, have. In this non-exhaustive review, we propose an overview from the initial evidence to the expected future, through a critical statement on the current situation.
Affiliation(s)
- Marie-Odile Krebs
- INSERM E0117-Paris V, Université René Descartes, Paris; Hôpital Sainte-Anne, 7 Rue Cabanis, 75014 Paris, France.
14
Lawrence R, Evans DM, Morris AP, Ke X, Hunt S, Paolucci M, Ragoussis J, Deloukas P, Bentley D, Cardon LR. Genetically indistinguishable SNPs and their influence on inferring the location of disease-associated variants. Genome Res 2005; 15:1503-10. [PMID: 16251460 PMCID: PMC1310638 DOI: 10.1101/gr.4217605] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 06/01/2005] [Accepted: 08/30/2005] [Indexed: 01/25/2023]
Abstract
As part of a recent high-density linkage disequilibrium (LD) study of chromosome 20, we obtained genotypes for approximately 30,000 SNPs at a density of 1 SNP/2 kb on four different population samples (47 CEPH founders; 91 UK unrelateds [unrelated white individuals of western European ancestry]; 97 African Americans; 42 East Asians). We observed that approximately 50% of SNPs had at least one genetically indistinguishable partner; i.e., for every individual considered, their genotype at the first locus was identical to their genotype at the second locus, or in LD terms, the SNPs were in "perfect" LD (r² = 1.0). These "genetically indistinguishable SNPs" (giSNPs) formed into clusters of varying size. The larger the cluster, the greater the tendency to be located within genes and to overlap with giSNP clusters in other population samples. As might be expected for this map density, many giSNPs were located close to one another, thus reflecting local regions of undetected recombination or haplotype blocks. However, approximately 1/3 of giSNP clusters had intermingled, non-indistinguishable SNPs with incomplete LD (D' and r² < 1), sometimes spanning hundreds of kilobases, comprising up to 70 indistinguishable markers and overlapping multiple haplotype blocks. These long-range, nonconsecutive giSNPs have implications for disease gene localization by allelic association as evidence for association at one locus will be indistinguishable from that at another locus, even though both loci may be situated far apart. We describe the distribution of giSNPs on this map of chromosome 20 and illustrate the potential impact they can have on association mapping.
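The definition of giSNPs given above (identical genotypes at two loci for every individual) lends itself to a simple grouping step, sketched here. The function name `gi_snp_clusters`, the 0/1/2 genotype coding, and the assumption that alleles are coded consistently across SNPs are all illustrative; SNPs in perfect LD but with swapped allele labels would need recoding first, and missing genotypes are not handled.

```python
from collections import defaultdict

def gi_snp_clusters(genotypes, snp_ids):
    """Group SNPs whose genotype vectors are identical across all individuals.

    `genotypes` is a list of rows (one per individual), each row holding one
    genotype code per SNP, in the same order as `snp_ids`.
    Returns only clusters with at least two SNPs.
    """
    groups = defaultdict(list)
    for j, snp in enumerate(snp_ids):
        # The full column of genotypes is the key: equal columns mean the
        # two SNPs cannot be told apart in this sample.
        column = tuple(row[j] for row in genotypes)
        groups[column].append(snp)
    return [ids for ids in groups.values() if len(ids) > 1]

# Toy sample: three individuals, four SNPs (identifiers are hypothetical).
genotypes = [
    [0, 0, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 0, 0],
]
clusters = gi_snp_clusters(genotypes, ["s1", "s2", "s3", "s4"])
```

Because the grouping ignores map position, it also captures the long-range, nonconsecutive clusters the abstract describes.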
Affiliation(s)
- Robert Lawrence
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
15
Philippi A, Roschmann E, Tores F, Lindenbaum P, Benajou A, Germain-Leclerc L, Marcaillou C, Fontaine K, Vanpeene M, Roy S, Maillard S, Decaulne V, Saraiva JP, Brooks P, Rousseau F, Hager J. Haplotypes in the gene encoding protein kinase c-beta (PRKCB1) on chromosome 16 are associated with autism. Mol Psychiatry 2005; 10:950-960. [PMID: 16027742 DOI: 10.1038/sj.mp.4001704] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Received: 03/09/2005] [Revised: 05/17/2005] [Accepted: 05/24/2005] [Indexed: 11/08/2022]
Abstract
Autism is a developmental disorder characterized by impairments in social interaction and communication associated with repetitive patterns of interest or behavior. Autism is highly influenced by genetic factors. Genome-wide linkage and candidate gene association approaches have been used to try and identify autism genes. A few loci have repeatedly been reported linked to autism. Several groups reported evidence for linkage to a region on chromosome 16p. We have applied a direct physical identity-by-descent (IBD) mapping approach to perform a high-density (0.85 megabases) genome-wide linkage scan in 116 families from the AGRE collection. Our results confirm linkage to a region on chromosome 16p with autism. High-resolution single-nucleotide polymorphism (SNP) genotyping and analysis of this region show that haplotypes in the protein kinase c-beta gene are strongly associated with autism. An independent replication of the association in a second set of 167 trio families with autism confirmed our initial findings. Overall, our data provide evidence that the PRKCB1 gene on chromosome 16p may be involved in the etiology of autism.
16
Yalcin B, Flint J, Mott R. Using progenitor strain information to identify quantitative trait nucleotides in outbred mice. Genetics 2005; 171:673-81. [PMID: 16085706 PMCID: PMC1456780 DOI: 10.1534/genetics.104.028902] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Indexed: 01/17/2023] Open
Abstract
We have developed a fast and economical strategy for dissecting the genetic architecture of quantitative trait loci at a molecular level. The method uses two pieces of information: mapping data from crosses that involve more than two inbred strains and sequence variants in the progenitor strains within the interval containing a quantitative trait locus (QTL). By testing whether the strain distribution pattern in the progenitor strains is consistent with the observed genetic effect of the QTL we can assign a probability that any sequence variant is a quantitative trait nucleotide (QTN). It is not necessary to genotype the animals except at a skeleton of markers; the genotypes at all other polymorphisms are estimated by a multipoint analysis. We apply the method to a 4.8-Mb region on mouse chromosome 1 that contains a QTL influencing anxiety segregating in a heterogeneous stock and show that, under the assumption that a single QTN is present and lies in a region conserved between the human and mouse genomes, it is possible to reduce the number of variants likely to be the quantitative trait nucleotide from many thousands to <20.
Affiliation(s)
- B Yalcin
- Wellcome Trust Centre for Human Genetics, Oxford University, UK
17
Abstract
When comparing the immune genome to the genome in general, a higher prevalence for association with disease is the only genetic feature significant in immune genes as a group. However, some genetic features, such as marked levels of polymorphism and gene duplication, are present in subsets of immune genes, namely the Major Histocompatibility Complex (MHC) and Natural Killer (NK) cell receptor gene complexes. In this review, we discuss features of MHC and NK receptor gene clusters, their epistatic interactions, and the impact of both on association to disease.
Affiliation(s)
- James Kelley
- Department of Pathology, Immunology Division, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom
18
Abstract
Genome-wide association studies with SNP markers are expected to allow identification of genes that underlie complex disorders. Hundreds of thousands of SNP markers will be required for comprehensive genome-wide association studies. The development of microarray-based methods for SNP genotyping on this scale remains a demanding task, despite many recent advances in technology for the production of high-density microarrays. A key technical obstacle is the PCR amplification step, which is required to reduce the complexity of and gain sufficient sensitivity for genotyping SNPs in large, diploid genomes. The multiplexing level that can be achieved in PCR does not match that of current microarray-based methods, making PCR the limiting step in the assays. Highly multiplexed microarray systems for SNP genotyping have recently been developed by combining well-known reaction principles for DNA amplification and SNP genotyping in clever ways. These new methods offer the potential of genome-wide SNP mapping of genes involved in complex diseases in the foreseeable future, provided that issues related to selection of the optimal SNP markers, sample throughput and the cost of the assays can be addressed.
Affiliation(s)
- Ann-Christine Syvänen
- Department of Medical Sciences, Molecular Medicine, Uppsala University, Entr. 70, University Hospital, 75185 Uppsala, Sweden.
19
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota 55905, USA.
20
Affiliation(s)
- David Altshuler
- Broad Institute of Harvard and Massachusetts Institute of Technology, and Massachusetts General Hospital, Boston, MA 02114, USA.
21
Bull SB, John S, Briollais L. Fine mapping by linkage and association in nuclear family and case-control designs. Genet Epidemiol 2005; 29 Suppl 1:S48-58. [PMID: 16342184 DOI: 10.1002/gepi.20110] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023]
Abstract
This report summarizes the Genetic Analysis Workshop 14 contributions related to fine-mapping strategies, in which examining smaller regions by association with single-nucleotide polymorphisms (SNPs) can yield savings in genotyping and multiple-testing penalties. The aim of the analyses conducted in Group 7 contributions was to localize disease susceptibility loci from either the simulated or the Collaborative Study on the Genetics of Alcoholism (COGA) data within identified regions of linkage. Among the 10 contributions, most groups analyzed the simulated data, one group analyzed the COGA data only, and one group analyzed both data sets. The research questions included evaluation of new methods of analysis, as well as comparisons among alternative methods, analytic strategies, and study designs. Methods of interest included an algorithm for SNP marker ordering, a locally weighted transmission disequilibrium test statistic, a likelihood-ratio test statistic for family-based association in nuclear families, a robust test statistic for case-control association studies, and Bayesian spatial modeling methods for haplotype clustering and association. Evaluations included comparisons among confidence intervals for loci detected via linkage, effects of multiple testing adjustments and trade-offs between type I error and power, comparisons among haplotype-based (multilocus) and genotype-based (multilocus and single-locus) association analyses, and design of fine-mapping and replication studies. While several promising new approaches were identified, further development and evaluation of methods for multiple testing, regression modeling of association with multiple markers and haplotypes, and combined treatment of linkage and association data are necessary if we are to identify many of the genes that contribute to complex traits.
Affiliation(s)
- Shelley B Bull
- Samuel Lunenfeld Research Institute of Mount Sinai Hospital and Department of Public Health Sciences, University of Toronto, Toronto, Ontario, Canada.