1
Zhang K, Zhang H, Hochner H, Chen J. Covariate adjusted inference of parent-of-origin effects using case-control mother-child paired multilocus genotype data. Genet Epidemiol 2021; 45:830-847. [PMID: 34424572 DOI: 10.1002/gepi.22428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/24/2021] [Revised: 07/08/2021] [Accepted: 07/27/2021] [Indexed: 01/13/2023]
Abstract
It is of great interest to identify parent-of-origin effects (POEs), since POEs play an important role in many heritable human disorders and in early-life human growth and development. POEs are sometimes referred to as imprinting effects in the literature. Compared with standard logistic regression analyses, retrospective likelihood-based statistical methods are more powerful for identifying POEs when data are collected retrospectively from related individuals. However, none of the existing retrospective methods can appropriately incorporate covariates, which should be adjusted for when they are confounding factors. In this paper, a novel semiparametric statistical method, M-HAP, is developed to detect POEs by fully exploiting the information available from the multilocus genotypes of case-control mother-child pairs and from covariates. Several large-sample properties are established for M-HAP. The finite-sample properties of M-HAP are illustrated by extensive simulation studies and by real data applications to the Jerusalem Perinatal Study and the Danish National Birth Cohort study, which confirm the superiority of M-HAP over some existing methods. M-HAP has been implemented in the updated R package CCMO.
Affiliation(s)
- Kai Zhang
- Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, People's Republic of China
- Hong Zhang
- Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, People's Republic of China
- Hagit Hochner
- Braun School of Public Health, The Hebrew University of Jerusalem, Jerusalem, Israel
- Jinbo Chen
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
2
Finke K, Kourakos M, Brown G, Dang HT, Tan SJS, Simons YB, Ramdas S, Schäffer AA, Kember RL, Bućan M, Mathieson S. Ancestral haplotype reconstruction in endogamous populations using identity-by-descent. PLoS Comput Biol 2021; 17:e1008638. [PMID: 33635861 PMCID: PMC7946327 DOI: 10.1371/journal.pcbi.1008638] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/01/2020] [Revised: 03/10/2021] [Accepted: 12/15/2020] [Indexed: 12/24/2022]
Abstract
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and to track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of identity-by-descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling the IBD segments placed within each ancestral individual. Unlike previous approaches, our method can accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 of whom have genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family by tracking haplotype transmission over time. Using our algorithm, thread, we reconstruct an average of 224 ancestral individuals per chromosome. For these ancestral individuals, we reconstruct on average 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. Although thread was developed for endogamous populations, it can be applied to any extensive pedigree in which the recent generations are genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and more necessary for understanding rare and complex heritable diseases in extended families. When analyzing complex heritable traits, genomic data from many generations of an extended family increase the amount of information available for statistical inference. However, typically only genomic data from the recent generations of a pedigree are available, because the ancestral individuals are deceased. In this work we present an algorithm, called thread, for reconstructing the genomes of ancestral individuals, given a complex pedigree and genomic data from the recent generations. Previous approaches have been unable to accommodate large datasets (in terms of both sites and individuals), have made simplifying assumptions about pedigree structure, or have not tied reconstructed sequences back to specific individuals. We apply thread to a complex Old Order Amish pedigree of 1338 individuals, 394 of whom have genotype data.
Affiliation(s)
- Kelly Finke
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Michael Kourakos
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Gabriela Brown
- Department of Computer Science, Swarthmore College, Swarthmore, Pennsylvania, United States of America
- Huyen Trang Dang
- Department of Computer Science, Bryn Mawr College, Bryn Mawr, Pennsylvania, United States of America
- Shi Jie Samuel Tan
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
- Yuval B. Simons
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Shweta Ramdas
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Alejandro A. Schäffer
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Rachel L. Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Maja Bućan
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Sara Mathieson
- Department of Computer Science, Haverford College, Haverford, Pennsylvania, United States of America
3
Diao G, Lin DY. Statistically efficient association analysis of quantitative traits with haplotypes and untyped SNPs in family studies. BMC Genet 2020; 21:99. [PMID: 32894040 PMCID: PMC7487716 DOI: 10.1186/s12863-020-00902-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/10/2019] [Accepted: 08/17/2020] [Indexed: 11/23/2022]
Abstract
BACKGROUND Associations between haplotypes and quantitative traits provide valuable information about the genetic basis of complex human diseases. Haplotypes also provide an effective way to deal with untyped SNPs. Two major challenges arise in haplotype-based association analysis of family data. First, haplotypes cannot always be inferred with certainty from genotype data. Second, trait values within a family tend to be correlated because of shared genetic and environmental factors. RESULTS To address these challenges, we present an efficient likelihood-based approach to analyzing associations of quantitative traits with haplotypes or untyped SNPs. This approach properly accounts for within-family trait correlations and can handle general pedigrees with arbitrary patterns of missing genotypes. We characterize the genetic effects on the quantitative trait by a linear regression model with random effects and develop efficient likelihood-based inference procedures. Extensive simulation studies are conducted to examine the performance of the proposed methods. An application to family data from the Childhood Asthma Management Program Ancillary Genetic Study is provided. A computer program is freely available. CONCLUSIONS Results from extensive simulation studies show that the proposed methods for testing haplotype effects on quantitative traits have correct type I error rates and are more powerful than some existing methods.
Affiliation(s)
- Guoqing Diao
- Department of Biostatistics and Bioinformatics, The George Washington University, Washington, District of Columbia, USA
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
4
Inferring Transmission Bottleneck Size from Viral Sequence Data Using a Novel Haplotype Reconstruction Method. J Virol 2020; 94:JVI.00014-20. [PMID: 32295920 PMCID: PMC7307158 DOI: 10.1128/jvi.00014-20] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/03/2020] [Accepted: 04/08/2020] [Indexed: 12/12/2022]
Abstract
The transmission bottleneck is defined as the number of viral particles that pass from one host to establish an infection in another. Genome sequence data have been used to estimate the size of the transmission bottleneck between humans infected with the influenza virus; however, the methods used to make these estimates have some limitations. Specifically, viral allele frequencies, which form the basis of many calculations, may not fully capture a process that involves the transmission of entire viral genomes. Here, we set out a novel approach for inferring viral transmission bottlenecks; our method combines an algorithm for haplotype reconstruction with maximum likelihood methods for bottleneck inference. This approach allows rapid calculation and performs well when applied to data from simulated transmission events; errors in the haplotype reconstruction step did not adversely affect inferences of the population bottleneck.
Applied to data from a previous household transmission study of influenza A infection, our approach confirms that the majority of transmission events involve a small number of viruses, although it infers slightly looser bottlenecks, with between 1 and 13 particles transmitted in the majority of cases. While influenza A transmission involves a tight population bottleneck, the bottleneck is not so tight as to universally prevent the transmission of within-host viral diversity. IMPORTANCE Viral populations undergo a repeated cycle of within-host growth followed by transmission. Viral evolution is affected by each stage of this cycle. The number of viral particles transmitted from one host to another, known as the transmission bottleneck, is an important factor in determining how the evolutionary dynamics of the population play out, restricting the extent to which the evolved diversity of the population can be passed from one host to another. Previous study of viral sequence data has suggested that the transmission bottleneck size for influenza A transmission between human hosts is small. Reevaluating these data using a novel and improved method, we largely confirm this result, although we infer a slightly larger bottleneck size in some cases, of between 1 and 13 virions. While a tight bottleneck operates in human influenza transmission, it is not extreme; some diversity can be meaningfully retained between hosts.
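The abstract contrasts allele-frequency-based bottleneck estimates with the authors' haplotype-based approach. To show the kind of maximum likelihood step involved, here is a minimal sketch that uses a simple presence/absence model on allele frequencies. This model is an assumption of the sketch and is not the haplotype-reconstruction method described above. Each of n founding virions is assumed to be drawn independently from the donor, so a donor variant at frequency p reaches the recipient with probability 1 - (1 - p)^n. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# Each observation is (donor_frequency, seen_in_recipient) for one donor variant.
Observation = Tuple[float, bool]


def log_likelihood(n: int, variants: Iterable[Observation]) -> float:
    """Log-likelihood of bottleneck size n under a simple presence/absence model.

    Assumption (of this sketch): the n founding virions are drawn independently
    from the donor population, so a variant at donor frequency p is transmitted
    with probability 1 - (1 - p) ** n and lost with probability (1 - p) ** n.
    """
    total = 0.0
    for p, transmitted in variants:
        p_lost = (1.0 - p) ** n
        prob = 1.0 - p_lost if transmitted else p_lost
        if prob <= 0.0:
            return -math.inf  # this observation is impossible for this n
        total += math.log(prob)
    return total


def ml_bottleneck(variants: Iterable[Observation], n_max: int = 200) -> int:
    """Grid-search maximum likelihood estimate of the bottleneck size in 1..n_max."""
    data: List[Observation] = list(variants)
    return max(range(1, n_max + 1), key=lambda n: log_likelihood(n, data))
```

For example, ten donor variants at frequency 0.1, of which two are seen in the recipient, give `ml_bottleneck([(0.1, True)] * 2 + [(0.1, False)] * 8) == 2`. A plain grid search keeps the sketch transparent, because the bottleneck size is a small positive integer.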
5
Incorporating information from markers in LD with test locus for detecting imprinting and maternal effects. Eur J Hum Genet 2020; 28:1087-1097. [PMID: 32080366 DOI: 10.1038/s41431-020-0590-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/08/2019] [Revised: 11/26/2019] [Accepted: 02/04/2020] [Indexed: 11/08/2022]
Abstract
Numerous statistical methods have been developed to explore genomic imprinting and maternal effects by identifying parent-of-origin patterns in complex human diseases. However, because most of these methods use only the available locus-specific genotype data, they are sometimes unable to infer the distribution of the parental origin of a variant allele, especially when some genotypes are missing. In this article, we propose a two-step approach, LIMEhap, to improve upon a recent partial likelihood inference method. In the first step, the distribution of the missing genotypes is inferred by constructing haplotypes from the information at nearby loci. In the second step, a partial likelihood method is applied to the inferred data. To assess the validity of the proposed procedures, we simulated data in a genomic region of the GPX1 gene. The results show that, by borrowing genetic information from nearby loci, the proposed method can achieve power close to that obtained with complete genotype data at the locus of interest. Because the inference on the genotype distribution is made under the assumption of Hardy-Weinberg equilibrium (HWE), we further studied the robustness of LIMEhap to violations of HWE. Finally, we demonstrate the utility of LIMEhap by applying it to an autism dataset.
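The first step above infers the distribution of a missing genotype from nearby loci through haplotypes, under HWE. The minimal illustration below makes two simplifying assumptions of its own: two-locus haplotype frequencies are known, and only one nearby marker is typed. The function name is hypothetical, and this is not the LIMEhap implementation. Under HWE an individual's two haplotypes are independent draws, so the sketch enumerates ordered haplotype pairs, keeps those consistent with the observed marker genotype, and renormalizes by the genotype at the test locus.

```python
from itertools import product
from typing import Dict, Tuple

# A two-locus haplotype: (allele at the nearby typed marker, allele at the test locus),
# with alleles coded 0/1.
Haplotype = Tuple[int, int]


def missing_genotype_posterior(
    hap_freqs: Dict[Haplotype, float], marker_genotype: int
) -> Dict[int, float]:
    """Distribution of the untyped test-locus genotype (0, 1 or 2 copies of allele 1)
    given the genotype at a nearby marker, assuming Hardy-Weinberg equilibrium,
    i.e. the two haplotypes of an individual are drawn independently."""
    weights = {0: 0.0, 1: 0.0, 2: 0.0}
    # Enumerate ordered haplotype pairs; keep those consistent with the observed marker.
    for h1, h2 in product(hap_freqs, repeat=2):
        if h1[0] + h2[0] != marker_genotype:
            continue
        weights[h1[1] + h2[1]] += hap_freqs[h1] * hap_freqs[h2]
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("marker genotype has zero probability under these frequencies")
    return {g: w / total for g, w in weights.items()}
```

With the marker in perfect LD with the test locus, the posterior is degenerate. For example, with `{(0, 0): 0.7, (1, 1): 0.3}` and marker genotype 2, the test genotype is 2 with probability 1. Under linkage equilibrium, the posterior reduces to the HWE genotype frequencies at the test locus, so the marker adds no information.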
6
Datta AS, Lin S, Biswas S. A Family-Based Rare Haplotype Association Method for Quantitative Traits. Hum Hered 2019; 83:175-195. [PMID: 30799419 DOI: 10.1159/000493543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/05/2018] [Accepted: 09/07/2018] [Indexed: 12/15/2022]
Abstract
BACKGROUND The variants identified in genome-wide association studies account for only a small fraction of disease heritability. Rare variants are believed to be a key to this "missing heritability". Here we focus on rare haplotype variants (rHTVs). Existing methods for detecting rHTVs are mostly population-based and, as such, are susceptible to population stratification and admixture, which can inflate the false-positive rate. Family-based methods are more robust in this respect. METHODS We propose a method for detecting rHTVs associated with quantitative traits, called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare it with a standard family-based haplotype association test implemented in the FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT, with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION FamQBL can help uncover rHTVs associated with quantitative traits.
Affiliation(s)
- Ananda S Datta
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
7
Chang TJ, Wang WC, Hsiung CA, He CT, Lin MW, Sheu WHH, Chang YC, Quertermous T, Chen YDI, Rotter JI, Chuang LM. Genetic variation of SORBS1 gene is associated with glucose homeostasis and age at onset of diabetes: A SAPPHIRe Cohort Study. Sci Rep 2018; 8:10574. [PMID: 30002559 PMCID: PMC6043583 DOI: 10.1038/s41598-018-28891-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/29/2017] [Accepted: 06/19/2018] [Indexed: 12/22/2022]
Abstract
The SORBS1 gene plays an important role in insulin signaling. We aimed to examine whether common single-nucleotide polymorphisms (SNPs) of SORBS1 are associated with the prevalence and incidence of diabetes, age at onset of diabetes, and related traits of glucose homeostasis. A total of 1135 siblings from 492 ethnic Chinese families were recruited at baseline, and 630 of them were followed up for 5.19 ± 0.96 years. Nine SNPs (rs7081076, rs2281939, rs3818540, rs2274490, rs61739184, rs726176, rs2296966, rs17849148, and rs3193970) were genotyped and examined. To account for the correlation among subjects within the same family, the generalized estimating equations approach was applied in all association analyses. The GG genotype of rs2281939 was associated with a higher risk of diabetes at baseline, earlier onset of diabetes, and higher steady-state plasma glucose levels in the modified insulin suppression test. The minor allele T of rs2296966 was associated with higher prevalence and incidence of diabetes, earlier onset of diabetes, and higher 2-h glucose during the oral glucose tolerance test. These two SNPs showed independent associations with age at diabetes onset as well as with risk of diabetes at baseline. These findings support the involvement of the SORBS1 gene in the pathogenesis of diabetes.
Affiliation(s)
- Tien-Jyun Chang
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Wen-Chang Wang
- The Ph.D. Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
- Chao A Hsiung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
- Chih-Tsueng He
- Department of Endocrinology and Metabolism, Tri-Service General Hospital, Taipei, Taiwan
- Ming-Wei Lin
- Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
- Department of Medical Research & Education, Taipei Veterans General Hospital, Taipei, Taiwan
- Wayne Huey-Herng Sheu
- Department of Endocrinology and Metabolism, Taichung Veterans General Hospital, Taichung, Taiwan
- School of Medicine, National Yang-Ming University, Taipei, Taiwan
- School of Medicine, National Defense Medical Center, Taipei, Taiwan
- Yi-Cheng Chang
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Medical Genomics and Proteomics, National Taiwan University Medical College, Taipei, Taiwan
- Institute of Biomedical Science, Academia Sinica, Taipei, Taiwan
- Tom Quertermous
- Division of Cardiovascular Medicine, Falk CVRC, Stanford University School of Medicine, Stanford, CA, USA
- Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Division of Genomic Outcomes, Departments of Pediatrics and Medicine, Harbor-UCLA Medical Center, Torrance, CA, USA
- Lee-Ming Chuang
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
8
Bountouvi E, Papadopoulou A, Vanier MT, Nyktari G, Kanellakis S, Michelakakis H, Dinopoulos A. Novel NPC1 mutations with different segregation in two related Greek patients with Niemann-Pick type C disease: molecular study in the extended pedigree and clinical correlations. BMC Med Genet 2017; 18:51. [PMID: 28472934 PMCID: PMC5415950 DOI: 10.1186/s12881-017-0409-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/12/2016] [Accepted: 04/19/2017] [Indexed: 01/27/2023]
Abstract
BACKGROUND Niemann-Pick type C disease (NPC) is an autosomal recessive, neurovisceral lysosomal storage disorder with protean and progressive clinical manifestations, resulting from mutations in either of two genes, NPC1 (~95% of families) or NPC2. In contrast to other populations, published evidence regarding NPC disease in Greece is sparse. METHODS The study population consisted of two Greek NPC patients and their extended pedigree. The patients' clinical, biochemical, and molecular profiles and their possible correlations are presented. Genotyping was performed by direct sequencing. The origin of the mutations was investigated with the Haplore software, using selected exonic NPC1 polymorphisms encountered more frequently in a group of 37 Greek patients with clinical suspicion of NPC disease and in a group of 90 healthy Greek individuals. RESULTS Two novel NPC1 mutations, IVS23+3insT (c.3591+3insT) and p.K1057R (c.3170A>C), were identified, and each mutation was associated with a specific haplotype. One of the patients was started on early treatment with miglustat and has shown no overt neurological impairment after 11.5 years. CONCLUSIONS The splicing mutation IVS23+3insT, in the homozygous state, was associated with a severe biochemical and clinical phenotype. A possible founder effect for this mutation was demonstrated on the Greek island, as well as a different origin for each novel mutation. Longitudinal follow-up may help clarify the possible effect of early miglustat therapy in the patient who is compound heterozygous for the two novel mutations.
Affiliation(s)
- Evangelia Bountouvi
- Third Department of Pediatrics, Athens University Medical School, University General Hospital "Attikon", 1 Rimini Str, 12464 Haidari, Athens, Greece
- Anna Papadopoulou
- Third Department of Pediatrics, Athens University Medical School, University General Hospital "Attikon", 1 Rimini Str, 12464 Haidari, Athens, Greece
- Marie T Vanier
- Laboratoire Gillet-Mérieux, Groupe Hospitalier Est, Hospices Civils de Lyon, Lyon, France
- Georgia Nyktari
- Third Department of Pediatrics, Athens University Medical School, University General Hospital "Attikon", 1 Rimini Str, 12464 Haidari, Athens, Greece
- Spyridon Kanellakis
- Department of Nutrition and Dietetics, Harokopio University, Kallithea, Athens, Greece
- Helen Michelakakis
- Department of Enzymology and Cellular Function, Institute of Child Health, Athens, Greece
- Argyrios Dinopoulos
- Third Department of Pediatrics, Athens University Medical School, University General Hospital "Attikon", 1 Rimini Str, 12464 Haidari, Athens, Greece
9
Lin WY, Liang YC. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 2016; 6:28389. [PMID: 27341039 PMCID: PMC4920030 DOI: 10.1038/srep28389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2016] [Accepted: 06/02/2016] [Indexed: 11/24/2022]
Abstract
Detection of rare causal variants can help uncover the etiology of complex diseases. Recruiting case-parent trios is a popular design in family-based studies. If researchers can obtain data from population controls, including them in trio analyses can improve statistical power. The transmission disequilibrium test (TDT) is a well-known method for analyzing case-parent trio data. It has been extended to rare-variant association testing (abbreviated as "rvTDT"), with the flexibility to incorporate population controls. The rvTDT method is robust to population stratification, but power may be lost in the conditioning process. Here we propose a "conditioning adaptive combination of P-values method" (abbreviated as "conADA") to analyze trios with or without unrelated controls. By first truncating the variants with larger P-values, we decrease the vulnerability of conADA to the inclusion of neutral variants. Moreover, because the test statistic is derived by conditioning on parental genotypes, conADA provides valid statistical inference in the presence of population stratification. In statistical analyses of next-generation sequencing data, validity may be compromised by population stratification, whereas power may be reduced by the inclusion of neutral variants. We recommend conADA for its robustness to both of these factors.
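rvTDT and conADA both build on the TDT, which compares how often each allele is transmitted from heterozygous parents to affected offspring. Below is a minimal sketch of the classic single-SNP TDT only, not the rare-variant rvTDT or conADA statistics. The genotype coding (copies of allele 1) and the function names are assumptions of the sketch.

```python
import math
from typing import Iterable, Tuple

# Each trio is (child, mother, father), genotypes coded as copies of allele 1 (0, 1, 2).
Trio = Tuple[int, int, int]


def transmission_counts(trios: Iterable[Trio]) -> Tuple[int, int]:
    """Count transmissions of allele 1 (b) and allele 0 (c) from heterozygous parents.

    Homozygous parents transmit a known allele, so the number of allele-1 copies
    passed on by heterozygous parents is the child's count minus the copies
    contributed by parents with genotype 2. Assumes Mendelian-consistent trios.
    """
    b = c = 0
    for child, mother, father in trios:
        hets = (mother == 1) + (father == 1)
        from_hets = child - (mother == 2) - (father == 2)
        b += from_hets
        c += hets - from_hets
    return b, c


def tdt(trios: Iterable[Trio]) -> Tuple[float, float]:
    """Classic TDT statistic (b - c)^2 / (b + c), chi-square on 1 df, and its p-value."""
    b, c = transmission_counts(trios)
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

Because the counts come only from heterozygous parents, the test conditions on parental genotypes. This conditioning is the property that makes TDT-type tests robust to population stratification, as noted above.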
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yun-Chieh Liang
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
10
Lee YC, Tsai PC, Guo YC, Hsiao CT, Liu GT, Liao YC, Soong BW. Spinocerebellar ataxia type 36 in the Han Chinese. Neurol Genet 2016; 2:e68. [PMID: 27123487 PMCID: PMC4830187 DOI: 10.1212/nxg.0000000000000068] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/04/2015] [Accepted: 03/01/2016] [Indexed: 12/27/2022]
Abstract
Objective: To ascertain the genetic and clinical characteristics of the GGCCTG hexanucleotide repeat expansion in the nucleolar protein 56 gene (NOP56) in patients with spinocerebellar ataxia (SCA), sporadic ataxia, or amyotrophic lateral sclerosis (ALS) in Taiwan. Methods: We conducted clinical and molecular genetic studies of 109 probands with molecularly unassigned SCA from 512 SCA pedigrees, 323 healthy controls, 502 patients with sporadic ataxia syndromes, and 144 patients with ALS. Repeat-primed PCR assays and PCR-fragment analysis for the number of short hexanucleotide repeats (<40 units) were performed to ascertain NOP56 hexanucleotide repeat expansion. To assess a possible founder effect, genotyping included 8 microsatellite markers and 17 single nucleotide polymorphisms flanking NOP56 and covering a region of 1.8 Mb. Results: Eleven individuals from 3 SCA pedigrees had NOP56 repeat expansions. The 3 pedigrees share a common haplotype spanning 5.3 kb flanking the NOP56 repeat expansions, suggesting a founder effect of spinocerebellar ataxia type 36 (SCA36) in the Han Chinese. The average age at symptom onset was 44.8 ± 3.8 years, with truncal ataxia as the initial manifestation. Common features included slowly progressive truncal/limb ataxia, dysarthria, generalized hyperreflexia, and hearing impairment. Evidence of lower motor neuron involvement, including atrophy and fasciculation in the limb muscles and tongue, was found mostly in patients with prolonged disease duration. NOP56 repeat expansion was not detected in controls or in patients with sporadic ataxia syndromes or ALS. Conclusions: SCA36 is an uncommon subtype, accounting for 0.6% (3/512) of SCA pedigrees in the Han Chinese population.
Affiliation(s)
- Yi-Chung Lee
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
| | - Pei-Chien Tsai
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
| | - Yuh-Cherng Guo
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
| | - Cheng-Tsung Hsiao
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
| | - Guan-Ting Liu
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
- Yi-Chu Liao
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
- Bing-Wen Soong
- Department of Neurology (Y.-C. Lee, C.-T.H., G.-T.L., Y.-C. Liao, B.-W.S.), Taipei Veterans General Hospital, Taiwan; Department of Neurology (Y.-C. Lee, P.-C.T., Y.-C. Liao, B.-W.S.), Institute of Clinical Medicine (Y.-C.G.), and Brain Research Center (Y.-C. Lee, P.-C.T., B.-W.S.), National Yang-Ming University School of Medicine, Taipei, Taiwan; Department of Neurology (Y.-C.G.), and School of Medicine (Y.-C.G.), College of Medicine, China Medical University, Taichung, Taiwan
11
Howey R, Mamasoula C, Töpf A, Nudel R, Goodship J, Keavney B, Cordell H. Increased Power for Detection of Parent-of-Origin Effects via the Use of Haplotype Estimation. Am J Hum Genet 2015; 97:419-34. [PMID: 26320892 PMCID: PMC4564992 DOI: 10.1016/j.ajhg.2015.07.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/23/2015] [Accepted: 07/29/2015] [Indexed: 01/02/2023]
Abstract
Parent-of-origin (or imprinting) effects describe situations in which a trait is influenced by the allele inherited from one parent while the allele from the other parent has little or no effect. Given SNP genotype data from case-parent trios, the parent of origin of each allele in the offspring can often be deduced unambiguously; however, this is not true when all three individuals are heterozygous. Most existing methods for investigating parent-of-origin effects operate on a SNP-by-SNP basis and either perform some sort of averaging over the possible parental transmissions or else discard ambiguous trios. If the correct parent of origin at a SNP could be determined, this would provide extra information and increase the power for detecting the effects of imprinting. We propose making use of the surrounding SNP information, via haplotype estimation, to improve estimation of parent of origin at a test SNP for case-parent trios, case-mother duos, and case-father duos. This extra information is then used in a multinomial modeling approach for estimating parent-of-origin effects at the test SNP. We show through computer simulations that our approach has increased power over previous approaches, particularly when the data consist only of duos. We apply our method to two real datasets and find reduced significance (larger p values) in genomic regions previously thought to possibly harbor imprinting effects, thus weakening the evidence that such effects actually exist in these regions, although some regions retain evidence of significant effects.
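The single-SNP ambiguity described in this abstract can be made concrete with a small enumeration. The sketch below assumes biallelic SNPs coded as the number of copies (0, 1 or 2) of a coded allele, and uses `None` for an ungenotyped parent as in a duo; the function names are hypothetical. It lists every Mendelian transmission consistent with the genotypes and reports the parent of origin only when a single transmission remains. It illustrates why ambiguity arises and is not the haplotype-based method of Howey et al.

```python
from itertools import product
from typing import Optional, Tuple


def transmissible(genotype: Optional[int]) -> set:
    """Alleles a parent can transmit: 1 = coded allele, 0 = the other allele.

    Genotypes count copies of the coded allele (0, 1 or 2); None marks an
    ungenotyped parent (as in a case-mother duo), who could transmit either allele.
    """
    if genotype is None:
        return {0, 1}
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def parent_of_origin(child: int, mother: Optional[int],
                     father: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) when it is unique, else None.

    Raises ValueError if no Mendelian transmission explains the child's genotype.
    """
    options = {
        (m, f)
        for m, f in product(transmissible(mother), transmissible(father))
        if m + f == child
    }
    if not options:
        raise ValueError("genotypes are Mendelian-inconsistent")
    if len(options) > 1:
        return None  # ambiguous, e.g. child and both parents heterozygous
    return options.pop()
```

With this coding, `parent_of_origin(1, 2, 1)` returns `(1, 0)` because the homozygous mother fixes the maternal allele, whereas `parent_of_origin(1, 1, 1)` returns `None`. A heterozygous child with a heterozygous mother and no genotyped father (`father=None`) is also ambiguous, so duos leave more SNPs unresolved than trios.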
12
Li W, Fu G, Rao W, Xu W, Ma L, Guo S, Song Q. GenomeLaser: fast and accurate haplotyping from pedigree genotypes. Bioinformatics 2015; 31:3984-7. [PMID: 26286810 DOI: 10.1093/bioinformatics/btv452] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/24/2015] [Accepted: 07/28/2015] [Indexed: 01/12/2023]
Abstract
We present a software tool called GenomeLaser that determines the haplotypes of each person from unphased high-throughput genotypes in family pedigrees. This method features high accuracy, chromosome-range phasing distance, linear computing time, flexible pedigree types and flexible genetic marker types. Availability and implementation: http://www.4dgenome.com/software/genomelaser.html.
Affiliation(s)
- Wenzhi Li
- Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061 China, Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, GA, 30310 USA
- Guoxing Fu
- 4DGENOME Inc, Atlanta, GA, 30033 USA and
- Wei Xu
- Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, GA, 30310 USA
- Li Ma
- Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, GA, 30310 USA, 4DGENOME Inc, Atlanta, GA, 30033 USA and
- Shiwen Guo
- Department of Neurosurgery, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061 China
- Qing Song
- Cardiovascular Research Institute and Department of Medicine, Morehouse School of Medicine, Atlanta, GA, 30310 USA, 4DGENOME Inc, Atlanta, GA, 30033 USA and Center of Big Data and Bioinformatics, First Affiliated Hospital of Medical School, Xi'an Jiaotong University, Xi'an, Shaanxi, 710061 China
13
Wu J, Chen GB, Zhi D, Liu N, Zhang K. A hidden Markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns. Front Genet 2014; 5:267. [PMID: 25161663 PMCID: PMC4129397 DOI: 10.3389/fgene.2014.00267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/22/2014] [Accepted: 07/21/2014] [Indexed: 11/21/2022]
Abstract
The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. The presence of a specific KIR gene is ambiguous because its exact copy number (one or two) is unknown. Haplotype inference for these genes is therefore challenging because of this large amount of missing information. Meanwhile, many haplotypes and partial haplotype patterns have already been identified owing to tight linkage disequilibrium (LD) among these clustered genes, and these can be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM)-based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). We compared its performance with a previously developed expectation-maximization (EM)-based method in terms of haplotype assignments and haplotype frequency estimation through extensive simulations for KIR genes. The simulation results showed that the new HMM-based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns: an EM-based method, HPALORE, and an HMM-based method, MaCH. Our simulation results showed that incorporating identified haplotypes and partial haplotype patterns can improve the accuracy of haplotype inference. The new software package HaploHMM is available for download at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html.
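To see why present/absent typing leaves haplotypes ambiguous, the sketch below lists every unordered pair of known haplotypes whose combined presence pattern matches an observed phenotype and weights each pair by its population frequency. It assumes Hardy-Weinberg equilibrium, known haplotype frequencies, and a hypothetical encoding of haplotypes and phenotypes as 0/1 tuples over the genes of the cluster. It is a single posterior calculation for illustration, not the HMM of Wu et al.

```python
from itertools import combinations_with_replacement


def compatible_pairs(observed, haplotypes, freqs):
    """Posterior probabilities of unordered haplotype pairs given a present/absent phenotype.

    observed: tuple of 0/1 flags (gene absent/present), one per gene in the cluster.
    haplotypes: list of known haplotypes, each a tuple of 0/1 flags over the same genes.
    freqs: population frequency of each haplotype, in the same order (assumed known).
    Under Hardy-Weinberg equilibrium an unordered pair (i, j) has prior
    p_i * p_j if i == j, and 2 * p_i * p_j otherwise.
    """
    weights = {}
    for i, j in combinations_with_replacement(range(len(haplotypes)), 2):
        h1, h2 = haplotypes[i], haplotypes[j]
        # A gene is typed as "present" if at least one of the two haplotypes carries it.
        if tuple(a | b for a, b in zip(h1, h2)) == tuple(observed):
            weights[(i, j)] = freqs[i] * freqs[j] * (1 if i == j else 2)
    total = sum(weights.values())
    # Normalise over the compatible pairs; an empty dict means no known pair fits.
    return {pair: w / total for pair, w in weights.items()} if total else {}
```

For example, with haplotypes `(1, 1, 0)`, `(1, 0, 1)` and `(1, 1, 1)` at frequencies 0.5, 0.3 and 0.2, a person typed as present for all three genes is compatible with four different pairs, and the second and third genes could each be present in one or two copies.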
Affiliation(s)
- Jihua Wu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Guo-Bo Chen
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA; Queensland Brain Institute, The University of Queensland, St. Lucia, QLD, Australia
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
14
HapTree: a novel Bayesian framework for single individual polyplotyping using NGS data. PLoS Comput Biol 2014; 10:e1003502. [PMID: 24675685 PMCID: PMC3967924 DOI: 10.1371/journal.pcbi.1003502] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 10/29/2013] [Accepted: 01/14/2014] [Indexed: 01/30/2023]
Abstract
As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
While human and other eukaryotic genomes typically contain two copies of every chromosome, plants, yeast and fish such as salmon can have strictly more than two copies of each chromosome. By running standard genotype calling tools, it is possible to accurately identify the number of “wild type” and “mutant” alleles (A, C, G, or T) for each single-nucleotide polymorphism (SNP) site. However, at two heterozygous SNP sites, genotype calling tools cannot determine whether “mutant” alleles from different SNP loci are on the same or different chromosomes. While the former would be healthy, in many cases the latter can cause loss of function; it is therefore necessary to identify the phase (the copies of a chromosome on which the mutant alleles occur) in addition to the genotype. This necessitates efficient algorithms to obtain accurate and comprehensive phase information directly from the next-generation-sequencing read data in higher ploidy species. We introduce an efficient statistical method for this task and show that our method significantly outperforms previous ones, in both accuracy and speed, for phasing triploid and higher ploidy genomes. Our method performs well on human diploid genomes as well, as demonstrated by our improved phasing of the well-known NA12878 (1000 Genomes Project).
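The abstract reports minimum error correction (MEC) values alongside switch accuracy. MEC counts, over all reads, the fewest allele flips needed to make each read agree with one of the assembled haplotypes. The sketch below computes it for any ploidy, assuming reads are given as hypothetical `{snp_index: allele}` dictionaries with 0/1 alleles and haplotypes as equal-length 0/1 lists. It only scores a given phasing; it is not HapTree's likelihood-based assembly.

```python
def mec_score(reads, haplotypes):
    """Minimum error correction (MEC) score of a phased haplotype set.

    reads: list of dicts mapping SNP index -> observed allele (0 or 1) for one fragment.
    haplotypes: list of equal-length allele lists, one per chromosome copy
                (k lists for a k-ploid genome).
    Each read is charged the mismatches against the haplotype it agrees with best;
    the score is the sum of those charges over all reads.
    """
    total = 0
    for read in reads:
        total += min(
            sum(1 for pos, allele in read.items() if hap[pos] != allele)
            for hap in haplotypes
        )
    return total
```

For a triploid example with haplotypes `[0, 0, 1]`, `[1, 0, 0]` and `[0, 1, 1]`, the reads `{0: 0, 1: 0, 2: 1}` and `{0: 1, 1: 0}` each match a haplotype exactly, while `{1: 1, 2: 0}` needs one correction, giving an MEC of 1.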
15
Aissani B, Wiener HW, Zhang K, Kaslow RA, Ogwaro KM, Shrestha S, Jacobson LP. A candidate gene approach for virally induced cancer with application to HIV-related Kaposi's sarcoma. Int J Cancer 2014; 134:397-404. [PMID: 23818101 PMCID: PMC4007164 DOI: 10.1002/ijc.28351] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/22/2013] [Accepted: 06/14/2013] [Indexed: 11/07/2022]
Abstract
Like other members of the γ-herpesvirus family, human herpes virus 8, the etiologic agent of classic and HIV-related Kaposi's sarcoma (HIV-KS), acquired and evolved several human genes with key immune modulatory and cellular growth control functions. The encoded viral homologs substitute for their human counterparts but escape cellular regulation, leading to uncontrolled cell proliferation. We postulated that DNA variants in the human homologs of viral genes that potentially alter the expression or the binding of the encoded factors controlling the antiviral response may facilitate viral interference. To test whether cellular homologs are candidate susceptibility genes, we evaluated the association of DNA variants in 92 immune-related genes, including seven cellular homologs, with the risk for HIV-KS in a matched case-control study nested in the Multicenter AIDS Cohort Study. Low- and high-risk gene-by-gene interactions were estimated by multifactor dimensionality reduction and used as predictors in conditional logistic models. Among the most significant risk-associated gene interactions (OR = 2.84-3.92; Bonferroni-adjusted p = 9.9 × 10^-3 to 2.6 × 10^-4), three comprised human homologs of two latently expressed viral genes, cyclin D1 (CCND1) and interleukin-6 (IL-6), in conjunction with angiogenic genes (VEGF, EDN-1 and EDNRB). At lower significance thresholds (adjusted p < 0.05), human homologs related to apoptosis (CFLAR) and chemotaxis (CCL2) emerged as candidates. This "proof of concept" study identified human homologs involved in the regulation of type I interferon-induced signaling, cell cycle and apoptosis as potentially important determinants of HIV-KS.
Affiliation(s)
- Brahim Aissani
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Howard W. Wiener
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Kui Zhang
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Richard A. Kaslow
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Kisani M. Ogwaro
- Department of Psychiatry, University of Arizona School of Medicine, Tucson, Arizona 85724, USA
- Sadeep Shrestha
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
- Lisa P. Jacobson
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21205, USA
16
Wang C, Habier D, Peiris BL, Wolc A, Kranis A, Watson KA, Avendano S, Garrick DJ, Fernando RL, Lamont SJ, Dekkers JCM. Accuracy of genomic prediction using an evenly spaced, low-density single nucleotide polymorphism panel in broiler chickens. Poult Sci 2013; 92:1712-23. [PMID: 23776257 DOI: 10.3382/ps.2012-02941] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/21/2023]
Abstract
One approach for cost-effective implementation of genomic selection is to genotype training individuals with a high-density (HD) panel and selection candidates with an evenly spaced, low-density (ELD) panel. The purpose of this study was to evaluate the extent to which the ELD approach reduces the accuracy of genomic estimated breeding values (GEBV) in a broiler line, in which 1,091 breeders from 3 generations were used for training and 160 progeny of the third generation for validation. All birds were genotyped with an Illumina Infinium platform HD panel that included 20,541 segregating markers. Two subsets of HD markers, with 377 (ELD-1) or 766 (ELD-2) markers, were selected as ELD panels. The ELD-1 panel was genotyped using KBiosciences KASPar SNP genotyping chemistry, whereas the ELD-2 panel was simulated by adding markers from the HD panel to the ELD-1 panel. The training data set covered 2 traits: body weight (BW) at 35 d in both sexes and hen house production (HHP) between wk 28 and 54. The methods Bayes-A, Bayes-B, Bayes-C, and genomic best linear unbiased prediction were used to estimate HD-marker effects. Two scenarios were used: (1) the 160 progeny were ELD-genotyped, and (2) the 160 progeny and their dams (117 birds) were ELD-genotyped. The missing HD genotypes in ELD-genotyped birds were imputed by a Gibbs sampler, capitalizing on linkage within families. In scenario (1), the correlation of GEBV for BW (HHP) of the 160 progeny based on observed HD versus imputed genotypes was greater than 0.94 (0.98) with the ELD-1 panel and greater than 0.97 (0.99) with the ELD-2 panel. In scenario (2), the correlation of GEBV for BW (HHP) was greater than 0.92 (0.96) with the ELD-1 panel and greater than 0.95 (0.98) with the ELD-2 panel. Hence, in a pedigreed population, genomic selection can be implemented by genotyping selection candidates with about 400 ELD markers with less than 6% loss in accuracy. This leads to substantial savings in genotyping costs, with little sacrifice in accuracy.
Affiliation(s)
- C Wang
- Department of Animal Science, Iowa State University, Ames, IA, USA
17
Merino AM, Zhang K, Kaslow RA, Aissani B. Structure of tumor necrosis factor-alpha haploblocks in European populations. Immunogenetics 2013; 65:543-52. [PMID: 23579626 PMCID: PMC3985396 DOI: 10.1007/s00251-013-0700-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/10/2013] [Accepted: 03/23/2013] [Indexed: 10/27/2022]
Abstract
DNA variants in the tumor necrosis factor-α (TNF) and linked lymphotoxin-α genes, and specific alleles of the highly polymorphic human leukocyte antigen B (HLA-B) gene, have been implicated in a plethora of immune and infectious diseases. However, the tight linkage disequilibrium characterizing the central region of the human major histocompatibility complex (MHC), which contains these loci, has made unequivocal interpretation of genetic association data difficult. To alleviate these difficulties and facilitate the design of more focused follow-up studies, we investigated the structure and distribution of HLA-B-specific MHC haplotypes reconstructed in a European population from unphased genotypes at a set of 25 single nucleotide polymorphism sites spanning a 66-kilobase region across TNF. Consistent with published data, we found limited genetic diversity across the so-called TNF block, with the emergence of seven common MHC haplotypes, termed TNF block super-haplotypes. We also found that the ancestral haplotype 8.1 shares a TNF block haplotype with HLA-B*4402. HLA-B*5701, a known protective allele in HIV-1 pathogenesis, occurred in a unique TNF block haplotype.
Affiliation(s)
- Kui Zhang
- Department of Biostatistics, University of Alabama at Birmingham
- Richard A. Kaslow
- Department of Epidemiology, University of Alabama at Birmingham
- Department of Medicine, University of Alabama at Birmingham
- Brahim Aissani
- Department of Epidemiology, University of Alabama at Birmingham
18
Sabaa H, Cai Z, Wang Y, Goebel R, Moore S, Lin G. Whole genome identity-by-descent determination. J Bioinform Comput Biol 2013; 11:1350002. [PMID: 23600820 DOI: 10.1142/s0219720013500029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
High-throughput single nucleotide polymorphism genotyping assays conveniently produce genotype data for genome-wide genetic linkage and association studies. For pedigree datasets, the unphased genotype data are used to infer the haplotypes of individuals according to Mendelian inheritance rules. Linkage studies can then locate putative chromosomal regions based on haplotype allele sharing among the pedigree members and their disease status. Most existing haplotyping programs require rather strict pedigree structures and return a single inferred solution for downstream analysis. In this research, we relax the pedigree structure to allow ungenotyped founders and present a cubic-time whole-genome haplotyping algorithm that minimizes the number of zero-recombination haplotype blocks. With or without explicitly enumerating all haplotyping solutions, the algorithm determines all distinct haplotype allele identity-by-descent (IBD) sharings among the pedigree members in time linear in the total number of haplotyping solutions. Our algorithm is implemented as a computer program, iBDD. Extensive simulation experiments using 2 sets of 16 pedigree structures from previous studies showed that, in general, there are trillions of haplotyping solutions but only up to a few thousand distinct haplotype allele IBD sharings. iBDD is able to return all these sharings for downstream genome-wide linkage and association studies.
Affiliation(s)
- Hadi Sabaa
- Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada.
19
Lai EY, Wang WB, Jiang T, Wu KP. A linear-time algorithm for reconstructing zero-recombinant haplotype configuration on a pedigree. BMC Bioinformatics 2012; 13 Suppl 17:S19. [PMID: 23281626 PMCID: PMC3521470 DOI: 10.1186/1471-2105-13-s17-s19] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Background: When studying genetic diseases in which genetic variations are passed on to offspring, the ability to distinguish between paternal and maternal alleles is essential. Determining haplotypes from genotype data is called haplotype inference. Most existing computational algorithms for haplotype inference have been designed to use genotype data collected from individuals in the form of a pedigree. A haplotype is regarded as a hereditary unit, so the preferred input pedigrees are those free of mutational events and with a minimum number of recombination events. These ideas motivated the zero-recombinant haplotype configuration (ZRHC) problem, which strictly follows the Mendelian law of inheritance: one haplotype of each child is inherited from the father and the other from the mother, both without any mutation. So far, no linear-time algorithm for ZRHC has been proposed for general pedigrees, even though the number of mating loops in a human pedigree is usually very small and can be regarded as constant. Results: Given a pedigree with n individuals, m marker loci, and k mating loops, we propose an algorithm that provides a general solution to the zero-recombinant haplotype configuration problem in O(kmn + k^2 m) time. In addition, this algorithm can be modified to detect inconsistencies within the genotype data without loss of efficiency. The proposed algorithm was subjected to 12,000 experiments to verify its performance using different (n, m) combinations. The value of k was uniformly distributed between zero and six throughout all experiments. The experimental results show clear linearity of execution time in relation to input size when both n and m are larger than 100. For experiments where n or m is less than 100, the proposed algorithm runs in thousandths to hundredths of a second on a personal desktop computer.
Conclusions: We have developed the first deterministic linear-time algorithm for the zero-recombinant haplotype configuration problem. Our experimental results demonstrated the linearity of its execution time in relation to the input size. The proposed algorithm can be modified to detect inconsistencies within the genotype data without loss of efficiency and is expected to be able to handle recombinant and missing data with further extension.
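Under the zero-recombinant assumption described in this abstract, each child carries one intact haplotype from each parent. When the parents' haplotypes are already phased, checking a child reduces to trying the 2 × 2 parental combinations. The sketch below does exactly that, assuming a hypothetical tuple encoding (child genotypes as 0/1/2 allele counts, parental haplotypes as 0/1 tuples). It does not attempt the pedigree-wide problem with unknown parental phase and mating loops that the paper's O(kmn + k^2 m) algorithm solves.

```python
from itertools import product


def zero_recombinant_transmissions(child, mother, father):
    """List (maternal_index, paternal_index) choices that explain a child without recombination.

    child: tuple of genotypes over m loci, each the count (0, 1 or 2) of allele 1.
    mother, father: pairs of phased haplotypes, each a tuple of m alleles (0 or 1).
    With zero recombination the child inherits one whole haplotype from each parent,
    so only the four (i, j) combinations need checking.
    """
    matches = []
    for i, j in product(range(2), range(2)):
        summed = tuple(a + b for a, b in zip(mother[i], father[j]))
        if summed == tuple(child):
            matches.append((i, j))
    # An empty list means the genotypes need a recombination, a mutation or an error.
    return matches
```

For example, with maternal haplotypes `(0, 1, 1)` and `(1, 0, 0)` and paternal haplotypes `(0, 0, 1)` and `(1, 1, 0)`, a child with genotype `(1, 2, 1)` is explained only by `(0, 1)`, while a child with `(1, 1, 1)` cannot be explained without recombination.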
Affiliation(s)
- En-Yu Lai
- Institute of Biomedical Informatics, National Yang Ming University, Taipei 112, Taiwan
20
Cui W, Wang L. Identifying mutation regions for closely related individuals without a known pedigree. BMC Bioinformatics 2012; 13:146. [PMID: 22731852 PMCID: PMC3507658 DOI: 10.1186/1471-2105-13-146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/29/2011] [Accepted: 06/07/2012] [Indexed: 01/08/2023]
Abstract
Background: Linkage analysis is the first step in the search for a disease gene. Linkage studies have facilitated the identification of several hundred human genes that can harbor mutations leading to a disease phenotype. In this paper, we study a very important case, where the sampled individuals are closely related but the pedigree is not given. This situation occurs very often when the individuals share a common ancestor 6 or more generations ago. To our knowledge, no existing algorithm gives good results for this case. Results: To solve this problem, we first developed some heuristic algorithms for haplotype inference without any given pedigree. We propose a model using the parsimony principle that can be viewed as an extension of the model first proposed by Dan Gusfield. Our heuristic algorithm uses Clark’s inference rule to infer haplotype segments. Conclusions: We ran our program on both simulated data and a set of real data from the phase II HapMap database. Experiments show that our program performs well. Recall ranges from 90% to 99% in various cases, which means that the program reports more than 90% of the true mutation regions. Precision varies from 29% to 90%. When the precision is 29%, the size of the reported regions is three times that of the true mutation region. This is still very useful for narrowing down the range of the disease gene location. Our program completes the computation for all the tested cases, which have about 110,000 SNPs on a chromosome, within 20 seconds.
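Clark's inference rule, which the abstract names as the basis for inferring haplotype segments, can be sketched for unrelated individuals as follows. Genotypes are assumed to be tuples coded 0 or 1 at homozygous sites and 2 at heterozygous sites (a common convention, assumed here). Genotypes with at most one heterozygous site seed the set of known haplotypes; each remaining genotype is resolved by finding a compatible known haplotype and adding its complement. This is the classic rule only, not the authors' parsimony model or their mutation-region detection.

```python
def resolve_unambiguous(g):
    """Return the haplotype pair of a genotype with at most one heterozygous site, else None."""
    if sum(1 for x in g if x == 2) > 1:
        return None
    h1 = tuple(0 if x == 2 else x for x in g)
    h2 = tuple(1 if x == 2 else x for x in g)
    return h1, h2


def clark_inference(genotypes):
    """Resolve genotypes with Clark's rule; returns {genotype_index: (hap, complement)}.

    Genotypes that no known haplotype can explain are left out of the result.
    """
    known, resolved = set(), {}
    for idx, g in enumerate(genotypes):
        pair = resolve_unambiguous(g)
        if pair is not None:
            known.update(pair)
            resolved[idx] = pair
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for h in sorted(known):  # sorted() copies, so adding to `known` below is safe
                # h is compatible if it matches g at every homozygous site.
                if all(x == 2 or x == y for x, y in zip(g, h)):
                    # The complement keeps homozygous sites and flips h at heterozygous ones.
                    comp = tuple(y if x != 2 else 1 - y for x, y in zip(g, h))
                    resolved[idx] = (h, comp)
                    known.add(comp)
                    progress = True
                    break
    return resolved
```

The outcome depends on the order in which genotypes and haplotypes are visited, and some genotypes may stay unresolved; this order dependence is a known weakness of Clark's rule, and parsimony-based formulations such as the one the abstract mentions aim to choose among such alternatives more systematically.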
Affiliation(s)
- Wenjuan Cui
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
21
Iliadis A, Anastassiou D, Wang X. A unified framework for haplotype inference in nuclear families. Ann Hum Genet 2012; 76:312-25. [PMID: 22607042 DOI: 10.1111/j.1469-1809.2012.00715.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can use either the familial information or statistical information derived from the population samples, but not both. Building on our recently proposed tree-based deterministic framework (TDS) for trio families, we extend its applicability to general nuclear families. We impose a minimum-recombinant approach locally and independently on each multiple-child family, while resorting to population-derived information to resolve the remaining ambiguities. Thus our framework incorporates all available information (familial and population) in a given study. We demonstrate that using all the constraints in our approach yields gains in accuracy compared with breaking multiple-child families into separate trios and applying a trio inference algorithm, or phasing each family in isolation. We believe that our proposed framework could be the method of choice for haplotype inference in studies that include nuclear families with multiple children. Our software (tds2.0) is downloadable from www.ee.columbia.edu/∼anastas/tds.
Affiliation(s)
- Alexandros Iliadis
- Department of Electrical Engineering, Columbia University, New York, NY, USA
22
Li X, Li J. Haplotype inference. Methods Mol Biol 2012; 850:411-21. [PMID: 22307711 DOI: 10.1007/978-1-61779-555-8_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
Abstract
Haplotypes, as they specify linkage patterns between individual nucleotide variants, confer critical information for understanding the genetics of human diseases. However, haplotype information is not directly obtainable from high-throughput genotyping platforms. In this chapter, we introduce two representative methods for reconstructing haplotypes from unphased genotype data: one for unrelated individuals and the other for families.
Affiliation(s)
- Xin Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
23
Doan DD, Evans PA. Haplotype inference in general pedigrees with two sites. BMC Proc 2011; 5 Suppl 2:S6. [PMID: 21554764 PMCID: PMC3090764 DOI: 10.1186/1753-6561-5-s2-s6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Background: Genetic disease studies investigate relationships between changes in chromosomes and genetic diseases. Single haplotypes provide useful information for these studies, but extracting single haplotypes directly by biochemical methods is expensive. A computational method to infer haplotypes from genotype data is therefore important. We investigate the problem of computing the minimum number of recombination events for general pedigrees with two sites for all members. Results: We show that this NP-hard problem can be parametrically reduced to the Bipartization by Edge Removal problem and therefore can be solved by an O(2^k · n^2) exact algorithm, where n is the number of members and k is the number of recombination events. Conclusions: Our work can therefore be useful for genetic disease studies to track down how changes in haplotypes, such as recombinations, relate to genetic disease.
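The abstract reduces the two-site recombination problem to Bipartization by Edge Removal: delete the fewest edges so that the remaining graph can be two-colored with every edge joining different colors. The sketch below is a brute-force baseline over all 2-colorings, useful only for checking small instances. It is not the O(2^k · n^2) fixed-parameter algorithm from the paper, and the graph encoding (vertices 0..n-1, an edge list) is an assumption.

```python
from itertools import product


def min_edge_bipartization(n, edges):
    """Fewest edges whose removal makes the graph bipartite, found by brute force.

    n: number of vertices, labelled 0..n-1; edges: list of (u, v) pairs.
    Every 2-coloring is tried (vertex 0 is fixed to color 0, since swapping colors
    changes nothing); edges whose endpoints share a color are the ones to delete.
    Runs in O(2^n * |E|), so it is only a reference for very small graphs.
    Returns (number_of_edges_removed, list_of_those_edges).
    """
    if n == 0:
        return 0, []
    best = None
    for rest in product((0, 1), repeat=n - 1):
        color = (0,) + rest
        conflicts = [(u, v) for u, v in edges if color[u] == color[v]]
        if best is None or len(conflicts) < len(best):
            best = conflicts
    return len(best), best
```

A triangle needs one edge removed, an even cycle needs none, and the complete graph on four vertices needs two.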
24
Wang WB, Jiang T. Inferring haplotypes from genotypes on a pedigree with mutations, genotyping errors and missing alleles. J Bioinform Comput Biol 2011; 9:339-65. [DOI: 10.1142/s0219720011005549] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/31/2011] [Revised: 02/28/2011] [Accepted: 03/01/2011] [Indexed: 11/18/2022]
Abstract
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%–94% of genotyping errors depending on the pedigree topology.
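The abstract places a system of linear equations over GF(2) at the core of its method. Pedigree haplotyping formulations of this kind typically treat phase or transmission choices as 0/1 unknowns and express Mendelian consistency as parity (XOR) constraints. The sketch below is a generic GF(2) Gaussian-elimination solver under that reading, with matrices as hypothetical lists of 0/1 rows. It is not the authors' ILP, and it does not model mutations, errors or missing alleles.

```python
def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination; return one solution or None.

    A: list of rows, each a list of 0/1 coefficients; b: list of 0/1 right-hand sides.
    Addition over GF(2) is XOR, so eliminating a variable XORs the pivot row into
    every other row that contains it. Free variables are set to 0.
    """
    rows = [list(r) + [bi] for r, bi in zip(A, b)]  # augmented matrix [A | b]
    n_vars = len(A[0]) if A else 0
    pivot_cols = []
    r = 0
    for c in range(n_vars):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue  # no pivot here: variable c stays free
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    # A row reading 0 = 1 means the constraints contradict each other.
    if any(not any(row[:-1]) and row[-1] for row in rows):
        return None
    x = [0] * n_vars
    for i, c in enumerate(pivot_cols):
        x[c] = rows[i][-1]  # reduced form: each pivot row fixes its own variable
    return x
```

For instance, the constraints x0 ⊕ x1 = 1 and x1 ⊕ x2 = 0 give `[1, 0, 0]` with the free variable x2 set to 0, whereas x0 ⊕ x1 = 0 together with x0 ⊕ x1 = 1 returns `None`, the kind of contradiction that signals an inconsistency in the data.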
Affiliation(s)
- Wei-Bung Wang
- Computer Science, University of California - Riverside, 900 University Avenue, Riverside, California 92521, USA
- Tao Jiang
- Computer Science, University of California - Riverside, 900 University Avenue, Riverside, California 92521, USA
25
Ma W, Yang Y, Chen ZZ, Wang L. Mutation region detection for closely related individuals without a known pedigree using high-density genotype data. IEEE/ACM Trans Comput Biol Bioinform 2011; 9:499-510. [PMID: 22025760 DOI: 10.1109/tcbb.2011.134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
The fundamental problem in linkage analysis is to identify regions in which an allele is shared by all or almost all affected members but by none or few unaffected members. Almost all existing methods for linkage analysis are for families with clearly given pedigrees. Little work has been done for the case where the sampled individuals are closely related but their pedigree is not known. This situation occurs very often when the individuals share a common ancestor at least six generations ago. Solving this case would greatly extend the use of linkage analysis for finding genes that cause genetic diseases. In this paper, we propose a mathematical model (the shared center problem) for inferring the allele-sharing status of a given set of individuals using a database of confirmed haplotypes as reference. We show the NP-completeness of the shared center problem and present a ratio-2 polynomial-time approximation algorithm. We then convert the approximation algorithm into a heuristic algorithm for the shared center problem. Based on this heuristic, we finally design a heuristic algorithm for mutation region detection. We further implement the algorithms in a software package. Our experimental data show that the software works very well. The package is available at http://www.cs.cityu.edu.hk/~lwang/software/LDWP/index.html for non-commercial use.
Affiliation(s)
- Wenji Ma
- City University of Hong Kong, Hong Kong
26
Abstract
Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.
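The phase problem that the review addresses can be seen in a single unphased genotype. Every heterozygous site doubles the number of possible haplotype assignments, so a genotype with h heterozygous biallelic sites is consistent with 2^(h-1) distinct haplotype pairs (one pair when h = 0). The brute-force enumeration below makes that ambiguity concrete. It uses a hypothetical function name and a 0/1/2 alternate-allele-count coding that the source does not specify. Statistical phasing methods choose among these pairs using population or relatedness information rather than listing them all.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2 copies of the alternate allele per
    biallelic site. Homozygous sites are fixed; each heterozygous site can
    place the alternate allele on either haplotype.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, bit in zip(het, bits):
            h1[site], h2[site] = bit, 1 - bit
        # Sort the pair so that (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# Three heterozygous sites give 2**(3-1) = 4 possible haplotype pairs.
print(len(compatible_haplotype_pairs([1, 1, 1, 0])))  # 4
```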
Affiliation(s)
- Sharon R. Browning
- Department of Biostatistics, University of Washington, Seattle WA 98195, USA
- Brian L. Browning
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle WA 98195, USA
27
Kaklamani V, Yi N, Zhang K, Sadim M, Offit K, Oddoux C, Ostrer H, Mantzoros C, Pasche B. Polymorphisms of ADIPOQ and ADIPOR1 and prostate cancer risk. Metabolism 2011; 60:1234-43. [PMID: 21397927 PMCID: PMC3134585 DOI: 10.1016/j.metabol.2011.01.005] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 11/12/2010] [Revised: 01/09/2011] [Accepted: 01/17/2011] [Indexed: 11/20/2022]
Abstract
Studies have linked prostate cancer risk with insulin resistance and obesity. Circulating levels of adiponectin, a protein involved in insulin resistance and obesity, have been associated with prostate cancer risk. We studied the association of prostate cancer risk with haplotype-tagging single nucleotide polymorphisms (SNPs) of the adiponectin (ADIPOQ) and adiponectin receptor 1 (ADIPOR1) genes, chosen based on their functional relevance or association with other types of cancer. DNA samples from 465 cases and 441 healthy volunteers from New York City were genotyped for ADIPOQ rs266729, rs822395, rs822396, rs1501299, and rs2241766 SNPs and ADIPOR1 rs12733285, rs1342387, rs7539542, rs2232853, and rs10920531 SNPs. We performed both single- and multiple-SNP analyses. We found that rs12733285, rs7539542, rs266729, rs822395, rs822396, and rs1501299 were significantly associated with prostate cancer risk. Haplotype analysis confirmed these results and identified 5 ADIPOQ 4-SNP haplotypes and 1 ADIPOR1 2-SNP haplotype strongly associated with prostate cancer risk. Importantly, 2 ADIPOQ SNPs, rs266729 and rs1501299, have previously been associated with colon and breast cancer risk, respectively, in the same direction as in this study. These findings suggest that variants of the adiponectin pathway may be associated with susceptibility to various forms of common cancers and warrant validation studies.
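Single-SNP case-control results like these are often summarised by an allelic odds ratio with a Woolf (log-scale) confidence interval. The abstract does not say which model the authors used, so the following is only a generic sketch. The function name and the allele counts in the example are hypothetical. The totals 930 and 882 are simply twice the stated numbers of cases and controls.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.959964):
    """Allelic odds ratio and Woolf 95% confidence interval from a 2x2 table.

    Arguments are allele counts (two per genotyped person). Zero cells are
    rejected because the Woolf standard error is then undefined.
    """
    cells = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = sqrt(sum(1.0 / c for c in cells))  # standard error of log(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 300 of 930 case alleles vs 220 of 882 control alleles.
print(allelic_odds_ratio(300, 630, 220, 662))
```

An allelic comparison treats the two alleles of a person as independent draws. A genotype-based logistic regression is the usual alternative when that assumption is in doubt or when covariates must be adjusted for.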
Affiliation(s)
- Virginia Kaklamani
- Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
- Nengjun Yi
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294
- Maureen Sadim
- Cancer Genetics Program, Division of Hematology/Oncology, Department of Medicine and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
- Kenneth Offit
- Clinical Genetics Service, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, New York, NY 10021
- Carole Oddoux
- Human Genetics Program, Department of Pediatrics, New York University Medical Center, New York, NY 10016
- Harry Ostrer
- Human Genetics Program, Department of Pediatrics, New York University Medical Center, New York, NY 10016
- Christos Mantzoros
- Division of Endocrinology and Metabolism, Department of Medicine, Beth Israel Deaconess Medical Center (BIDMC), Harvard Medical School, 330 Brookline Avenue, Stoneman 816, Boston, MA 02215
- Boris Pasche
- Division of Hematology/Oncology and Comprehensive Cancer Center, University of Alabama, Birmingham, AL 35294
28
Olsen MT, Volny VH, Bérubé M, Dietz R, Lydersen C, Kovacs KM, Dodd RS, Palsbøll PJ. A simple route to single-nucleotide polymorphisms in a nonmodel species: identification and characterization of SNPs in the Artic ringed seal (Pusa hispida hispida). Mol Ecol Resour 2011; 11 Suppl 1:9-19. [PMID: 21429159 DOI: 10.1111/j.1755-0998.2010.02941.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/17/2022]
Affiliation(s)
- Morten Tange Olsen
- Evolutionary Genetics Group, Department of Genetics, Microbiology, and Toxicology, Stockholm University, Sweden.
29
Abo R, Wong J, Thomas A, Camp NJ. Haplotype association analyses in resources of mixed structure using Monte Carlo testing. BMC Bioinformatics 2010; 11:592. [PMID: 21143908 PMCID: PMC3016409 DOI: 10.1186/1471-2105-11-592] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/01/2010] [Accepted: 12/09/2010] [Indexed: 01/16/2023]
Abstract
Background Genomewide association studies have resulted in a great many genomic regions that are likely to harbor disease genes. Thorough interrogation of these specific regions is the logical next step, including regional haplotype studies to identify risk haplotypes upon which the underlying critical variants lie. Pedigrees ascertained for disease can be powerful for genetic analysis due to the cases being enriched for genetic disease. Here we present a Monte Carlo based method to perform haplotype association analysis. Our method, hapMC, allows for the analysis of full-length and sub-haplotypes, including imputation of missing data, in resources of nuclear families, general pedigrees, case-control data or mixtures thereof. Both traditional association statistics and transmission/disequilibrium statistics can be performed. The method includes a phasing algorithm that can be used in large pedigrees and optional use of pseudocontrols. Results Our new phasing algorithm substantially outperformed the standard expectation-maximization algorithm that is ignorant of pedigree structure, and hence is preferable for resources that include pedigree structure. Through simulation we show that our Monte Carlo procedure maintains the correct type 1 error rates for all resource types. Power comparisons suggest that transmission-disequilibrium statistics are superior for performing association in resources of only nuclear families. For mixed structure resources, however, the newly implemented pseudocontrol approach appears to be the best choice. Results also indicated the value of large high-risk pedigrees for association analysis, which, in the simulations considered, were comparable in power to case-control resources of the same sample size. Conclusions We propose hapMC as a valuable new tool to perform haplotype association analyses, particularly for resources of mixed structure. 
The availability of meta-association and haplotype-mining modules in our suite of Monte Carlo haplotype procedures adds further value to the approach.
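hapMC assesses significance by Monte Carlo testing. The sketch below shows a generic version of the final step. Given the observed statistic and a routine that simulates one statistic under the null hypothesis, it returns the add-one estimate (1 + exceedances) / (1 + simulations). How hapMC simulates null data in pedigrees is not described here, so `simulate_null` is a hypothetical callback. The coin-flip null in the example is purely illustrative.

```python
import random


def monte_carlo_pvalue(observed, simulate_null, n_sims=9999, seed=0):
    """One-sided Monte Carlo p-value for P(T >= observed) under the null.

    simulate_null(rng) must return one test statistic generated under H0.
    The (1 + exceedances) / (1 + n_sims) form never returns 0 and does not
    exceed the nominal type 1 error rate.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_sims) if simulate_null(rng) >= observed)
    return (1 + exceed) / (1 + n_sims)


# Toy null: number of heads in 20 fair coin flips, with observed value 16.
heads = lambda rng: sum(rng.random() < 0.5 for _ in range(20))
print(monte_carlo_pvalue(16, heads))  # exact tail probability is 6196 / 2**20, about 0.0059
```

With n simulations, the smallest attainable p-value is 1 / (n + 1). The number of simulations therefore has to grow with the significance threshold of interest.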
Affiliation(s)
- Ryan Abo
- Department of Biomedical Informatics, University of Utah, Salt Lake City, USA.
30
Doan DD, Evans PA, Horton JD. A near-linear time algorithm for haplotype determination on general pedigrees. J Comput Biol 2010; 17:1451-65. [PMID: 20937017 DOI: 10.1089/cmb.2009.0133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
An O(nmα(m)) time algorithm is given for inferring haplotypes from genotypes of non-recombinant pedigree data, where n is the number of members, m is the number of sites, and α(m) is the inverse of the Ackermann function. The algorithm works on both tree and general pedigree structures with cycles. Constraints between pairs of heterozygous sites are used to resolve unresolved sites for the pedigree, enabling the algorithm to avoid problems previously experienced for non-tree pedigrees.
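The α(m) factor in the stated running time is characteristic of a union-find (disjoint-set) structure with union by rank and path compression. The sketch below illustrates how pairwise "same phase" or "opposite phase" constraints between heterozygous sites can be merged and checked for contradictions in near-linear time, by storing one parity bit per element. It is a generic technique, not the authors' algorithm, and the class name `ParityUnionFind` is hypothetical.

```python
class ParityUnionFind:
    """Union-find that also tracks each element's parity relative to its root.

    Parity 0 means "same phase as the root", 1 means "opposite phase".
    With union by rank and path compression, a sequence of m operations on
    n elements costs O(m * alpha(n)) time.
    """

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.parity = [0] * n  # parity of each node relative to its parent

    def find(self, x):
        """Return (root, parity of x relative to root), compressing the path."""
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        self.parity[x] ^= p  # now relative to the root, not the old parent
        self.parent[x] = root
        return root, self.parity[x]

    def union(self, a, b, rel):
        """Record parity(a) XOR parity(b) == rel; return False on contradiction."""
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra == rb:
            return (pa ^ pb) == rel
        if self.rank[ra] < self.rank[rb]:  # attach the shallower tree
            ra, rb, pa, pb = rb, ra, pb, pa
        self.parent[rb] = ra
        # Choose rb's parity so that pa XOR (pb XOR parity[rb]) == rel.
        self.parity[rb] = pa ^ pb ^ rel
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


uf = ParityUnionFind(3)
uf.union(0, 1, 1)                     # sites 0 and 1 have opposite phase
uf.union(1, 2, 1)                     # sites 1 and 2 have opposite phase
print(uf.find(0)[1] ^ uf.find(2)[1])  # 0: sites 0 and 2 share phase
print(uf.union(0, 2, 1))              # False: contradicts the constraints above
```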
Affiliation(s)
- Duong D Doan
- Faculty of Computer Science, University of New Brunswick, Fredericton, Canada.
31
Ji H, Ren J, Yan X, Huang X, Zhang B, Zhang Z, Huang L. The porcine MUC20 gene: molecular characterization and its association with susceptibility to enterotoxigenic Escherichia coli F4ab/ac. Mol Biol Rep 2010; 38:1593-601. [DOI: 10.1007/s11033-010-0268-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/10/2010] [Accepted: 09/02/2010] [Indexed: 01/30/2023]
32
Ewens KG, Stewart DR, Ankener W, Urbanek M, McAllister JM, Chen C, Baig KM, Parker SCJ, Margulies EH, Legro RS, Dunaif A, Strauss JF, Spielman RS. Family-based analysis of candidate genes for polycystic ovary syndrome. J Clin Endocrinol Metab 2010; 95:2306-15. [PMID: 20200332 PMCID: PMC2869537 DOI: 10.1210/jc.2009-2703] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Indexed: 01/25/2023]
Abstract
CONTEXT Polycystic ovary syndrome (PCOS) is a complex disorder having both genetic and environmental components. A number of association studies based on candidate genes have reported significant association, but few have been replicated. D19S884, a polymorphic marker in fibrillin 3 (FBN3), is one of the few association findings that has been replicated in independent sets of families. OBJECTIVE The aims of the study were: 1) to genotype single nucleotide polymorphisms (SNPs) in the region of D19S884; and 2) to follow up, in an independent data set, published results reporting evidence for PCOS candidate gene associations. DESIGN The transmission disequilibrium test (TDT) was used to analyze linkage and association between PCOS and SNPs in candidate genes previously reported by us and by others as significantly associated with PCOS. SETTING The study was conducted at academic medical centers. PATIENTS OR OTHER PARTICIPANTS A total of 453 families having a proband with PCOS participated in the study. Sisters with PCOS were also included, for a total of 502 probands and sisters with PCOS. INTERVENTION(S) There were no interventions. MAIN OUTCOME MEASURE(S) The outcome measure was transmission frequency of SNP alleles. RESULTS We identified a six-SNP haplotype block spanning a 6.7-kb region on chromosome 19p13.2 that includes D19S884. SNP haplotype allele C, alone and in combination with D19S884 allele 8, is significantly associated with PCOS: haplotype-C TDT χ² = 10.0 (P = 0.0016) and haplotype-C/A8 TDT χ² = 7.6 (P = 0.006). SNPs in four of the other 26 putative candidate genes that were tested using the TDT were nominally significant (ACVR2A, POMC, FEM1B, and SGTA). One SNP in POMC (rs12473543, χ² = 9.1; corrected P = 0.042) is significant after correction for multiple testing. CONCLUSIONS A polymorphic variant, D19S884, in FBN3 is associated with risk of PCOS. POMC is also a candidate gene of interest.
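The TDT statistic referred to here is the standard McNemar-type statistic on transmissions from heterozygous parents, χ² = (b - c)² / (b + c) with 1 degree of freedom. Here b and c count transmissions and non-transmissions of the allele (or haplotype) of interest. A minimal sketch follows. The counts in the example are hypothetical, chosen only so that χ² equals the reported haplotype-C value of 10.0. That value corresponds to P ≈ 0.0016, as stated in the abstract.

```python
from math import erfc, sqrt


def tdt(b, c):
    """Transmission/disequilibrium test for one allele or haplotype.

    b: heterozygous parents transmitting the allele of interest
    c: heterozygous parents transmitting the other allele
    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


chi2, p = tdt(60, 30)     # hypothetical counts
print(chi2, round(p, 4))  # 10.0 0.0016
```

Only heterozygous parents contribute, because a homozygous parent transmits the same allele whichever chromosome is passed on.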
Affiliation(s)
- Kathryn G Ewens
- Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
33
Shi M, Umbach DM, Weinberg CR. Testing haplotype-environment interactions using case-parent triads. Hum Hered 2010; 70:23-33. [PMID: 20413979 DOI: 10.1159/000298326] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/03/2009] [Accepted: 01/31/2010] [Indexed: 02/02/2023]
Abstract
OBJECTIVE Joint analysis of multiple SNP markers can be informative, but studying joint effects of haplotypes and environmental exposures is challenging. Population structure can involve both genes and exposures, and a case-control study is susceptible to bias from either source of stratification. We propose a procedure that uses case-parent triad data and, though not fully robust, resists bias from population structure. METHODS Our procedure assumes that the haplotypes under study have no influence on propensity to exposure. Then, under a no-interaction null hypothesis (multiplicative scale), transmission of a causative haplotype from parents to affected offspring might show distortion from Mendelian proportions but should be independent of exposure. We used this insight to develop a permutation test of no haplotype-by-exposure interaction. RESULTS Simulations showed that our proposed test respects the nominal Type I error rate and provides good power under a variety of scenarios. We illustrate by examining whether SNP variants in GSTP1 modify the association between maternal smoking and oral clefting. CONCLUSION Our procedure offers desirable features: no need for haplotype estimation, validity under unspecified genetic main effects, tolerance to Hardy-Weinberg disequilibrium, and the ability to handle missing genotypes and a relatively large number of SNPs. Simulations suggest resistance to bias due to exposure-related population stratification.
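The key step in this abstract is as follows. If haplotypes do not influence exposure, then under the no-interaction null the transmission of a causative haplotype to affected offspring is independent of exposure, so exposure labels can be permuted across triads. The sketch below applies that idea to a single candidate allele or haplotype with already-counted transmissions from heterozygous parents. It is a deliberately simplified toy. The authors' procedure needs no haplotype estimation and handles multiple SNPs and missing genotypes, which this sketch does not. Both function names and the per-triad (transmitted, untransmitted) summary are assumptions.

```python
import random


def transmission_gap(exposed, counts):
    """Absolute difference in transmission rate, exposed vs unexposed triads.

    exposed: list of bools, one per triad.
    counts: list of (transmitted, untransmitted) copies of the candidate
            allele or haplotype from heterozygous parents, one per triad.
    """
    totals = {True: [0, 0], False: [0, 0]}
    for e, (t, u) in zip(exposed, counts):
        totals[e][0] += t
        totals[e][1] += u
    rates = []
    for group in (True, False):
        t, u = totals[group]
        rates.append(t / (t + u) if t + u else 0.0)
    return abs(rates[0] - rates[1])


def interaction_permutation_test(exposed, counts, n_perm=1999, seed=1):
    """Permute exposure labels across triads; return a Monte Carlo p-value."""
    rng = random.Random(seed)
    observed = transmission_gap(exposed, counts)
    labels = list(exposed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # under H0, exposure is exchangeable across triads
        if transmission_gap(labels, counts) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


# Hypothetical extreme data: exposed triads always transmit, unexposed never do.
exposed = [True] * 20 + [False] * 20
counts = [(2, 0)] * 20 + [(0, 2)] * 20
print(interaction_permutation_test(exposed, counts))
```

Permuting exposure rather than genotypes keeps each triad's transmissions intact. Any genetic main effect therefore cancels out of the null distribution, which is one way to read the abstract's "validity under unspecified genetic main effects".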
Affiliation(s)
- Min Shi
- Biostatistics Branch, NIEHS, NIH, DHHS, Research Triangle Park, NC 27709, USA
34
Brown BD, Nsengimana J, Barrett JH, Lawrence RA, Steiner L, Cheng S, Bishop DT, Samani NJ, Ball SG, Balmforth AJ, Hall AS. An evaluation of inflammatory gene polymorphisms in sibships discordant for premature coronary artery disease: the GRACE-IMMUNE study. BMC Med 2010; 8:5. [PMID: 20070880 PMCID: PMC2823655 DOI: 10.1186/1741-7015-8-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/27/2009] [Accepted: 01/13/2010] [Indexed: 12/20/2022]
Abstract
BACKGROUND Inflammatory cytokines play a crucial role in coronary artery disease (CAD). We investigated the association between 48 coding and three non-coding single nucleotide polymorphisms (SNPs) from 35 inflammatory genes and the development of CAD, using a large discordant sibship collection (2699 individuals in 891 families). METHODS Family-based association tests (FBAT) and conditional logistic regression (CLR) were applied to single SNPs and haplotypes; in CLR, traditional risk factors for CAD were also adjusted for. RESULTS An association was observed between CAD and a common three-locus haplotype in the interleukin 1 (IL-1) cluster, with P = 0.006 in all CAD cases, P = 0.01 in myocardial infarction (MI) cases and P = 0.0002 in young onset CAD cases (<50 years). The estimated odds ratio (OR) per copy of this haplotype is 1.21 (95% confidence interval [CI] = 1.04-1.40) for CAD, 1.30 (95% CI = 1.09-1.56) for MI and 1.50 (95% CI = 1.22-1.86) for young onset CAD. When sex, smoking, hypertension and hypercholesterolaemia were adjusted for, the haplotype effect remained nominally significant (P = 0.05) in young onset CAD cases, and more so (P = 0.002) when hypercholesterolaemia was excluded. As many as 82% of individuals affected by CAD had hypercholesterolaemia compared with only 29% of those unaffected, making the two phenotypes difficult to separate. CONCLUSION Despite the multiple hypotheses tested, the robustness of the family design to population confounding and the consistency with previous findings increase the likelihood of true association. Further investigation using larger data sets is needed to confirm this. See the related commentary by Keavney: http://www.biomedcentral.com/1741-7015/8/6.
Affiliation(s)
- Benjamin D Brown
- Leeds Institute of Genetics, Health and Therapeutics (LIGHT), University of Leeds, UK
35
Stone J, Gurrin LC, Hayes VM, Southey MC, Hopper JL, Byrnes GB. Sibship analysis of associations between SNP haplotypes and a continuous trait with application to mammographic density. Genet Epidemiol 2009; 34:309-18. [DOI: 10.1002/gepi.20462] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
36
Liu N, Bucala R, Zhao H. Modeling informatively missing genotypes in haplotype analysis. Commun Stat Theory Methods 2009; 38:3445-3460. [PMID: 20052310 DOI: 10.1080/03610920802696588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
Abstract
It is common to have missing genotypes in practical genetic studies. The majority of existing statistical methods, including those for haplotype analysis, assume that genotypes are missing at random, that is, that at a given marker different genotypes and different alleles are missing with the same probability. In our previous work, we demonstrated that violation of this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model to simultaneously characterize missing data patterns across a set of two or more biallelic markers. We proved that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under the general missing data model. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies on the analysis of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias in haplotype frequency estimates caused by incorrect assumptions about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set from a study of scleroderma.
Affiliation(s)
- Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL
37
Wilcke A, Weissfuss J, Kirsten H, Wolfram G, Boltze J, Ahnert P. The role of gene DCDC2 in German dyslexics. Ann Dyslexia 2009; 59:1-11. [PMID: 19238550 DOI: 10.1007/s11881-008-0020-7] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Received: 01/17/2008] [Accepted: 11/11/2008] [Indexed: 05/23/2023]
Abstract
Dyslexia is a complex reading and writing disorder with a strong genetic component. In a German case-control cohort, we studied the influence of the suspected dyslexia-associated gene DCDC2. For the first time in a German cohort, we describe association of a 2445-bp deletion, first identified in an American study. Evidence of association for three DCDC2 single nucleotide polymorphisms (rs807724, rs793862, rs807701), previously identified in German or American cohorts, was replicated. A haplotype of these polymorphisms showed evidence for association as well. Thus, our data further corroborate association of DCDC2 with dyslexia. Analysis of functional subgroups suggests association of the investigated DCDC2 variants mainly with nondysphonetic, nonsevere, but probably dyseidetic (surface) dyslexia. Based on the presumed function of DCDC2, our findings point to a role of impaired neuronal migration in the etiology of the disease.
Affiliation(s)
- A Wilcke
- Fraunhofer-Institute for Cell Therapy and Immunology, Perlickstr. 1, 04103 Leipzig, Germany.
38
Kirsten H, Petit-Teixeira E, Scholz M, Hasenclever D, Hantmann H, Heider D, Wagner U, Sack U, Hugo Teixeira V, Prum B, Burkhardt J, Pierlot C, Emmrich F, Cornelis F, Ahnert P. Association of MICA with rheumatoid arthritis independent of known HLA-DRB1 risk alleles in a family-based and a case control study. Arthritis Res Ther 2009; 11:R60. [PMID: 19409079 PMCID: PMC2714103 DOI: 10.1186/ar2683] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 09/23/2008] [Revised: 03/14/2009] [Accepted: 05/01/2009] [Indexed: 02/04/2023]
Abstract
INTRODUCTION The gene MICA encodes the protein major histocompatibility complex class I polypeptide-related sequence A. It is expressed in the synovium of patients with rheumatoid arthritis (RA), and its involvement in autoimmunity has been discussed. We analyzed the association of genetic variants of MICA with susceptibility to RA. METHODS Initially, 300 French Caucasian individuals belonging to 100 RA trio families were studied. An additional 100 independent RA trio families and a German Caucasian case-control cohort (90/182 individuals) were available for replication. As MICA is situated in proximity to known risk alleles of the HLA-DRB1 locus, our analysis accounted for linkage disequilibrium either by analyzing the subgroup consisting of parents not carrying HLA-DRB1 risk alleles with the transmission disequilibrium test (TDT) or by implementing a regression model including all available data. Analysis included a microsatellite polymorphism (GCT)n and single-nucleotide polymorphisms (SNPs) rs3763288 and rs1051794. RESULTS In contrast to the other investigated polymorphisms, the non-synonymous coding SNP MICA-250 (rs1051794, Lys196Glu) was strongly associated in the first family cohort (TDT: P = 0.014; regression model: odds ratio [OR] 0.46, 95% confidence interval [CI] 0.25 to 0.82, P = 0.007). Although the replication family sample showed only a trend, combined family data remained consistent with the hypothesis of MICA-250 association independent of shared epitope (SE) alleles (TDT: P = 0.027; regression model: OR 0.56, 95% CI 0.38 to 0.83, P = 0.003). We also replicated the protective association of MICA-250A within a German Caucasian cohort (OR 0.31, 95% CI 0.1 to 0.7, P = 0.005; regression model: OR 0.6, 95% CI 0.37 to 0.96, P = 0.032). We showed complete linkage disequilibrium (D' = 1, r² = 1) between MICA-250 and the functional MICA variant rs1051792.
As rs1051792 confers differential allelic affinity of MICA to the receptor NKG2D, this provides a possible functional explanation for the observed association. CONCLUSIONS We present evidence for linkage and association of MICA-250 (rs1051794) with RA independent of known HLA-DRB1 risk alleles, suggesting MICA as an RA susceptibility gene. However, more studies within other populations are necessary to prove the general relevance of this polymorphism for RA.
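The abstract reports complete linkage disequilibrium as D' = 1 and r² = 1. The standard two-locus definitions are sketched below, computed from one haplotype frequency and the two allele frequencies; the function name is hypothetical. On its own, |D'| = 1 only says that at most three of the four haplotypes occur. r² = 1 additionally requires that only two haplotypes occur, so that one SNP is a perfect proxy for the other. This is what lets the functional variant stand in for MICA-250.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B, each strictly between 0 and 1.
    """
    d = p_ab - p_a * p_b
    # D_max is the largest |D| the allele frequencies allow, for D's sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Only haplotypes AB and ab occur and allele frequencies match, so D' and r^2
# both equal 1 (up to floating-point rounding).
print(ld_measures(0.3, 0.3, 0.3))
```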
Affiliation(s)
- Holger Kirsten
- Center for Biotechnology and Biomedicine (BBZ), University of Leipzig, Deutscher Platz 5, 04103 Leipzig, Germany.
39
Cai Z, Sabaa H, Wang Y, Goebel R, Wang Z, Xu J, Stothard P, Lin G. Most parsimonious haplotype allele sharing determination. BMC Bioinformatics 2009; 10:115. [PMID: 19379528 PMCID: PMC2691739 DOI: 10.1186/1471-2105-10-115] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/11/2008] [Accepted: 04/21/2009] [Indexed: 12/15/2022]
Abstract
Background The "common disease – common variant" hypothesis and genome-wide association studies have achieved numerous successes in the last three years, particularly in genetic mapping of human diseases. Nevertheless, the power of association study methods is still low, in particular for quantitative traits, and a description of the full allelic spectrum is still deemed far out of reach. Given the increasing density of available single nucleotide polymorphisms, and as suggested by the block-like structure of the human genome, a popular and productive strategy is to use haplotypes to try to capture the correlation structure of SNPs in regions of little recombination. The key to the success of this strategy is thus the ability to unambiguously determine the haplotype allele sharing status among the members. Association studies based on haplotype sharing status would have significantly reduced degrees of freedom and would be able to capture the combined effects of tightly linked causal variants. Results For pedigree genotype datasets of medium SNP density, we present two methods for determining haplotype allele sharing status among the pedigree members. An extensive simulation study showed that both methods performed nearly perfectly on breakpoint discovery, mutation haplotype allele discovery, and shared chromosomal region discovery. Conclusion For pedigree genotype datasets, the haplotype allele sharing status among the members can be deterministically, efficiently, and accurately determined, even for very small pedigrees. Given their excellent performance, the presented haplotype allele sharing status determination programs can be useful in many downstream applications, including haplotype-based association studies.
Affiliation(s)
- Zhipeng Cai
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
40
Shi M, Umbach DM, Weinberg CR. Using case-parent triads to estimate relative risks associated with a candidate haplotype. Ann Hum Genet 2009; 73:346-59. [PMID: 19344450 DOI: 10.1111/j.1469-1809.2009.00515.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/17/2022]
Abstract
Estimating haplotype relative risks in a family-based study is complicated by phase ambiguity and the many parameters needed to quantify relative risks for all possible diplotypes. This problem becomes manageable if a particular haplotype has been implicated previously as relevant to risk. We fit log-linear models to estimate the risks associated with a candidate haplotype relative to the aggregate of other haplotypes. Our approach uses existing haplotype-reconstruction algorithms but requires assumptions about the distribution of haplotypes among triads in the source population. We consider three levels of stringency for those assumptions: Hardy-Weinberg Equilibrium (HWE), random mating, and no assumptions at all. We assessed our method's performance through simulations encompassing a range of risk haplotype frequencies, missing data patterns, and relative risks for either offspring or maternal genetic effects. The unconstrained model provides robustness to bias from population structure but requires excessively large sample sizes unless there are few haplotypes. Assuming HWE accommodates many more haplotypes but sacrifices robustness. The model assuming random mating is intermediate, both in the number of haplotypes it can handle and in robustness. To illustrate, we reanalyze data from a study of orofacial clefts to investigate a 9-SNP candidate haplotype of the IRF6 gene.
Affiliation(s)
- Min Shi
- Biostatistics Branch, NIEHS, NIH, DHHS, Research Triangle Park, NC 27709, USA
41
Gao G, Allison DB, Hoeschele I. Haplotyping methods for pedigrees. Hum Hered 2009; 67:248-66. [PMID: 19172084 DOI: 10.1159/000194978] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 01/31/2008] [Accepted: 08/08/2008] [Indexed: 12/31/2022]
Abstract
Haplotypes provide valuable information in the study of diseases, complex traits, population histories, and evolutionary genetics. With the dramatic increase in the number of available single nucleotide polymorphism (SNP) markers, haplotype inference (haplotyping) using observed genotype data has become an important component of genetic studies in general and of statistical gene mapping in particular. Existing haplotyping methods include (1) population-based methods, (2) methods for pooled DNA samples, and (3) methods for family and pedigree data. The methods and computer programs for population data and pooled DNA samples were reviewed recently in the literature. As several authors noted, family and pedigree datasets are abundant and have unique advantages. In the past twenty years, many haplotyping methods for family and pedigree data have been developed. Therefore, in this contribution we review haplotyping methods and the corresponding computer programs suitable for family and pedigree data and discuss their applications and limitations. We explore the connections among these methods, and describe the challenges that remain to be addressed.
Affiliation(s)
- Guimin Gao
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Ala., USA
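Most pedigree-based haplotyping methods start from Mendelian constraints. The sketch below (hypothetical function name, assuming error-free genotypes for a single mother-father-child trio) decides which of the child's alleles came from each parent at every locus and leaves the locus unresolved when the trio alone cannot decide, for example when all three members are heterozygous for the same two alleles. It is not the algorithm of any specific program reviewed in the paper.

```python
def phase_child_from_trio(mother, father, child):
    """Resolve the child's maternal and paternal alleles locus by locus using
    Mendelian transmission rules only.

    Genotypes are lists of unordered allele pairs such as (0, 1). Returns
    (maternal, paternal) lists with None where phase is ambiguous, and raises
    ValueError on a Mendelian inconsistency.
    """
    maternal, paternal = [], []
    for locus, (m, f, c) in enumerate(zip(mother, father, child)):
        a, b = c
        options = set()
        # Option 1: a from the mother, b from the father.
        if a in m and b in f:
            options.add((a, b))
        # Option 2: b from the mother, a from the father.
        if b in m and a in f:
            options.add((b, a))
        if not options:
            raise ValueError(f"Mendelian inconsistency at locus {locus}")
        if len(options) == 1:
            mat, pat = options.pop()
        else:
            mat, pat = None, None
        maternal.append(mat)
        paternal.append(pat)
    return maternal, paternal
```

Loci left as `None` are exactly where population-level information (or additional relatives) is needed, which is where the statistical methods surveyed in the review come in.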
42
Jones B, Walsh D, Werner L, Fiumera A. Using blocks of linked single nucleotide polymorphisms as highly polymorphic genetic markers for parentage analysis. Mol Ecol Resour 2008; 9:487-97. [PMID: 21564678 DOI: 10.1111/j.1755-0998.2008.02444.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are plentiful in most genomes and amenable to high throughput genotyping, but they are not yet popular for parentage or paternity analysis. The markers are bi-allelic, so individually they contain little information about parentage, and in nonmodel organisms the process of identifying large numbers of unlinked SNPs can be daunting. We explore the possibility of using blocks of between three and 26 linked SNPs as highly polymorphic molecular markers for reconstructing male genotypes in polyandrous organisms with moderate (five offspring) to large (25 offspring) clutches of offspring. Haplotypes are inferred for each block of linked SNPs using the programs Haplore and Phase 2.1. Each multi-SNP haplotype is then treated as a separate allele, producing a highly polymorphic, 'microsatellite-like' marker. A simulation study is performed using haplotype frequencies derived from empirical data sets from Drosophila melanogaster and Mus musculus populations. We find that the markers produced are competitive with microsatellite loci in terms of single parent exclusion probabilities, particularly when using six or more linked SNPs to form a haplotype. These markers contain only modest rates of missing data and genotyping or phasing errors and thus should be seriously considered as molecular markers for parentage analysis, particularly when the study is interested in the functional significance of polymorphisms across the genome.
Affiliation(s)
- Beatrix Jones
- Centre for Mathematical Biology, Massey University, Private Bag 102-904, North Shore Mail Centre, Auckland 0745, New Zealand, Institute of Information and Mathematical Sciences, Massey University, Private Bag 102-904, North Shore Mail Centre, Auckland 0745, New Zealand, Dana Farber Cancer Research Center, Boston, MA 02115, USA, Department of Biological Sciences, Binghamton University, PO Box 6000, Binghamton, NY 13902, USA
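The idea of treating each multi-SNP haplotype as one allele of a "microsatellite-like" marker can be sketched as follows. The exclusion probability here uses one common definition, assumed for illustration: the chance that a random unrelated individual shares no allele with a random offspring when no other parent is genotyped, under HWE. The paper's exact exclusion criterion and its use of Haplore and Phase 2.1 are not reproduced; function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def haplotype_allele_frequencies(haplotypes):
    """Treat each distinct multi-SNP haplotype (a tuple of SNP alleles) as one
    allele of a multiallelic marker and return its relative frequency."""
    counts = Counter(tuple(h) for h in haplotypes)
    n = sum(counts.values())
    return {h: c / n for h, c in counts.items()}


def hwe_genotypes(freqs):
    """Yield unordered genotypes and their probabilities under HWE."""
    alleles = sorted(freqs)
    for a, b in combinations_with_replacement(alleles, 2):
        p = freqs[a] * freqs[b]
        yield (a, b), (p if a == b else 2 * p)


def single_parent_exclusion(freqs):
    """Probability that a random unrelated candidate parent shares no allele
    with a random offspring, by enumerating all genotype pairs."""
    genos = list(hwe_genotypes(freqs))
    total = 0.0
    for child, pc in genos:
        for cand, pa in genos:
            if not set(child) & set(cand):
                total += pc * pa
    return total
```

The enumeration agrees with the familiar closed form 1 - 4Σp_i^2 + 2(Σp_i^2)^2 + 4Σp_i^3 - 3Σp_i^4, and it makes clear why more haplotype "alleles" with evenly spread frequencies raise the exclusion power: a biallelic SNP at 0.5/0.5 gives only 0.125.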
43
Identity-by-descent estimation and mapping of qualitative traits in large, complex pedigrees. Genetics 2008; 179:1577-90. [PMID: 18622032 DOI: 10.1534/genetics.108.089912] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/16/2023]
Abstract
Computing identity-by-descent (IBD) sharing between individuals connected through a large, complex pedigree is a computationally demanding task that often cannot be done using exact methods. I present here a rapid computational method for estimating, in large complex pedigrees, the probability that pairs of alleles are IBD given the single-point genotype data at that marker for all individuals. The method can be used on pedigrees of essentially arbitrary size and complexity without the need to divide the individuals into separate subpedigrees. I apply the method to qualitative trait linkage mapping using the nonparametric sharing statistic S_pairs. The validity of the method is demonstrated via simulation studies on a 13-generation, 3028-person pedigree with 700 genotyped individuals. An analysis of an asthma data set of individuals in this pedigree finds four loci with P-values < 10^-3 that were not detected in prior analyses. The mapping method is fast and can complete analyses of approximately 150 affected individuals within this pedigree for thousands of markers in a matter of hours.
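The abstract's method conditions on marker genotypes; the unconditional starting point is the kinship coefficient, the prior probability that an allele drawn at random from each of two individuals is IBD. The sketch below uses the standard recursive kinship definition (not the author's estimator) and assumes a hypothetical pedigree dictionary in which parents are listed before their children.

```python
from functools import lru_cache


def make_kinship(pedigree):
    """Return a kinship function phi(a, b) for a pedigree given as
    {id: (father, mother)}, with None for unknown (founder) parents.
    Parents must appear before their children in the dictionary."""
    order = {ind: i for i, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0
        if a == b:
            # Self-kinship is 1/2 plus half the kinship of the two parents.
            f, m = pedigree[a]
            return 0.5 * (1.0 + phi(f, m))
        # Recurse through the parents of the later individual, which
        # therefore cannot be an ancestor of the other one.
        if order[a] < order[b]:
            a, b = b, a
        f, m = pedigree[a]
        return 0.5 * (phi(f, b) + phi(m, b))

    return phi
```

For example, full sibs and parent-offspring pairs both get 0.25 and an uncle-nephew pair 0.125. Exact recursion like this stays cheap, but computing marker-conditional IBD on a 3028-person pedigree is the hard part the paper addresses.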
44
Bochud M, Eap CB, Maillard M, Johnson T, Vollenweider P, Bovet P, Elston RC, Bergmann S, Beckmann JS, Waterworth DM, Mooser V, Gabriel A, Burnier M. Association of ABCB1 genetic variants with renal function in Africans and in Caucasians. BMC Med Genomics 2008; 1:21. [PMID: 18518969 PMCID: PMC2424071 DOI: 10.1186/1755-8794-1-21] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 02/25/2008] [Accepted: 06/02/2008] [Indexed: 02/06/2023]
Abstract
BACKGROUND The P-glycoprotein, encoded by the ABCB1 gene, is expressed in human endothelial and mesangial cells, which contribute to the control of renal plasma flow and glomerular filtration rate. We investigated the association of ABCB1 variants with renal function in African and Caucasian subjects. METHODS In Africans (290 subjects from 62 pedigrees), we genotyped the 2677G>T and 3435C>T ABCB1 polymorphisms. Glomerular filtration rate (GFR) was measured using inulin clearance and effective renal plasma flow (ERPF) using para-aminohippurate clearance. In Caucasians (5382 unrelated subjects), we analyzed 30 SNPs located within and around ABCB1, using data from the Affymetrix 500K chip. GFR was estimated using the simplified Modification of Diet in Renal Disease (MDRD) and Cockcroft-Gault equations. RESULTS In Africans, compared to the reference genotype (GG or CC), each copy of the 2677T and 3435T allele was associated, respectively, with: GFR higher by 10.6 ± 2.9 (P < 0.001) and 4.4 ± 2.3 (P = 0.06) mL/min; ERPF higher by 47.5 ± 11.6 (P < 0.001) and 28.1 ± 10.5 (P = 0.007) mL/min; and renal resistances lower by 0.016 ± 0.004 (P < 0.001) and 0.011 ± 0.004 (P = 0.004) mm Hg/mL/min. In Caucasians, we identified 3 polymorphisms in the ABCB1 gene that were strongly associated with all estimates of GFR (smallest P value = 0.0006, overall P = 0.014 after multiple testing correction). CONCLUSION Variants of the ABCB1 gene were associated with renal function in both Africans and Caucasians and may therefore confer susceptibility to nephropathy in humans. If confirmed in other studies, these results point toward a new candidate gene for nephropathy in humans.
Affiliation(s)
- Murielle Bochud
- University Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois and University of Lausanne, Bugnon 17, Lausanne, Switzerland
- Chin B Eap
- Unit of Biochemistry and Clinical Psychopharmacology, Center for Psychiatric Neurosciences, Department of Psychiatry, Centre Hospitalier Universitaire Vaudois and University of Lausanne, Lausanne, Switzerland
- Marc Maillard
- Division of Nephrology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Toby Johnson
- University Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois and University of Lausanne, Bugnon 17, Lausanne, Switzerland
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Peter Vollenweider
- Department of Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Pascal Bovet
- University Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois and University of Lausanne, Bugnon 17, Lausanne, Switzerland
- Ministry of Health, Victoria, Seychelles
- Robert C Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland (OH), USA
- Sven Bergmann
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jacques S Beckmann
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
- Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Dawn M Waterworth
- Division of Genetics, GlaxoSmithKline, Philadelphia, Pennsylvania, USA
- Vincent Mooser
- Division of Genetics, GlaxoSmithKline, Philadelphia, Pennsylvania, USA
- Michel Burnier
- Division of Nephrology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
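The phrase "each copy of the 2677T allele was associated with GFR higher by 10.6 mL/min" reflects additive (allele-count) coding of the genotype. As an illustration only, the per-copy effect is the least-squares slope of the trait on the number of coded alleles (0, 1, 2). This sketch ignores covariates and the family structure of the African sample, both of which a real analysis would need to handle; the study's own model is not given here, and the function name and the toy numbers in the usage note are hypothetical.

```python
def per_allele_effect(allele_counts, trait):
    """Least-squares slope of a trait on the number of copies (0, 1, 2) of
    the coded allele, i.e. the per-copy effect under an additive model."""
    n = len(allele_counts)
    if n != len(trait) or n < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(allele_counts) / n
    my = sum(trait) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(allele_counts, trait))
    sxx = sum((x - mx) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("genotype shows no variation")
    return sxy / sxx
```

For made-up data with allele counts `[0, 0, 1, 1, 2, 2]` and trait values `[100, 102, 110, 112, 120, 122]`, the slope is 10, i.e. each extra copy shifts the mean by 10 units.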
45
Zhang B, Ren J, Yan X, Huang X, Ji H, Peng Q, Zhang Z, Huang L. Investigation of the porcine MUC13 gene: isolation, expression, polymorphisms and strong association with susceptibility to enterotoxigenic Escherichia coli F4ab/ac. Anim Genet 2008; 39:258-66. [DOI: 10.1111/j.1365-2052.2008.01721.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
46
Li J, Jiang T. A survey on haplotyping algorithms for tightly linked markers. J Bioinform Comput Biol 2008; 6:241-59. [PMID: 18324755 PMCID: PMC3326666 DOI: 10.1142/s0219720008003369] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 03/28/2007] [Revised: 07/28/2007] [Accepted: 08/28/2007] [Indexed: 02/02/2023]
Abstract
Two grand challenges in the postgenomic era are to develop a detailed understanding of heritable variation in the human genome, and to develop robust strategies for identifying the genetic contribution to diseases and drug responses. Haplotypes of single nucleotide polymorphisms (SNPs) have been suggested as an effective representation of human variation, and various haplotype-based association mapping methods for complex traits have been proposed in the literature. However, humans are diploid and, in practice, genotype data instead of haplotype data are collected directly. Therefore, efficient and accurate computational methods for haplotype reconstruction are needed and have recently been investigated intensively, especially for tightly linked markers such as SNPs. This paper reviews statistical and combinatorial haplotyping algorithms using pedigree data, unrelated individuals, or pooled samples.
Affiliation(s)
- Jing Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA
- Tao Jiang
- Department of Computer Science, University of California, Riverside, Riverside, CA 92521, USA
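A classic example of the statistical haplotyping algorithms such surveys cover is expectation-maximization (EM) of haplotype frequencies from unphased genotypes of unrelated individuals. The sketch below handles only two biallelic SNPs under HWE, where the double heterozygote is the single phase-ambiguous genotype; it is a textbook illustration with a hypothetical function name, not code from the survey.

```python
def em_two_snp_haplotypes(genotypes, max_iter=500, tol=1e-12):
    """EM estimate of haplotype frequencies for two biallelic SNPs.

    genotypes: list of (g1, g2), each the number (0, 1 or 2) of copies of
    allele 1 at SNP 1 and SNP 2. Assumes Hardy-Weinberg equilibrium.
    Returns a dict mapping (0, 0), (0, 1), (1, 0), (1, 1) to frequencies.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # Alleles carried at one SNP, given the allele-1 count.
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    freq = {h: 0.25 for h in haps}
    n_hap = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split a double heterozygote between the two phases.
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At most one heterozygous SNP: the haplotype pair is determined.
                a1, a2 = alleles[g1], alleles[g2]
                counts[(a1[0], a2[0])] += 1
                counts[(a1[1], a2[1])] += 1
        # M-step: frequencies are expected counts over all 2n haplotypes.
        new = {h: c / n_hap for h, c in counts.items()}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```

Pedigree-based methods can resolve many double heterozygotes deterministically instead of splitting them probabilistically, which is one reason the survey treats the two data types separately.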
47
Sherman EL, Nkrumah JD, Murdoch BM, Li C, Wang Z, Fu A, Moore SS. Polymorphisms and haplotypes in the bovine neuropeptide Y, growth hormone receptor, ghrelin, insulin-like growth factor 2, and uncoupling proteins 2 and 3 genes and their associations with measures of growth, performance, feed efficiency, and carcass merit in beef cattle. J Anim Sci 2008; 86:1-16. [PMID: 17785604 DOI: 10.2527/jas.2006-799] [Citation(s) in RCA: 138] [Impact Index Per Article: 8.1] [Indexed: 11/13/2022]
Abstract
Genes that regulate metabolism and energy partitioning have the potential to influence economically important traits in farm animals, as do polymorphisms within these genes. In the current study, SNP in the bovine neuropeptide Y (NPY), growth hormone receptor (GHR), ghrelin (GHRL), uncoupling proteins 2 and 3 (UCP2 and UCP3), IGF2, corticotrophin-releasing hormone (CRH), cocaine- and amphetamine-regulated transcript (CART), melanocortin-4 receptor (MC4R), proopiomelanocortin (POMC), and GH genes were evaluated for associations with growth, feed efficiency, and carcass merit in beef steers. In total, 24 SNP were evaluated for associations with these traits, and haplotypes were constructed within each gene when 2 or more SNP showed significant associations. An A/G SNP located in intron 4 of the GHR gene had the largest effects on BW of the animals (dominance effect P < 0.01) and feed efficiency (allele substitution effect P < 0.05). Another A/G SNP located in the promoter region of GHR had similar effects, but the haplotypes of these 2 SNP reduced the effects of the SNP located in intron 4. Three SNP in the NPY gene showed associations with marbling (P < 0.001) as well as with ADG, BW, and feed conversion ratio (FCR; P < 0.05). The combination of these 3 SNP into haplotypes generally improved the association or had a similar scale of association to each single SNP. Only 1 SNP in UCP3, an A/G SNP in intron 3, was associated with ADG (P = 0.025), partial efficiency of growth, and FCR (P < 0.01). Three SNP in the UCP2 gene were in almost complete linkage disequilibrium and showed associations with lean meat yield, yield grade, DMI, and BW (P < 0.05). Haplotypes between the SNP in UCP3 and UCP2 generally reduced the associations seen individually in each SNP. An A/G SNP in the GHRL gene tended to show effects on residual feed intake, FCR, and partial efficiency of growth (P < 0.10). The IGF2 SNP most strongly affected LM area (P < 0.01), back fat, ADG, and FCR (P < 0.05).
The SNP in the CART, MC4R, POMC, GH, and CRH genes did not show associations at P < 0.05 with any of the traits. Although most of the SNP that showed associations do not cause amino acid changes, these SNP could be linked to other yet to be detected causative mutations or nearby QTL. It will be very important to verify these results in other cattle populations.
Affiliation(s)
- E L Sherman
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2P5, Canada
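The abstract reports both a "dominance effect" and an "allele substitution effect" for the GHR intron 4 SNP. As background only, the standard quantitative-genetics definitions (Falconer's genotypic values) relate these quantities to the three genotype means; the study itself fitted its own statistical model, which is not reproduced here. Function names are hypothetical, and "allele 1" simply denotes whichever allele is coded as the reference.

```python
def additive_dominance_effects(mean_11, mean_12, mean_22):
    """Falconer's genotypic values from the three genotype means:
    a = half the difference between the homozygotes,
    d = deviation of the heterozygote from the homozygote midpoint."""
    a = (mean_11 - mean_22) / 2.0
    d = mean_12 - (mean_11 + mean_22) / 2.0
    return a, d


def allele_substitution_effect(a, d, p):
    """Average effect of substituting allele 2 by allele 1,
    alpha = a + d * (q - p), where p is the frequency of allele 1."""
    q = 1.0 - p
    return a + d * (q - p)
```

With genotype means 12, 11 and 8, for instance, a = 2 and d = 1, so the substitution effect is 2 at p = 0.5 but 2.6 at p = 0.2. Dominance makes the average effect of an allele depend on allele frequency, which is why the two tests can give different signals for the same SNP.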
48
Abstract
Association methods based on linkage disequilibrium (LD) offer a promising approach for detecting genetic variations that are responsible for complex human diseases. Although methods based on individual single nucleotide polymorphisms (SNPs) may lead to significant findings, methods based on haplotypes comprising multiple SNPs on the same inherited chromosome may provide additional power for mapping disease genes and also provide insight into factors influencing the dependency among genetic markers. Such insights may provide information essential for understanding human evolution and also for identifying cis-interactions between two or more causal variants. Because obtaining haplotype information directly from experiments can be cost-prohibitive in most studies, especially large-scale ones, haplotype analysis presents many unique challenges. In this chapter, we focus on two main issues: haplotype inference and haplotype-association analysis. We first provide a detailed review of methods for haplotype inference using unrelated individuals as well as related individuals from pedigrees. We then cover a number of statistical methods that employ haplotype information in association analysis. In addition, we discuss the advantages and limitations of different methods.
Affiliation(s)
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
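The simplest haplotype-association analysis treats each haplotype as an allele and compares haplotype counts between cases and controls in a 2 x k table. The sketch below is that naive version only. It assumes haplotypes are known (or fixed at their inferred values) and counted as independent draws under HWE. It does not propagate phase uncertainty, which is precisely what many of the reviewed methods are designed to handle. The function name is hypothetical.

```python
def haplotype_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table
    of haplotype counts in cases versus controls (dicts: haplotype -> count)."""
    haps = sorted(set(case_counts) | set(control_counts))
    rows = [
        [case_counts.get(h, 0) for h in haps],
        [control_counts.get(h, 0) for h in haps],
    ]
    row_tot = [sum(r) for r in rows]
    col_tot = [rows[0][j] + rows[1][j] for j in range(len(haps))]
    n = sum(row_tot)
    if n == 0:
        raise ValueError("empty table")
    stat = 0.0
    for i in range(2):
        for j in range(len(haps)):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (rows[i][j] - expected) ** 2 / expected
    # Degrees of freedom: observed haplotype categories minus one.
    df = sum(1 for c in col_tot if c > 0) - 1
    return stat, df
```

For example, 30/70 copies of two haplotypes in cases against 50/50 in controls gives a statistic of 25/3 (about 8.33) on 1 degree of freedom.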
49
Abstract
We present a class of haplotype-sharing statistics useful for association mapping in case-parent trio data. The framework presented allows derivation of novel tests as well as new simplified variance estimators for previously proposed tests. We give an overview of this framework and apply four such tests to the simulated data of Genetic Analysis Workshop 15. We find that these haplotype-based statistics result in greater power and better risk locus localization than single-locus single-nucleotide polymorphism (SNP) analysis.
Affiliation(s)
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Hock Plaza, Suite 1102, 2424 Erwin Road, Durham, North Carolina 27705, USA.
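The intuition behind haplotype-sharing statistics is that, near a risk locus, haplotypes transmitted to affected children tend to share longer identical stretches around the locus than untransmitted haplotypes do. The sketch below computes only that basic building block, the run of identical markers around a focal position and its average over all pairs in a group. It is an illustration under that assumption, not the statistics or variance estimators of the paper, and the function names are hypothetical.

```python
def shared_run_length(h1, h2, focus):
    """Number of consecutive markers, including `focus`, at which two
    haplotypes carry identical alleles; 0 if they differ at `focus`."""
    if h1[focus] != h2[focus]:
        return 0
    left = focus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = focus
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def mean_pairwise_sharing(haplotypes, focus):
    """Average shared run length over all pairs of haplotypes in a group."""
    pairs = [
        (i, j)
        for i in range(len(haplotypes))
        for j in range(i + 1, len(haplotypes))
    ]
    if not pairs:
        return 0.0
    total = sum(shared_run_length(haplotypes[i], haplotypes[j], focus) for i, j in pairs)
    return total / len(pairs)
```

A crude comparison would contrast `mean_pairwise_sharing` for transmitted versus untransmitted haplotypes at each marker. Turning that contrast into a valid test with correct variance is the part the paper's framework supplies.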
50
Li X, Li J. Comparison of haplotyping methods using families and unrelated individuals on simulated rheumatoid arthritis data. BMC Proc 2007; 1 Suppl 1:S55. [PMID: 18466555 PMCID: PMC2367580 DOI: 10.1186/1753-6561-1-s1-s55] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
In this report, we compared haplotyping approaches using families and using unrelated individuals on the simulated rheumatoid arthritis (RA) data in Problem 3 from Genetic Analysis Workshop (GAW) 15. To investigate these two approaches, we picked one representative program for each: PedPhase and fastPHASE, respectively. PedPhase is a rule-based method that focuses on the haplotyping constraints within each pedigree and solves them using integer linear programming. fastPHASE is a statistical method based on the clustering property of haplotypes in a population over short regions. It is believed that family information yields more accurate phasing results, at the considerably higher cost of genotyping additional family members. Our results indicate that, although it relies only on the constraints within each family (with four members) individually, PedPhase has better phasing accuracy than fastPHASE, even when the total numbers of genotyped individuals are the same. For missing genotype imputation, however, fastPHASE performs better than PedPhase because it takes population information into consideration. The relative influence of family constraints and population information on haplotyping accuracy shown in this report provides an empirical basis for assessing the trade-off of genotyping family data under different settings.
Affiliation(s)
- Xin Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, USA.
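The abstract compares "phasing accuracy" without naming the metric in this summary. One widely used measure, assumed here for illustration, is the switch error rate: the fraction of consecutive heterozygous-site intervals at which an inferred haplotype jumps from tracking one true haplotype to the other. The sketch assumes correct genotypes, so the inferred haplotype always carries one of the two true alleles at each heterozygous site; the function name is hypothetical.

```python
def switch_error_rate(true_h1, true_h2, inferred_h1):
    """Fraction of consecutive heterozygous-site intervals at which the
    inferred haplotype switches between tracking true_h1 and true_h2."""
    het_sites = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]
    if len(het_sites) < 2:
        # Phase is trivially correct with fewer than two heterozygous sites.
        return 0.0
    tracks_h1 = [inferred_h1[i] == true_h1[i] for i in het_sites]
    switches = sum(1 for prev, cur in zip(tracks_h1, tracks_h1[1:]) if prev != cur)
    return switches / (len(het_sites) - 1)
```

Inferring the complete opposite haplotype counts as zero switches, because phase is only defined relative to the other haplotype. For example, true haplotypes `0000` and `1111` with inferred `0011` give one switch over three intervals, a rate of 1/3.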