1
Xu Z, Duan Q, Yan S, Chen W, Li M, Lange E, Li Y. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 2015; 31:2434-42. [PMID: 25810429 DOI: 10.1093/bioinformatics/btv168] [Received: 09/09/2014] [Accepted: 03/17/2015]
Abstract
BACKGROUND Imputation of individual-level genotypes at untyped markers using an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual-level genotype data are not available. Two methods that assume a multivariate Gaussian distribution for the association summary statistics, DIST and ImpG-Summary/LD, have been proposed for imputing association summary statistics. However, both methods assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates. METHODS We show analytically that in the absence of covariates, the correlation among association summary statistics is indeed the same as that among the corresponding genotypes, which provides a theoretical justification for the recently proposed methods. We further prove that in the presence of covariates, the correlation among association summary statistics becomes the partial correlation of the corresponding genotypes controlling for the covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO). RESULTS We consider two real-life scenarios in which the difference between correlation and partial correlation is likely to matter in practice: (i) association studies in admixed populations; (ii) association studies in the presence of other confounding covariate(s). Application of DISSCO to real datasets under both scenarios shows performance at least comparable to, if not better than, existing correlation-based methods, particularly for lower-frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%.
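The central idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the DISSCO software: partial correlations are obtained by regressing each reference-panel genotype column on the covariates (plus an intercept) and correlating the residuals, and untyped z-scores are then imputed as the conditional mean of a multivariate Gaussian, Σ_ut Σ_tt⁻¹ z_t. The function names, the covariate matrix `C` (for example, ancestry proportions in an admixed panel) and the small ridge term added to Σ_tt for numerical stability are illustrative assumptions. With no covariates (`C` having zero columns), the partial correlation reduces to the ordinary genotype correlation, mirroring the covariate-free case described above.

```python
import numpy as np


def partial_corr_matrix(G, C):
    """Partial correlation among genotype columns, controlling for covariates.

    G: (n, m) reference-panel genotype dosages (0/1/2), one column per marker.
    C: (n, k) covariates for the same individuals; k may be 0.
    """
    n = G.shape[0]
    X = np.column_stack([np.ones(n), C])          # intercept plus covariates
    beta, *_ = np.linalg.lstsq(X, G, rcond=None)   # regress every marker on X
    residuals = G - X @ beta                       # genotype variation not explained by C
    # The correlation of the residuals is the partial correlation given C.
    return np.corrcoef(residuals, rowvar=False)


def impute_z(sigma, z_typed, typed, untyped, ridge=0.1):
    """Conditional Gaussian mean of untyped z-scores given typed z-scores.

    sigma: (m, m) correlation (or partial correlation) matrix among markers.
    typed, untyped: lists of column indices into sigma.
    ridge: hypothetical regularization added to the typed-typed block.
    """
    s_tt = sigma[np.ix_(typed, typed)] + ridge * np.eye(len(typed))
    s_ut = sigma[np.ix_(untyped, typed)]
    return s_ut @ np.linalg.solve(s_tt, z_typed)
```

Passing `C` into `partial_corr_matrix` removes the part of the marker-to-marker correlation that the covariates explain before imputation; calling it with an empty `C` gives the correlation-based behaviour that the abstract attributes to DIST and ImpG-Summary/LD.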
Affiliation(s)
- Zheng Xu: Department of Biostatistics, Department of Genetics, Department of Computer Science
- Qing Duan: Department of Genetics, Curriculum in Bioinformatics and Computational Biology, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599, USA
- Song Yan: Department of Biostatistics, Department of Genetics, Department of Computer Science
- Wei Chen: Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh School of Medicine, Department of Biostatistics, Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA 15224, USA
- Mingyao Li: Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Ethan Lange: Department of Biostatistics, Department of Genetics
- Yun Li: Department of Biostatistics, Department of Genetics, Department of Computer Science
2
Jiang Y, Zhang R, Lv H, Li J, Wang M, Chang Y, Lv W, Sheng X, Zhang J, Liu P, Zheng J, Shi M, Liu G. HGPGD: the human gene population genetic difference database. PLoS One 2013; 8:e64150. [PMID: 23717556 PMCID: PMC3661546 DOI: 10.1371/journal.pone.0064150] [Received: 11/22/2012] [Accepted: 04/11/2013]
Abstract
Demographic events such as migration, and evolutionary events such as mutation and recombination, have contributed to the genetic variation found in the human genome. During the evolution and differentiation of human populations, different functional genes and pathways (groups of genes that act together to perform specific biological tasks) are likely to have displayed different degrees of genetic diversity or evolutionary conservation. To query the genetic differences of functional genes or pathways among populations, we have developed the human gene population genetic difference (HGPGD) database. Currently, 11 common population genetic features, 18,158 single human genes, 220 KEGG (Kyoto Encyclopedia of Genes and Genomes) human pathways and 4,639 Gene Ontology (GO) categories (3,269 in biological process; 862 in molecular function; and 508 in cellular component) are available in the HGPGD database. The 11 population genetic features relate mainly to three aspects: allele frequency, linkage disequilibrium pattern, and transferability of tagSNPs. By entering a list of Gene IDs, KEGG pathway IDs or GO category IDs and selecting a population genetic feature, users can search the genetic differences between pairs of HapMap populations. We hope that researchers carrying out gene-based, KEGG pathway-based or GO category-based research will take full account of the genetic differences between populations. The HGPGD database (V1.0) is available at http://www.bioapp.org/hgpgd.
Affiliation(s)
- Yongshuai Jiang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ruijie Zhang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jin Li: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Miao Wang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yiman Chang: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wenhua Lv: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xin Sheng: Graduate University of Chinese Academy of Sciences, Beijing, China
- Jingjing Zhang: Department of Epidemiology and Statistics, School of Public Health, Central South University, Changsha, China
- Panpan Liu: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiajia Zheng: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Miao Shi: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Guiyou Liu: Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
3
Duan Q, Liu EY, Croteau-Chonka DC, Mohlke KL, Li Y. A comprehensive SNP and indel imputability database. Bioinformatics 2013; 29:528-31. [PMID: 23292738 DOI: 10.1093/bioinformatics/bts724]
Abstract
MOTIVATION Genotype imputation has become an indispensable step in genome-wide association studies (GWAS). Imputation accuracy, which directly influences downstream analysis, has been shown to improve with re-sequencing-based reference panels; however, this comes at a high computational cost because of the huge number of potentially imputable markers (tens of millions) discovered by sequencing large numbers of individuals. There is therefore an increasing need for access to imputation quality information without actually conducting imputation. To facilitate this, we have established a publicly available SNP and indel imputability database that provides direct access to imputation accuracy information for markers identified by the 1000 Genomes Project, across four major populations and covering multiple GWAS genotyping platforms. RESULTS SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s). This is the first database providing variant imputability information specific to each continental group and to each genotyping platform. In Filipino individuals from the Cebu Longitudinal Health and Nutrition Survey, our database can achieve an area under the receiver operating characteristic curve of 0.97, 0.91, 0.88 and 0.79 for markers with minor allele frequency >5%, 3-5%, 1-3% and 0.5-1%, respectively. Specifically, by filtering out 48.6% of markers (corresponding to a reduction of up to 48.6% in computational costs for actual imputation) based on the imputability information in our database, we can remove 77%, 58%, 51% and 42% of the poorly imputed markers at the cost of only 0.3%, 0.8%, 1.5% and 4.6% of the well-imputed markers with minor allele frequency >5%, 3-5%, 1-3% and 0.5-1%, respectively.
AVAILABILITY http://www.unc.edu/~yunmli/imputability.html
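The two evaluation summaries reported above, the area under the ROC curve and the filtering trade-off, can be sketched in a few lines of Python. This is a hedged illustration, not the database's code: it assumes each marker has a precomputed imputability score (higher meaning more likely to impute well) and a binary label recording whether it was actually well imputed; the scores, labels, cutoff and function names are hypothetical. The AUC is computed as the probability that a randomly chosen well-imputed marker outscores a randomly chosen poorly imputed one, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a well-imputed marker > score of a poorly imputed one).

    scores: imputability scores, higher meaning "expected to impute well".
    labels: True if the marker was actually well imputed, else False.
    Ties between a positive and a negative count as one half.
    """
    pos = [s for s, ok in zip(scores, labels) if ok]
    neg = [s for s, ok in zip(scores, labels) if not ok]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def filter_tradeoff(scores, labels, cutoff):
    """Drop markers scoring below `cutoff` and report what share of each class is removed.

    Returns (fraction of poorly imputed markers removed,
             fraction of well-imputed markers removed).
    """
    dropped = [s < cutoff for s in scores]
    poor = [d for d, ok in zip(dropped, labels) if not ok]
    well = [d for d, ok in zip(dropped, labels) if ok]
    return sum(poor) / len(poor), sum(well) / len(well)


# Toy example with made-up scores (not data from the database).
scores = [0.9, 0.4, 0.5, 0.2]
labels = [True, True, False, False]
print(roc_auc(scores, labels))                # 0.75
print(filter_tradeoff(scores, labels, 0.45))  # (0.5, 0.5)
```

The pairwise loop is quadratic in the number of markers, which keeps the sketch easy to check; a rank-based formula scales better for genome-wide marker sets.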
Affiliation(s)
- Qing Duan: Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
4
Identifying highly conserved and highly differentiated gene ontology categories in human populations. PLoS One 2011; 6:e27871. [PMID: 22140477 PMCID: PMC3227580 DOI: 10.1371/journal.pone.0027871] [Received: 11/18/2010] [Accepted: 10/27/2011]
Abstract
Detecting and interpreting system-level characteristics associated with human population genetic differences is a challenge for human geneticists. In this study, we conducted a population genetic study using HapMap genotype data to identify Gene Ontology (GO) categories associated with high or low genetic difference among 11 HapMap populations. First, the genetic differences in each gene region among these populations were measured using allele frequency, linkage disequilibrium (LD) pattern, and transferability of tagSNPs. The associations between each GO term and these genetic differences were then identified. The results showed that cellular process, catalytic activity, binding, and some of their sub-terms were associated with high levels of genetic difference, and genes involved in these functional categories displayed, on average, high genetic diversity among different populations. By contrast, multicellular organismal process, molecular transducer activity, and some of their sub-terms were associated with low levels of genetic difference. In particular, neurological system process, under the multicellular organismal process category, had low levels of genetic difference; neurological function has also shown high evolutionary conservation between species in some previous studies. These results may provide new insight into human evolutionary history at the system level.
5
Haas DM, Sischy AC, McCullough W, Simsiman AJ. Maternal ethnicity influences on neonatal respiratory outcomes after antenatal corticosteroid use for anticipated preterm delivery. J Matern Fetal Neonatal Med 2010; 24:516-20. [PMID: 20672908 DOI: 10.3109/14767058.2010.506228]
Abstract
OBJECTIVE To explore the influence of maternal ethnicity on neonatal outcomes after antenatal corticosteroid administration. METHODS A retrospective review of ethnicity, maternal factors, and neonatal birth outcomes was performed for preterm births at a single institution. Cases were limited to women who received antenatal corticosteroids. The impact of ethnicity on specific neonatal respiratory outcomes and mortality was analyzed by bivariate comparisons and by logistic regression analysis. RESULTS Complete ethnicity data were obtained for 548 women. Controlling for gestational age at delivery, diabetes, completion of a course of steroids, and steroid dosing, logistic regression demonstrated that ethnicity was independently associated with respiratory distress syndrome (compared with Caucasians: African-Americans OR 0.49, 95% CI 0.29-0.85; Filipinos OR 0.45, 95% CI 0.21-0.96). CONCLUSIONS Ethnicity is independently associated with neonatal respiratory outcomes after antenatal corticosteroid use. Individualized dosing of antenatal corticosteroids may be needed to further improve neonatal outcomes.
Affiliation(s)
- David M Haas: Department of OB/GYN, Indiana University School of Medicine, Indianapolis, IN, USA.
6
Abstract
Understanding genetic variation between populations is important because it affects the portability of human genome-wide analytical methods. We compared genetic variation and substructure between Malawians and other African and non-African HapMap populations. Allele frequencies and adjacent linkage disequilibrium (LD) were measured for 617,715 single nucleotide polymorphisms (SNPs) across subject genomes. Allele frequencies in the Malawian population (N=226) were highly correlated with allele frequencies in HapMap populations of African ancestry (AFA, N=376), namely Yoruba in Ibadan, Nigeria (Spearman's r²=0.97), Luhya in Webuye, Kenya (r²=0.97), African Americans in the southwest United States (r²=0.94) and Maasai in Kinyawa, Kenya (r²=0.91). This correlation was much lower between Malawians and populations of other ancestries (r²<0.52). LD correlations between Malawians and HapMap populations were strongest for the AFA populations (AFA r²>0.82, other ancestries r²<0.57). Principal components analyses revealed little population substructure within our Malawi sample but clearly distinguished Malawians, AFA populations and two European populations. Five SNPs within the lactase gene (LCT) had substantially different allele frequencies between the Malawi population and the Maasai in Kinyawa, Kenya (rs3769013, rs730005, rs3769012, rs2304370; P-values <1 × 10⁻³³).
7
Lange LA, Croteau-Chonka DC, Marvelle AF, Qin L, Gaulton KJ, Kuzawa CW, McDade TW, Wang Y, Li Y, Levy S, Borja JB, Lange EM, Adair LS, Mohlke KL. Genome-wide association study of homocysteine levels in Filipinos provides evidence for CPS1 in women and a stronger MTHFR effect in young adults. Hum Mol Genet 2010; 19:2050-8. [PMID: 20154341 DOI: 10.1093/hmg/ddq062]
Abstract
Plasma homocysteine (Hcy) level is associated with cardiovascular disease and may play an etiologic role in vascular damage, a precursor for atherosclerosis. We performed a genome-wide association study for Hcy in 1786 unrelated Filipino women from the Cebu Longitudinal Health and Nutrition Survey (CLHNS). The most strongly associated single-nucleotide polymorphism (SNP) (rs7422339, P = 4.7 × 10⁻¹³) encodes Thr1405Asn in the gene CPS1 and explained 3.0% of variation in the Hcy level. The widely studied MTHFR C677T SNP (rs1801133) was also highly significant (P = 8.7 × 10⁻¹⁰) and explained 1.6% of the trait variation. We also genotyped these two SNPs in 1679 CLHNS young adult offspring. The MTHFR C677T SNP was strongly associated with Hcy (P = 1.9 × 10⁻²⁶) and explained approximately 5.1% of the variation in the offspring. In contrast, the CPS1 variant was significant only in females (P = 0.11 in all; P = 0.0087 in females). Combined analysis of all samples confirmed that the MTHFR variant was more strongly associated with Hcy in the offspring (interaction P = 1.2 × 10⁻⁵). Furthermore, although there was evidence for a positive synergistic effect between the CPS1 and MTHFR SNPs in the offspring (interaction P = 0.0046), there was no significant evidence for an interaction in the mothers (P = 0.55). These data confirm a recent finding that CPS1 is a locus influencing Hcy levels in women and suggest that genetic effects on Hcy may differ across developmental stages.
Affiliation(s)
- Leslie A Lange: Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
8
Lins TC, Abreu BS, Pereira RW. TagSNP transferability and relative loss of variability prediction from HapMap to an admixed population. J Biomed Sci 2009; 16:73. [PMID: 19682379 PMCID: PMC2737315 DOI: 10.1186/1423-0127-16-73] [Received: 06/23/2009] [Accepted: 08/14/2009]
Abstract
Background The application of a subset of single nucleotide polymorphisms, the tagSNPs, can be useful in capturing information on untyped SNPs in a genomic region. TagSNP transferability from the HapMap dataset to admixed populations is of uncertain value because of population structure, admixture, drift and recombination effects. In this work, an empirical dataset from a Brazilian admixed sample was evaluated against the HapMap populations to measure tagSNP transferability and the relative loss of variability prediction. Methods The transferability study was carried out using SNPs dispersed over four genomic regions: the PTPN22, HMGCR, VDR and CETP genes. Variability coverage and prediction accuracy for tagSNPs in the selected genomic regions of HapMap phase II were computed using a prediction accuracy algorithm. Transferability of tagSNPs and relative loss of prediction were evaluated according to the difference between the Brazilian sample and the pooled and single HapMap population estimates. Results Each population presented different levels of prediction per gene. On average, the Brazilian (BRA) sample displayed lower predictive power than HapMap and the pooled sample. There was a relative loss of prediction for BRA when using single HapMap populations, but a pooled HapMap dataset produced a smaller loss of variability prediction and lower standard deviations, except at the VDR locus, where the loss was smallest using CEU tagSNPs. Conclusion TagSNP selection for an admixed population should not, in general, rely on any single HapMap population; in most cases the admixed population is better represented by a pooled dataset.
Affiliation(s)
- Tulio C Lins: Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil.
9
Abstract
Metabolomics is the measurement of the full complement of the products of metabolism in a single biological sample, together with the correlation of these metabolomic profiles with known physiological or pathological states. The metabolome offers the possibility of finding unique fingerprints responsible for different phenotypes. Analytical techniques such as nuclear magnetic resonance or mass spectrometry measure thousands of compounds within the metabolome simultaneously, and appropriate data mining and database tools make it possible to find significant correlations between the measured metabolomes. The first direct outcome of nutritional metabolomics will be the discovery of biomarkers that can reveal changes in health and disease and also indicate short-term and long-term dietary intake. The concerted actions of nutrigenomics and metabolomics will play a crucial role in understanding how specific interactions of single nucleotide polymorphisms (SNPs) influence a person's response to a diet. Finally, systems biology approaches to human nutrition combine transcriptomics, proteomics and metabolomics with the aim of understanding how diets interact with the human body.
Affiliation(s)
- A Koulman: Medical Research Council Human Nutrition Research, Cambridge, UK
10
Marvelle AF, Lange LA, Qin L, Adair LS, Mohlke KL. Association of FTO with obesity-related traits in the Cebu Longitudinal Health and Nutrition Survey (CLHNS) Cohort. Diabetes 2008; 57:1987-91. [PMID: 18426866 PMCID: PMC2453620 DOI: 10.2337/db07-1700] [Received: 12/03/2007] [Accepted: 04/15/2008]
Abstract
OBJECTIVE The underlying genetic component of obesity-related traits is not well understood, and there is limited evidence to support genetic association shared across multiple studies, populations, and environmental contexts. The present study investigated the association between candidate variants and obesity-related traits in a sample of 1,886 adult Filipino women from the Cebu Longitudinal Health and Nutrition Survey (CLHNS) cohort. RESEARCH DESIGN AND METHODS We selected and genotyped 19 single nucleotide polymorphisms in 10 genes (ADRB2, ADRB3, FTO, GNB3, INSIG2, LEPR, PPARG, TNF, UCP2, and UCP3) that had been previously reported to be associated with an obesity-related quantitative trait. RESULTS We observed evidence for association of the A allele of rs9939609 (FTO intron 1) with increased BMI (P = 0.0072 before multiple test correction), baseline BMI (P = 0.0015), longitudinal BMI based on eight surveys from 1983 to 2005 (P = 0.000029), waist circumference (P = 0.0094), and weight (P = 0.021). The increase in average BMI was approximately 0.4 for each additional A allele. We also observed association of the ADRB3 Trp64Arg variant with BMI, waist circumference, percent body fat, weight, fat mass, arm fat area, and arm muscle area (P < 0.05), although the direction of effect is inconsistent with the majority of previous reports. CONCLUSIONS Our study confirms that FTO is a common obesity susceptibility gene in Filipinos, with an effect size similar to that seen in samples of European origin.
Affiliation(s)
- Amanda F Marvelle: Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
11
Xing J, Witherspoon DJ, Watkins WS, Zhang Y, Tolpinrud W, Jorde LB. HapMap tagSNP transferability in multiple populations: general guidelines. Genomics 2008; 92:41-51. [PMID: 18482828 DOI: 10.1016/j.ygeno.2008.03.011] [Received: 02/20/2008] [Revised: 03/26/2008] [Accepted: 03/28/2008]
Abstract
Linkage disequilibrium (LD) has received much attention recently because of its value in localizing disease-causing genes. Due to the extensive LD between neighboring loci in the human genome, it is believed that a subset of the single nucleotide polymorphisms in a region (tagSNPs) can be selected to capture most of the remaining SNP variants. In this study, we examined LD patterns and HapMap tagSNP transferability in more than 300 individuals. A South Indian sample and an African Mbuti Pygmy population sample were included to evaluate the performance of HapMap tagSNPs in geographically distinct and genetically isolated populations. Our results show that HapMap tagSNPs selected with r² ≥ 0.8 can capture more than 85% of the SNPs in populations that are from the same continental group. Combined tagSNPs from HapMap CEU and CHB+JPT serve as the best reference for the Indian sample. The HapMap YRI are a sufficient reference for tagSNP selection in the Pygmy sample. In addition to our findings, we reviewed over 25 recent studies of tagSNP transferability and propose a general guideline for selecting tagSNPs from HapMap populations.
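The coverage criterion above, the share of SNPs captured by at least one tagSNP at r² ≥ 0.8, can be sketched in Python. Assumptions: genotypes are unphased dosages (0/1/2) with one column per SNP in the target population, so r² here is the squared genotype correlation, which only approximates the haplotype-based r² usually used for tagging; tag indices are taken as given (for example, chosen in a HapMap reference panel) and evaluated in the target sample. The function names are hypothetical.

```python
import numpy as np


def r2_matrix(G):
    """Squared Pearson correlation between every pair of SNP columns."""
    return np.corrcoef(G, rowvar=False) ** 2


def tag_coverage(G_target, tag_idx, threshold=0.8):
    """Fraction of non-tag SNPs whose best r2 with any tag reaches the threshold.

    G_target: (n, m) genotype dosages in the population being evaluated.
    tag_idx: column indices of tagSNPs chosen in a reference population.
    Monomorphic columns give NaN correlations and are counted as not captured.
    """
    r2 = r2_matrix(G_target)
    tags = set(tag_idx)
    others = [j for j in range(G_target.shape[1]) if j not in tags]
    best = r2[np.ix_(others, list(tag_idx))].max(axis=1)  # best proxy per SNP
    return float(np.mean(best >= threshold))
```

Applying `tag_coverage` to the same target genotypes with tags chosen from different reference panels (a single population versus a combined one) gives the kind of comparison these transferability studies report.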
Affiliation(s)
- Jinchuan Xing: Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
12
Pemberton TJ, Jakobsson M, Conrad DF, Coop G, Wall JD, Pritchard JK, Patel PI, Rosenberg NA. Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India. Ann Hum Genet 2008; 72:535-46. [PMID: 18513279 DOI: 10.1111/j.1469-1809.2008.00457.x]
Abstract
When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis, such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.
Affiliation(s)
- T J Pemberton: Institute for Genetic Medicine, University of Southern California, 2250 Alcazar St., Los Angeles, California 90033, USA