1
Qin H, Niu T, Zhao J. Identifying Multi-Omics Causers and Causal Pathways for Complex Traits. Front Genet 2019; 10:110. [PMID: 30847004 PMCID: PMC6393387 DOI: 10.3389/fgene.2019.00110] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/17/2018] [Accepted: 01/30/2019] [Indexed: 12/23/2022]
Abstract
The central dogma of molecular biology delineates a unidirectional causal flow, i.e., DNA → RNA → protein → trait. Genome-wide association studies, next-generation sequencing association studies, and their meta-analyses have successfully identified ~12,000 susceptibility genetic variants that are associated with a broad array of human physiological traits. However, such conventional association studies ignore the mediate causers (i.e., RNA, protein) and the unidirectional causal pathway. Such studies may not be ideally powerful; and the genetic variants identified may not necessarily be genuine causal variants. In this article, we model the central dogma by a mediate causal model and analytically prove that the more remote an omics level is from a physiological trait, the smaller the magnitude of their correlation is. Under both random and extreme sampling schemes, we numerically demonstrate that the proteome-trait correlation test is more powerful than the transcriptome-trait correlation test, which in turn is more powerful than the genotype-trait association test. In conclusion, integrating RNA and protein expressions with DNA data and causal inference are necessary to gain a full understanding of how genetic causal variants contribute to phenotype variations.
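The attenuation claim in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes a standardized linear Gaussian chain DNA → RNA → protein → trait with the same hypothetical path coefficient `rho` at every step and independent noise. Under those assumptions the correlation between two levels is the product of the path coefficients between them, so it shrinks as the levels get further apart.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def next_level(parent, rho, rng):
    """One causal step: child = rho * parent + independent noise (unit variance)."""
    noise_sd = math.sqrt(1.0 - rho ** 2)
    return [rho * value + noise_sd * rng.gauss(0.0, 1.0) for value in parent]


def simulate_chain(n, rho, seed=0):
    """Simulate the assumed chain DNA -> RNA -> protein -> trait."""
    rng = random.Random(seed)
    dna = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian stand-in for genotype
    rna = next_level(dna, rho, rng)
    protein = next_level(rna, rho, rng)
    trait = next_level(protein, rho, rng)
    return dna, rna, protein, trait


dna, rna, protein, trait = simulate_chain(n=20000, rho=0.5)
r_dna = pearson(dna, trait)          # theory under these assumptions: 0.5 ** 3
r_rna = pearson(rna, trait)          # theory under these assumptions: 0.5 ** 2
r_protein = pearson(protein, trait)  # theory under these assumptions: 0.5
print(f"DNA-trait {r_dna:.3f}, RNA-trait {r_rna:.3f}, protein-trait {r_protein:.3f}")
```

Real genotypes are discrete and real path strengths differ between levels; the Gaussian stand-in and the shared `rho` are only there to keep the sketch short.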
Affiliation(s)
- Huaizhen Qin
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States
- Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA, United States
- Tianhua Niu
- Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA, United States
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, LA, United States
- Jinying Zhao
- Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States
2
Lin DY, Tao R, Kalsbeek W, Zeng D, Gonzalez F, Fernández-Rhodes L, Graff M, Koch G, North K, Heiss G. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 2014; 95:675-88. [PMID: 25480034 DOI: 10.1016/j.ajhg.2014.11.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 09/22/2014] [Accepted: 11/11/2014] [Indexed: 12/27/2022]
Abstract
The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
3
Quantitative trait analysis in sequencing studies under trait-dependent sampling. Proc Natl Acad Sci U S A 2013; 110:12247-52. [PMID: 23847208 DOI: 10.1073/pnas.1221713110] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 12/13/2022]
Abstract
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies.
4
Zheng G, Jinfeng X, Yuan A, Colin OW. Impact on modes of inheritance and relative risks of using extreme sampling when designing genetic association studies. Ann Hum Genet 2012; 77:80-4. [PMID: 23163532 DOI: 10.1111/j.1469-1809.2012.00733.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/10/2012] [Accepted: 08/28/2012] [Indexed: 11/29/2022]
Abstract
Using extreme phenotypes for association studies can improve statistical power. We study the impact of using samples with extremely high or low traits on the alternative model space, the genotype relative risks, and the genetic models in association studies. We prove the following results: when the risk allele causes high-trait values, the more extreme the high traits, the larger the genotype relative risks, which is not always true for using extreme low traits; we also prove that a genetic model theoretically changes with more extreme trait except for the recessive or dominant models. Practically, however, the impact of deviations from the true genetic model at a functional locus due to selective sampling is virtually negligible. The implications of our findings are discussed. Numerical values are reported for illustrations.
Affiliation(s)
- Gang Zheng
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA.
5
A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection. Behav Genet 2011; 41:776-9. [PMID: 21626281 PMCID: PMC3162965 DOI: 10.1007/s10519-011-9475-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/22/2011] [Accepted: 05/16/2011] [Indexed: 11/17/2022]
Abstract
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
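The bias mentioned in this abstract can be seen in a short simulation. This is a minimal sketch under assumed settings, not the correction proposed in the paper: the effect size `beta`, the minor allele frequency `maf`, the cohort size, and the 10% tail fraction are all hypothetical. It fits ordinary least squares on the full cohort and on the two-tail selected subsample and compares the slopes.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_cohort(n, beta, maf, seed=0):
    """Additive genotypes (0/1/2) and a trait with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    traits = [beta * g + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, traits


def two_tail_select(genotypes, traits, fraction):
    """Keep only the subjects in the lowest and highest `fraction` of the trait."""
    order = sorted(range(len(traits)), key=lambda i: traits[i])
    k = int(fraction * len(traits))
    keep = order[:k] + order[-k:]
    return [genotypes[i] for i in keep], [traits[i] for i in keep]


genotypes, traits = simulate_cohort(n=20000, beta=0.3, maf=0.3)
slope_full = ols_slope(genotypes, traits)
selected_g, selected_y = two_tail_select(genotypes, traits, fraction=0.1)
slope_selected = ols_slope(selected_g, selected_y)
print(f"full cohort slope: {slope_full:.3f}, two-tail sample slope: {slope_selected:.3f}")
```

In this setup the slope from the selected sample is inflated because selection on the trait stretches its variance in the retained subjects, which is why an uncorrected regression overstates the genetic effect.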
6
Tang Y. Equivalence of three score tests for association mapping of quantitative trait loci under selective genotyping. Genet Epidemiol 2010; 34:522-7. [PMID: 20552655 DOI: 10.1002/gepi.20498] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
Huang and Lin ([2007] Am J Hum Genet 80:567-572) proposed a conditional-likelihood approach for mapping quantitative trait loci (QTL) under selective genotyping, and demonstrated via simulation that their model tends to be more powerful than the prospective linear regression. However, we show that the three score tests based on the conditional, prospective and retrospective likelihoods are numerically identical in testing association between a quantitative trait and a candidate locus. Two approximations are derived for calculating power and sample size for the score test. Compared to the random sampling, a single-tail selection generally reduces the power of the score test in mapping small effect QTLs. A two-tail selection generally enhances the QTL heritability; however, in small samples, the power of the test may actually decrease if the sample sizes are highly unbalanced in the upper and lower tails of the trait distribution.
7
Tabara Y, Kohara K, Kita Y, Hirawa N, Katsuya T, Ohkubo T, Hiura Y, Tajima A, Morisaki T, Miyata T, Nakayama T, Takashima N, Nakura J, Kawamoto R, Takahashi N, Hata A, Soma M, Imai Y, Kokubo Y, Okamura T, Tomoike H, Iwai N, Ogihara T, Inoue I, Tokunaga K, Johnson T, Caulfield M, Munroe P, Umemura S, Ueshima H, Miki T. Common variants in the ATP2B1 gene are associated with susceptibility to hypertension: the Japanese Millennium Genome Project. Hypertension 2010; 56:973-80. [PMID: 20921432 DOI: 10.1161/hypertensionaha.110.153429] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.3] [Indexed: 01/11/2023]
Abstract
Hypertension is one of the most common complex genetic disorders. We have described previously 38 single nucleotide polymorphisms (SNPs) with suggestive association with hypertension in Japanese individuals. In this study we extend our previous findings by analyzing a large sample of Japanese individuals (n=14,105) for the most associated SNPs. We also conducted replication analyses in Japanese of susceptibility loci for hypertension identified recently from genome-wide association studies of European ancestries. Association analysis revealed significant association of the ATP2B1 rs2070759 polymorphism with hypertension (P=5.3×10⁻⁵; allelic odds ratio: 1.17 [95% CI: 1.09 to 1.26]). Additional SNPs in ATP2B1 were subsequently genotyped, and the most significant association was with rs11105378 (odds ratio: 1.31 [95% CI: 1.21 to 1.42]; P=4.1×10⁻¹¹). Association of rs11105378 with hypertension was cross-validated by replication analysis with the Global Blood Pressure Genetics consortium data set (odds ratio: 1.13 [95% CI: 1.05 to 1.21]; P=5.9×10⁻⁴). Mean adjusted systolic blood pressure was highly significantly associated with the same SNP in a meta-analysis with individuals of European descent (P=1.4×10⁻¹⁸). ATP2B1 mRNA expression levels in umbilical artery smooth muscle cells were found to be significantly different among rs11105378 genotypes. Seven SNPs discovered in published genome-wide association studies were also genotyped in the Japanese population. In the combined analysis of the 3 replicated genes (FGF5 rs1458038, CYP17A1 rs1004467, and CSK rs1378942), the odds ratio of the highest risk group was 2.27 (95% CI: 1.65 to 3.12; P=4.6×10⁻⁷) compared with the lower risk group. In summary, this study confirmed common genetic variation in ATP2B1, as well as FGF5, CYP17A1, and CSK, to be associated with blood pressure levels and risk of hypertension.
Affiliation(s)
- Yasuharu Tabara
- Department of Basic Medical Research and Education, Ehime University Graduate School of Medicine, Toon-City, Ehime, Japan.
8
Xing C, Xing G. Power of selective genotyping in genome-wide association studies of quantitative traits. BMC Proc 2009; 3 Suppl 7:S23. [PMID: 20018013 PMCID: PMC2795920 DOI: 10.1186/1753-6561-3-s7-s23] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
The selective genotyping approach in quantitative genetics means genotyping only individuals with extreme phenotypes. This approach is considered an efficient way to perform gene mapping, and can be applied in both linkage and association studies. Selective genotyping in association mapping of quantitative trait loci was proposed to increase the power of detecting rare alleles of large effect. However, using this approach, only common variants have been detected. Studies on selective genotyping have been limited to single-locus scenarios. In this study we aim to investigate the power of selective genotyping in a genome-wide association study scenario, and we specifically study the impact of minor allele frequency of variants on the power of this approach. We use the Genetic Analysis Workshop 16 rheumatoid arthritis whole-genome data from the North American Rheumatoid Arthritis Consortium. Two quantitative traits, anti-cyclic citrullinated peptide and rheumatoid factor immunoglobulin M, and one binary trait, rheumatoid arthritis affection status, are used in the analysis. The power of selective genotyping is explored as a function of three parameters: sampling proportion, minor allele frequency of single-nucleotide polymorphism, and test level. The results show that the selective genotyping approach is more efficient in detecting common variants than detecting rare variants, and it is efficient only when the level of declaring significance is not stringent. In summary, the selective genotyping approach is most suitable for detecting common variants in candidate gene-based studies.
Affiliation(s)
- Chao Xing
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
9
Li YM, Xiang Y, Sun ZQ. An entropy-based measure for QTL mapping using extreme samples of population. Hum Hered 2007; 65:121-8. [PMID: 17934315 DOI: 10.1159/000109729] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/11/2007] [Accepted: 06/13/2007] [Indexed: 11/19/2022]
Abstract
Quantitative trait locus (QTL) mapping can be accomplished through the method of selective genotyping, which is based on the differences of frequencies between an upper sample and a lower sample in population. However, amplifying the differences in marker allele frequencies in extreme samples may increase the probability for QTL mapping. Shannon entropy, which is a nonlinear function of allele frequencies, can be used to amplify the differences in marker allele frequencies. In this paper, we present a novel measure for linkage disequilibrium (LD) between a marker and single QTL, that is based on the comparison of the entropy and conditional entropy in a marker in extreme samples of population. This measure of LD between the marker and the trait locus can be used when the marker allele frequencies are known in the extreme samples of a population. We investigate the mapping performance in both analytic and simulation scenarios of a single QTL linked to a single marker. Our results show that the measure has very reasonable performance. In addition, a simulation study is performed on the basis of the haplotype frequencies of 10 SNPs of angiotensin-I converting enzyme (ACE) genes.
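The abstract relies on Shannon entropy as a nonlinear function of allele frequencies. The following is a minimal sketch of that function only; it is not the LD measure proposed in the paper (which also involves conditional entropy), and the allele frequencies are hypothetical.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy, in bits, of a discrete frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


# Hypothetical biallelic marker frequencies (allele A, allele a).
for freq_a in (0.5, 0.3, 0.1):
    entropy = shannon_entropy([freq_a, 1.0 - freq_a])
    print(f"allele frequency {freq_a:.1f}: entropy {entropy:.3f} bits")
```

Equal steps in allele frequency give unequal steps in entropy, which is the nonlinearity the abstract refers to. Entropy is also symmetric in the two alleles, so frequencies 0.4 and 0.6 give the same value.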
Affiliation(s)
- Yu-Mei Li
- School of Public Health, Central South University, Changsha, PR China.
10
Huang BE, Lin DY. Efficient association mapping of quantitative trait loci with selective genotyping. Am J Hum Genet 2007; 80:567-76. [PMID: 17273979 PMCID: PMC1821103 DOI: 10.1086/512727] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 10/17/2006] [Accepted: 01/09/2007] [Indexed: 11/03/2022]
Abstract
Selective genotyping (i.e., genotyping only those individuals with extreme phenotypes) can greatly improve the power to detect and map quantitative trait loci in genetic association studies. Because selection depends on the phenotype, the resulting data cannot be properly analyzed by standard statistical methods. We provide appropriate likelihoods for assessing the effects of genotypes and haplotypes on quantitative traits under selective-genotyping designs. We demonstrate that the likelihood-based methods are highly effective in identifying causal variants and are substantially more powerful than existing methods.
Affiliation(s)
- B E Huang
- Department of Biostatistics, University of North Carolina, Chapel Hill 27599-7420, USA
11
Musani SK, Shriner D, Liu N, Feng R, Coffey CS, Yi N, Tiwari HK, Allison DB. Detection of gene x gene interactions in genome-wide association studies of human population data. Hum Hered 2007; 63:67-84. [PMID: 17283436 DOI: 10.1159/000099179] [Citation(s) in RCA: 138] [Impact Index Per Article: 8.1] [Indexed: 12/29/2022]
Abstract
Empirical evidence supporting the commonality of gene x gene interactions, coupled with frequent failure to replicate results from previous association studies, has prompted statisticians to develop methods to handle this important subject. Nonparametric methods have generated intense interest because of their capacity to handle high-dimensional data. Genome-wide association analysis of large-scale SNP data is challenging mathematically and computationally. In this paper, we describe major issues and questions arising from this challenge, along with methodological implications. Data reduction and pattern recognition methods seem to be the new frontiers in efforts to detect gene x gene interactions comprehensively. Currently, there is no single method that is recognized as the 'best' for detecting, characterizing, and interpreting gene x gene interactions. Instead, a combination of approaches with the aim of balancing their specific strengths may be the optimal approach to investigate gene x gene interactions in human data.
Affiliation(s)
- Solomon K Musani
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
12
Zhang G, Nebert DW, Chakraborty R, Jin L. Statistical power of association using the extreme discordant phenotype design. Pharmacogenet Genomics 2006; 16:401-13. [PMID: 16708049 DOI: 10.1097/01.fpc.0000204995.99429.0f] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Selective genotyping has been proven to be an effective design for mapping quantitative trait loci (QTL), either by linkage or by allelic association, wherein the individual trait values can be used as the indices for phenotype selection. It has also been proposed that association studies of dichotomous traits can benefit from such design. When there is no quantitative measurement for phenotype available, cases and/or controls having extreme discordant phenotypes (EDP) can still be selected, based on their exposure status to a drug toxicity or environmental risk factor. The advantage of EDP design is intuitive and it has been successfully used in a number of studies. METHODS: In this report, we developed a statistical method to calculate the power of EDP methodology, using a mixture model of genotype-specific distributions of a single biallelic susceptibility locus. We also compared the power of three statistical tests commonly used in association studies, including the χ² test of allelic frequencies, the χ² test of genotype frequencies, and the Armitage trend test. The power of two different EDP designs was evaluated under a range of scenarios. RESULTS AND CONCLUSION: Our results indicate that the χ² test of genotype frequency is a robust, though less powerful, test for single-locus association, and that EDP methodology is a powerful design for genetic association studies, especially those of common diseases caused by quantifiable drug toxicity or environmental risk factors.
Affiliation(s)
- Ge Zhang
- Department of Environmental Health and Center for Environmental Genetics (CEG), University of Cincinnati Medical Center, Cincinnati, OH, USA
13
Wallace C, Chapman JM, Clayton DG. Improved power offered by a score test for linkage disequilibrium mapping of quantitative-trait loci by selective genotyping. Am J Hum Genet 2006; 78:498-504. [PMID: 16465623 PMCID: PMC1380292 DOI: 10.1086/500562] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 11/04/2005] [Accepted: 12/16/2005] [Indexed: 11/03/2022]
Abstract
Selective genotyping is used to increase efficiency in genetic association studies of quantitative traits by genotyping only those individuals who deviate from the population mean. However, selection distorts the conditional distribution of the trait given genotype, and such data sets are usually analyzed using case-control methods, quantitative analysis within selected groups, or a combination of both. We show that Hotelling's T² test, recently proposed for association studies of one or several tagging single-nucleotide polymorphisms in a prospective (i.e., trait given genotype) design, can also be applied to the retrospective (i.e., genotype given trait) selective-genotyping design, and we use simulation to demonstrate its improved power over existing methods.
Affiliation(s)
- Chris Wallace
- Department of Clinical Pharmacology, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, United Kingdom.
14
Chen Z, Zheng G, Ghosh K, Li Z. Linkage disequilibrium mapping of quantitative-trait Loci by selective genotyping. Am J Hum Genet 2005; 77:661-9. [PMID: 16175512 PMCID: PMC1275615 DOI: 10.1086/491658] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 06/06/2005] [Accepted: 07/26/2005] [Indexed: 12/31/2022]
Abstract
The principles of linkage disequilibrium mapping of dichotomous diseases can be well applied to the mapping of quantitative-trait loci through the method of selective genotyping. In 1999, M. Slatkin considered a truncation selection (TS) approach. We propose in this report an extended TS approach and an extreme-rank-selection (ERS) approach. The properties of these selection approaches are studied analytically. By using a simulation study, we demonstrate that both the extended TS approach and the ERS approach provide remarkable improvements over Slatkin's original TS approach.
Affiliation(s)
- Zehua Chen
- Department of Statistics and Applied Probability, National University of Singapore, Republic of Singapore.