1
Diao G, Qin J. New semiparametric regression method with applications in selection-biased sampling and missing data problems. Can J Stat 2021. [DOI: 10.1002/cjs.11615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Guoqing Diao
  - Department of Biostatistics and Bioinformatics, George Washington University, Washington, DC, U.S.A.
- Jing Qin
  - National Institute of Allergy and Infectious Diseases, Bethesda, MD, U.S.A.
2
Khan Z, Ismail M, Samawi H. Mixture ranked set sampling for estimation of population mean and median. J Stat Comput Simul 2019. [DOI: 10.1080/00949655.2019.1691553] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Affiliation(s)
- Zahid Khan
  - Department of Statistics, COMSATS University Islamabad, Lahore, Pakistan
- Muhammad Ismail
  - Department of Statistics, COMSATS University Islamabad, Lahore, Pakistan
- Hani Samawi
  - Department of Biostatistics, Georgia Southern University, Statesboro, GA, USA
3
Bjørnland T, Bye A, Ryeng E, Wisløff U, Langaas M. Powerful extreme phenotype sampling designs and score tests for genetic association studies. Stat Med 2018; 37:4234-4251. [PMID: 30088284 DOI: 10.1002/sim.7914] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 01/09/2017] [Revised: 06/20/2018] [Accepted: 06/25/2018] [Indexed: 12/15/2022]
Abstract
We consider cross-sectional genetic association studies (common and rare variants) where non-genetic information is available or feasible to obtain for N individuals, but where it is infeasible to genotype all N individuals. We consider continuously measurable Gaussian traits (phenotypes). Genotyping n < N extreme phenotype individuals can yield better power to detect phenotype-genotype associations, as compared to randomly selecting n individuals. We define a person as having an extreme phenotype if the observed phenotype is above a specified threshold or below a specified threshold. We consider a model where these thresholds can be tailored to each individual. The classical extreme sampling design is to set equal thresholds for all individuals. We introduce a design (z-extreme sampling) where personalized thresholds are defined based on the residuals of a regression model including only non-genetic (fully available) information. We derive score tests for the situation where only n extremes are analyzed (complete case analysis) and for the situation where the non-genetic information on N - n non-extremes is included in the analysis (all case analysis). For the classical design, all case analysis is generally more powerful than complete case analysis. For the z-extreme sample, we show that all case and complete case tests are equally powerful. Simulations and data analysis also show that z-extreme sampling is at least as powerful as the classical extreme sampling design and the classical design is shown to be at times less powerful than random sampling. The method of dichotomizing extreme phenotypes is also discussed.
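The selection step of the two designs described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single non-genetic covariate, an ordinary least-squares fit, and a symmetric rule that keeps the smallest and largest values. The function names `ols_residuals`, `pick_extremes`, `classical_extreme_sample`, and `z_extreme_sample` are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from a simple least-squares fit of y on one covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pick_extremes(values, n):
    """Indices of the n // 2 smallest and n - n // 2 largest entries."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    lower = order[: n // 2]
    upper = order[len(order) - (n - n // 2):]
    return sorted(lower + upper)


def classical_extreme_sample(y, n):
    """Classical design (assumed form): same cut-offs on the raw phenotype."""
    return pick_extremes(y, n)


def z_extreme_sample(x, y, n):
    """z-extreme design (assumed form): cut-offs on covariate-adjusted residuals."""
    return pick_extremes(ols_residuals(x, y), n)


# Toy data: individual 1 has a high phenotype for its covariate value.
x = [0, 1, 2, 3, 4]
y = [0, 3, 2, 3, 4]
print(classical_extreme_sample(y, 2), z_extreme_sample(x, y, 2))
```

In the toy data the two designs agree on the lowest individual but differ on the upper extreme, because the residual-based rule effectively gives each individual a threshold that depends on its covariate.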
Affiliation(s)
- Thea Bjørnland
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Anja Bye
  - Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Einar Ryeng
  - Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Ulrik Wisløff
  - Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Mette Langaas
  - Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway
4
Mahdizadeh M, Zamanzade E. Efficient body fat estimation using multistage pair ranked set sampling. Stat Methods Med Res 2017; 28:223-234. [DOI: 10.1177/0962280217720473] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022]
Abstract
Rank-based sampling methods are applicable in settings where precise measurements are expensive, but small sets of units can be accurately ranked at negligible cost. This article introduces one such design, called multistage pair ranked set sampling. It mitigates the ranking burden associated with a competitor scheme, namely multistage ranked set sampling. The mean estimator in multistage pair ranked set sampling is unbiased and, under perfect rankings, has variance no larger than its simple random sampling counterpart. Although the suggested mean estimator is outperformed by its multistage ranked set sampling analog in terms of precision under perfect rankings, the situation may be reversed if cost considerations are taken into account. The methodology is illustrated using a medical dataset.
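As background for the rank-based designs mentioned in this abstract, the following Python sketch shows the basic single-stage balanced ranked set sample, not the multistage pair design the article proposes. It assumes perfect ranking (units are ranked by their true values, standing in for cheap judgment ranking) and uses synthetic data; the function name `ranked_set_sample` and all parameter values are illustrative assumptions.

```python
import random


def ranked_set_sample(population, set_size, cycles, rng):
    """Balanced ranked set sample under perfect ranking.

    In each cycle, draw set_size sets of set_size units; from the i-th set,
    keep only the unit with rank i. Only the kept units would be measured.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = rng.sample(population, set_size)
            candidates.sort()  # stands in for inexpensive judgment ranking
            sample.append(candidates[rank])
    return sample


def mean(values):
    return sum(values) / len(values)


rng = random.Random(2017)
population = [rng.gauss(25.0, 5.0) for _ in range(10_000)]  # synthetic values
rss = ranked_set_sample(population, set_size=3, cycles=10, rng=rng)
srs = rng.sample(population, len(rss))  # simple random sample of equal size
print(len(rss), round(mean(rss), 2), round(mean(srs), 2))
```

Both sample means estimate the population mean; the ranked set sample measures the same number of units but spreads them across ranks, which is the source of the precision gain discussed in the abstract.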
Affiliation(s)
- M Mahdizadeh
  - Department of Statistics, Hakim Sabzevari University, Sabzevar, Iran
- Ehsan Zamanzade
  - Department of Statistics, University of Isfahan, Isfahan, Iran
5
Yan L, Hofmann N, Li S, Ferreira ME, Song B, Jiang G, Ren S, Quigley C, Fickus E, Cregan P, Song Q. Identification of QTL with large effect on seed weight in a selective population of soybean with genome-wide association and fixation index analyses. BMC Genomics 2017; 18:529. [PMID: 28701220 PMCID: PMC5508781 DOI: 10.1186/s12864-017-3922-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 02/07/2017] [Accepted: 07/04/2017] [Indexed: 12/11/2022]
Abstract
BACKGROUND Soybean seed weight is not only a yield component, but also a critical trait for various soybean food products such as sprouts, edamame, soy nuts, natto and miso. Linkage analysis and genome-wide association study (GWAS) are two complementary and powerful tools to connect phenotypic differences to the underlying contributing loci. Linkage analysis is based on progeny derived from two parents. Given sufficient sample size and biological replication, it usually has high statistical power to map alleles with relatively small effects on phenotype; however, linkage analysis of a bi-parental population cannot detect quantitative trait loci (QTL) that are fixed in the two parents. Because of the small seed weight difference between the two parents in most families of previous studies, these populations are not suitable to detect QTL that have considerable effects on seed weight. GWAS is based on unrelated individuals to detect alleles associated with the trait under investigation. The ability of GWAS to capture major seed weight QTL depends on the frequency of the accessions with small and large seed weight in the population being investigated. Our objective was to identify QTL that had a pronounced effect on seed weight using a selective population of soybean germplasm accessions and the approach of GWAS and fixation index analysis. RESULTS We selected 166 accessions from the USDA Soybean Germplasm Collection that had either large or small seed weight and could typically be grown in the same location. The accessions were evaluated for seed weight in the field for two years and genotyped with the SoySNP50K BeadChip containing >42,000 SNPs. Of the 17 SNPs on six chromosomes that were significantly associated with seed weight in two years based on a GWAS of the selective population, eight on chromosome 4 or chromosome 17 had significant Fst values between the large and small seed weight sub-populations. The seed weight difference between the two alleles of these eight significant SNPs varied from 8.1 g to 11.7 g/100 seeds in two years. We also identified haplotypes in three haplotype blocks with significant effects on seed weight. These findings were validated in a panel of 3753 accessions from the USDA Soybean Germplasm Collection. CONCLUSION This study highlighted the usefulness of selective genotyping populations coupled with GWAS and fixation index analysis for the identification of QTL with substantial effects on seed weight in soybean. This approach may help geneticists and breeders to more efficiently identify major QTL controlling other traits. The major regions and haplotypes we have identified that control seed weight differences in soybean will facilitate the identification of genes regulating this important trait.
Affiliation(s)
- Long Yan
  - Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences / Shijiazhuang Branch of National Soybean Improvement Center / Key Laboratory of Crop Genetics and Breeding of Hebei, Shijiazhuang, 050035 China
- Nicolle Hofmann
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Ave, Building 006, Beltsville, MD 20705 USA
  - Present address: Davare Laboratory, Pediatric Cancer Biology Program, Oregon Health and Science University, 3181 SW Sam Jackson Park Rd, Portland, OR 97239 USA
- Shuxian Li
  - United States Department of Agriculture, Agricultural Research Service (USDA-ARS), Crop Genetics Research Unit, Stoneville, MS 38776 USA
- Baohua Song
  - Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223 USA
- Guoliang Jiang
  - Agricultural Research Station, Virginia State University, P.O. Box 9061, Petersburg, VA 23806 USA
- Shuxin Ren
  - Agricultural Research Station, Virginia State University, P.O. Box 9061, Petersburg, VA 23806 USA
- Charles Quigley
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Ave, Building 006, Beltsville, MD 20705 USA
- Edward Fickus
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Ave, Building 006, Beltsville, MD 20705 USA
- Perry Cregan
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Ave, Building 006, Beltsville, MD 20705 USA
- Qijian Song
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, 10300 Baltimore Ave, Building 006, Beltsville, MD 20705 USA
6
Wen Y, Hao J, Xiao X, Guo X, Wang W, Yang T, Shen H, Tian Q, Tan L, Deng HW, Zhang F. Evaluation of the relationship and genetic overlap between Kashin-Beck disease and body mass index. Scand J Rheumatol 2016; 45:512-517. [PMID: 27053287 DOI: 10.3109/03009742.2016.1139742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Body mass index (BMI) is one of the major factors affecting the development of osteoarthritis (OA) but there is currently no information available regarding the relationship between BMI and Kashin-Beck disease (KBD). Our aim in this study was to investigate the relationship and genetic overlap between BMI and KBD. METHOD A total of 2050 Han Chinese subjects participated in this study. Using a cohort of 333 grade I KBD patients, logistic regression analysis was conducted to evaluate the correlation between BMI and KBD. Another independent sample of 1717 subjects was genotyped for a genome-wide association study (GWAS) using Affymetrix Human SNP 6.0 Arrays. Single nucleotide polymorphism (SNP) effect concordance analysis (SECA) was applied to the GWAS summaries of KBD and BMI for pleiotropy analysis. Genome-wide bivariate association analysis (GWBAA) of KBD and BMI was carried out to identify the genes with pleiotropic effects on KBD and BMI. The relevance of identified genes with KBD was validated by gene expression profiling and immunohistochemistry. RESULTS BMI correlated positively with knee movement disorder in KBD (coefficient β = 0.068, p = 0.045). SECA identified a significant pleiotropic effect (empirical p = 0.021) between KBD and BMI. In the GWBAA, rs1893577 of the ADAMTS1 gene achieved the most significant association signal (p = 7.38 × 10⁻⁹). ADAMTS1 was also up-regulated in KBD vs. normal (ratio = 2.64 ± 2.80) and KBD vs. OA (ratio = 2.31 ± 2.01). The rate of ADAMTS1-positive chondrocytes in KBD was significantly higher than that in OA (p < 0.05) and healthy controls (p < 0.05). CONCLUSIONS Our results suggest that ADAMTS1 is a novel susceptibility gene for KBD.
Affiliation(s)
- Y Wen
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
- J Hao
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
- X Xiao
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
- X Guo
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
- W Wang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
- T Yang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, P. R. China
- H Shen
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Q Tian
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- L Tan
  - Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, P. R. China
- H-W Deng
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- F Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
7
A bivariate genome-wide association study identifies ADAM12 as a novel susceptibility gene for Kashin-Beck disease. Sci Rep 2016; 6:31792. [PMID: 27545300 PMCID: PMC4992896 DOI: 10.1038/srep31792] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/29/2016] [Accepted: 07/26/2016] [Indexed: 11/28/2022]
Abstract
Kashin-Beck disease (KBD) is a chronic osteoarthropathy that manifests as joint deformities and growth retardation. Only a few genetic studies of growth retardation associated with KBD have been carried out to date. In this study, we conducted a two-stage bivariate genome-wide association study (BGWAS) of KBD using joint deformities and body height as study phenotypes, involving a total of 2,417 study subjects. Articular cartilage specimens from 8 subjects were collected for immunohistochemistry. In the BGWAS, the ADAM12 gene achieved the most significant association (rs1278300, p-value = 9.25 × 10⁻⁹) with KBD. The replication study observed significant association signals at rs1278300 (p-value = 0.007) and rs1710287 (p-value = 0.002) of ADAM12 after Bonferroni correction. Immunohistochemistry revealed a significantly decreased expression level of ADAM12 protein in KBD articular cartilage (average positive chondrocyte rate = 47.59 ± 7.79%) compared to healthy articular cartilage (average positive chondrocyte rate = 64.73 ± 5.05%). Our results suggest that ADAM12 is a novel susceptibility gene underlying both joint destruction and growth retardation in KBD.
8
PPARGC1B gene is associated with Kashin-Beck disease in Han Chinese. Funct Integr Genomics 2016; 16:459-63. [PMID: 27108113 DOI: 10.1007/s10142-016-0496-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2015] [Revised: 04/08/2016] [Accepted: 04/18/2016] [Indexed: 10/21/2022]
Abstract
Kashin-Beck disease (KBD) is a chronic osteochondropathy. The genetic basis of KBD remains elusive. To investigate the relationship between PPARGC1B gene polymorphism and KBD, we conducted a two-stage association study using 2743 unrelated Han Chinese subjects. In the first stage, three SNPs of the PPARGC1B gene (rs1078324, rs4705372, and rs11743128) were genotyped in 559 KBD patients and 467 healthy controls using the Sequenom MassARRAY platform. In the second stage, the association analysis results of PPARGC1B with KBD were replicated using an independent sample of 1717 subjects. SNP association analysis was conducted with PLINK software. Genotype imputation was conducted with IMPUTE 2.0 against the reference panel of the 1000 Genomes Project. Bonferroni multiple testing correction was performed. We observed a significant association signal at rs4705372 (P = 0.0160) and a suggestive association signal at rs11743128 (P = 0.0290). A further replication study confirmed the association signals of rs4705372 (P = 0.0026) and rs11743128 (P = 0.0387) in the independent validation sample. Our study results suggest that PPARGC1B is a novel susceptibility gene of KBD.
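The Bonferroni step in this abstract can be illustrated with a short Python calculation. The abstract does not state the family-wise error rate or the number of tests corrected for, so the values below (alpha = 0.05, three tests, one per genotyped SNP) are assumptions; under them, the threshold of 0.05 / 3 ≈ 0.0167 is consistent with the abstract describing P = 0.0160 as significant and P = 0.0290 as only suggestive.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed values, not given in the abstract: alpha = 0.05 and m = 3 tests.
threshold = bonferroni_threshold(0.05, 3)

# First-stage P values as reported in the abstract.
first_stage = {"rs4705372": 0.0160, "rs11743128": 0.0290}

for snp, p in first_stage.items():
    label = "significant" if p < threshold else "not significant after correction"
    print(f"{snp}: P = {p:.4f}, threshold = {threshold:.4f} -> {label}")
```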
9
Wen Y, Guo X, Hao J, Xiao X, Wang W, Wu C, Wang S, Yang T, Shen H, Chen X, Tan L, Tian Q, Deng HW, Zhang F. Integrative analysis of genome-wide association studies and gene expression profiles identified candidate genes for osteoporosis in Kashin-Beck disease patients. Osteoporos Int 2016; 27:1041-1046. [PMID: 26462493 DOI: 10.1007/s00198-015-3364-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/23/2015] [Accepted: 10/01/2015] [Indexed: 12/11/2022]
Abstract
SUMMARY The molecular mechanism of osteoporosis (OP) in Kashin-Beck disease (KBD) patients was unclear. Our results suggest that KBD and OP share some common causal genes that are functionally involved in skeletal growth and development and in chronic inflammation. Our results provide novel clues for clarifying the molecular mechanism of OP in KBD patients. INTRODUCTION KBD is a chronic skeletal disorder with osteopenia and OP. The pathogenesis of OP in KBD patients remains elusive. METHODS A total of 1717 subjects participated in this study. KBD was diagnosed according to the clinical diagnosis criteria of China (GB16395-1996). The bone mineral density (BMD) and bone areas of the ulna and radius, hip, and lumbar spine (L1-L4) were measured with a Hologic 4500 W dual-energy X-ray absorptiometry scanner. Genotyping was conducted using Affymetrix SNP Array 6.0. Gene expression profiles of peripheral blood mononuclear cells of KBD and OP patients were compared using Affymetrix HG-U133 plus 2.0 arrays and Agilent Human 1A arrays, respectively. Genome-wide association studies (GWAS) were conducted with PLINK. SECA and DAVID were applied for pleiotropy and functional enrichment analysis, respectively. RESULTS SECA analysis observed significant pleiotropic effects between KBD and the ulna and radius BMD (P value = 5.99 × 10⁻³). GWAS meta-analysis identified six candidate genes with pleiotropic effects, including PDGFD, SOX5, DPYD, CTR9, SPP1, and COL4A1. GO analysis identified 16 significant GO terms shared by KBD and the ulna and radius BMD, involved in cell morphogenesis and apoptosis. Pathway enrichment analysis detected two common pathways for KBD and the ulna and radius BMD, including the calcium signaling pathway and the vascular smooth muscle contraction pathway. Gene expression analysis detected three up-regulated inflammation-related genes for KBD and OP, including IL1B, IL8, and CCL1. CONCLUSION This study reported several candidate genes involved in the development of OP in KBD patients.
Affiliation(s)
- Y Wen
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- X Guo
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- J Hao
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- X Xiao
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- W Wang
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- C Wu
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- S Wang
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
- T Yang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- H Shen
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- X Chen
  - Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, China
- L Tan
  - Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, China
- Q Tian
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- H-W Deng
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- F Zhang
  - School of Public Health, Health Science Center, Xi'an Jiaotong University, Yan Ta West Road 76, Xi'an, 710061, China
10
Fontanesi L, Scotti E, Speroni C, Buttazzoni L, Russo V. A selective genotyping approach identifies single nucleotide polymorphisms in porcine chromosome 2 genes associated with production and carcass traits in Italian heavy pigs. Italian Journal of Animal Science 2016. [DOI: 10.4081/ijas.2011.e15] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/23/2022]
11
Yin J, Samawi H, Linder D. Improved nonparametric estimation of the optimal diagnostic cut-off point associated with the Youden index under different sampling schemes. Biom J 2016; 58:915-34. [PMID: 26756282 DOI: 10.1002/bimj.201500036] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/04/2015] [Revised: 10/07/2015] [Accepted: 10/08/2015] [Indexed: 11/08/2022]
Abstract
A diagnostic cut-off point of a biomarker measurement is needed for classifying a random subject to be either diseased or healthy. However, the cut-off point is usually unknown and needs to be estimated by some optimization criteria. One important criterion is the Youden index, which has been widely adopted in practice. The Youden index, which is defined as the maximum of (sensitivity + specificity -1), directly measures the largest total diagnostic accuracy a biomarker can achieve. Therefore, it is desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes, taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without the actual measurement can be relatively easy. In such cases, ranked set sampling can give more precise estimation than simple random sampling, as ranked set samples are more likely to span the full range of the population. In this study, kernel density estimation is utilized to numerically solve for an estimate of the optimal cut-off point. The asymptotic distributions of the kernel estimators based on two sampling schemes are derived analytically and we prove that the estimators based on ranked set sampling are relatively more efficient than that of simple random sampling and both estimators are asymptotically unbiased. Furthermore, the asymptotic confidence intervals are derived. Intensive simulations are carried out to compare the proposed method using ranked set sampling with simple random sampling, with the proposed method outperforming simple random sampling in all cases. A real data set is analyzed for illustrating the proposed method.
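The definition of the Youden index given in this abstract, the maximum over cut-offs of (sensitivity + specificity - 1), can be made concrete with a small Python sketch. Note the swap in technique: the article estimates the cut-off with kernel density estimators under ranked set and simple random sampling, whereas this sketch uses plain empirical proportions. The function name `youden_cutoff`, the toy data, and the convention that larger values indicate disease are assumptions for illustration.

```python
def youden_cutoff(healthy, diseased):
    """Empirical cut-off maximizing J(c) = sensitivity(c) + specificity(c) - 1.

    Assumes larger biomarker values indicate disease: a subject is classified
    as diseased when its value is greater than or equal to the cut-off c.
    Ties in J are resolved in favor of the smallest cut-off.
    """
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(healthy) | set(diseased)):
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        specificity = sum(v < c for v in healthy) / len(healthy)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Toy biomarker values (illustrative only).
print(youden_cutoff([1, 2, 3, 4], [3, 5, 6, 7]))  # (5, 0.75)
```

With the toy data, the cut-off 5 classifies three of four diseased subjects and all four healthy subjects correctly, giving J = 0.75 + 1 - 1 = 0.75.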
Affiliation(s)
- Jingjing Yin
  - Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Hendricks Hall 1007, P.O. Box 8015, Statesboro, GA 30460, USA
- Hani Samawi
  - Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Hendricks Hall 1007, P.O. Box 8015, Statesboro, GA 30460, USA
- Daniel Linder
  - Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Hendricks Hall 1007, P.O. Box 8015, Statesboro, GA 30460, USA
12
Distribution-free tolerance intervals with nomination samples: Applications to mercury contamination in fish. Statistical Methodology 2015. [DOI: 10.1016/j.stamet.2015.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
13
Zhang F, Dai L, Lin W, Wang W, Liu X, Zhang J, Yang T, Liu X, Shen H, Chen X, Tan L, Tian Q, Deng HW, Xu X, Guo X. Exome sequencing identified FGF12 as a novel candidate gene for Kashin-Beck disease. Funct Integr Genomics 2015; 16:13-7. [PMID: 26290467 DOI: 10.1007/s10142-015-0462-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/04/2015] [Revised: 07/30/2015] [Accepted: 08/02/2015] [Indexed: 11/28/2022]
Abstract
The objective of this study was to identify novel causal genes involved in the pathogenesis of Kashin-Beck disease (KBD). A representative grade III KBD sib pair with serious skeletal growth and development failure was subjected to exome sequencing using the Illumina HiSeq 2000 platform. The detected gene mutations were then filtered against the data of the 1000 Genomes Project, the dbSNP database, and the BGI in-house database, and replicated by a genome-wide association study (GWAS) of KBD. Ninety grade II or III KBD patients with extreme KBD phenotypes and 1627 healthy controls were enrolled in the GWAS. The Affymetrix Genome-Wide Human SNP Array 6.0 was applied for genotyping. PLINK software was used for association analysis. We identified a novel 106T>C mutation in the 3'UTR of the FGF12 gene, which has not been reported to date. Sequence alignment showed high conservation at the mutated 3'UTR+106T>C locus across various vertebrates. In the GWAS of KBD, we detected nine SNPs of the FGF12 gene showing association evidence (P value < 0.05) with KBD. The most significant association signal was observed at rs1847340 (P value = 1.90 × 10⁻⁵). This study suggests that FGF12 is a susceptibility gene of KBD. Our results provide novel clues for revealing the pathogenesis of KBD and the biological function of FGF12.
Affiliation(s)
- Feng Zhang
  - Key Laboratory of Trace Elements and Endemic Diseases of Ministry of Health, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Weimin Lin
  - Department of Nephrology and Traditional Chinese Medicine, The People's Liberating Army 451 Hospital, Xi'an, People's Republic of China
- Wenyu Wang
  - Key Laboratory of Trace Elements and Endemic Diseases of Ministry of Health, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Tielin Yang
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Xiaogang Liu
  - Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Institute of Molecular Genetics, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Hui Shen
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Xiangding Chen
  - Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, People's Republic of China
- Lijun Tan
  - Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, People's Republic of China
- Qing Tian
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
  - Department of Biostatistics and Bioinformatics, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Xun Xu
  - BGI-Shenzhen, Shenzhen, 518083, China
- Xiong Guo
  - Key Laboratory of Trace Elements and Endemic Diseases of Ministry of Health, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, People's Republic of China
14
Tao R, Zeng D, Franceschini N, North KE, Boerwinkle E, Lin DY. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling. J Am Stat Assoc 2015; 110:560-572. [PMID: 26366025 DOI: 10.1080/01621459.2015.1008099] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.
Affiliation(s)
- Ran Tao
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
| | - Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599
| | - Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599
| | - Eric Boerwinkle
- Human Genetics Center, University of Texas Health Science Center, Houston, TX 77030
| | - Dan-Yu Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
|
15
|
Zhang F, Wen Y, Guo X, Zhang Y, Wang X, Yang T, Shen H, Chen X, Tian Q, Deng HW. Brief Report: Genome-Wide Association Study Identifies ITPR2 as a Susceptibility Gene for Kashin-Beck Disease in Han Chinese. Arthritis Rheumatol 2014; 67:176-81. [DOI: 10.1002/art.38898] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2014] [Accepted: 09/25/2014] [Indexed: 11/06/2022]
Affiliation(s)
| | - Yan Wen
- Xi'an Jiaotong University; Xi'an China
| | - Xiong Guo
- Xi'an Jiaotong University; Xi'an China
| | - Yingang Zhang
- First Affiliated Hospital and Xi'an Jiaotong University; Xi'an China
| | - Xi Wang
- Xi'an Jiaotong University; Xi'an China
| | | | - Hui Shen
- Tulane University; New Orleans Louisiana
| | | | - Qing Tian
- Tulane University; New Orleans Louisiana
|
16
|
Lin DY, Tao R, Kalsbeek W, Zeng D, Gonzalez F, Fernández-Rhodes L, Graff M, Koch G, North K, Heiss G. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 2014; 95:675-88. [PMID: 25480034 DOI: 10.1016/j.ajhg.2014.11.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2014] [Accepted: 11/11/2014] [Indexed: 12/27/2022] Open
Abstract
The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
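The idea of weighting by unequal selection probabilities, as described in the abstract above, can be illustrated with a small inverse-probability-weighted regression slope. This is only a minimal sketch and not the authors' estimator or software: the function name `weighted_slope`, the toy data, and the use of plain inverse selection probabilities as weights are all assumptions made for illustration, and no design-based variance estimation is attempted.

```python
def weighted_slope(x, y, selection_probs):
    """Slope of a weighted least-squares line with weights 1 / selection probability.

    Hypothetical helper for illustration; it ignores clustering, nonresponse
    adjustment, and relatedness, all of which the cited paper addresses.
    """
    if not (len(x) == len(y) == len(selection_probs)):
        raise ValueError("x, y and selection_probs must have the same length")
    if any(not (0.0 < p <= 1.0) for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")

    # Each sampled individual stands in for 1/p individuals in the population.
    w = [1.0 / p for p in selection_probs]
    total_w = sum(w)

    # Weighted means of predictor and outcome.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Weighted cross-product and sum of squares.
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        raise ValueError("predictor has no variation")
    return sxy / sxx


# Toy data (assumed): genotype dosage, trait value, and selection probability.
genotype = [0, 1, 2]
trait = [0.0, 1.0, 4.0]
probs = [0.5, 0.5, 0.25]

unweighted = weighted_slope(genotype, trait, [1.0, 1.0, 1.0])
weighted = weighted_slope(genotype, trait, probs)
print(unweighted, weighted)
```

With equal selection probabilities the function reduces to the ordinary least-squares slope, so the two printed values differ only because the third individual was sampled with a lower probability and therefore carries more weight.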
|
17
|
Rowe SJ, Karacaören B, de Koning DJ, Lukic B, Hastings-Clark N, Velander I, Haley CS, Archibald AL. Analysis of the genetics of boar taint reveals both single SNPs and regional effects. BMC Genomics 2014; 15:424. [PMID: 24894739 PMCID: PMC4059876 DOI: 10.1186/1471-2164-15-424] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2013] [Accepted: 05/09/2014] [Indexed: 12/23/2022] Open
Abstract
Background: Boar taint is an offensive urine or faecal-like odour, affecting the smell and taste of cooked pork from some mature non-castrated male pigs. Androstenone and skatole in fat are the molecules responsible. In most pig production systems, males that are not required for breeding are castrated shortly after birth to reduce the risk of boar taint. There is evidence for genetic variation in the predisposition to boar taint. A genome-wide association study (GWAS) was performed to identify loci with effects on boar taint. Five hundred Danish Landrace boars with high levels of skatole in fat (>0.3 μg/g) were each matched with a litter mate with low levels of skatole and measured for androstenone. DNA from these 1,000 non-castrated boars was genotyped using the Illumina PorcineSNP60 Beadchip. After quality control, tests for SNPs associated with boar taint were performed on 938 phenotyped individuals and 44,648 SNPs. Empirical significance thresholds were set by permutation (100,000 permutations). For androstenone, a ‘regional heritability approach’ combining information from multiple SNPs was used to estimate the genetic variation attributable to individual autosomes.
Results: A highly significant association was found between variation in skatole levels and SNPs within the CYP2E1 gene on chromosome 14 (SSC14), which encodes an enzyme involved in degradation of skatole. Nominal significance was found for effects on skatole associated with 4 other SNPs including a region of SSC6 reported previously. Genome-wide significance was found for an association between SNPs on SSC5 and androstenone levels and nominal significance for associations with SNPs on SSC13 and SSC17. The regional analyses confirmed large effects on SSC5 for androstenone and suggest that SSC5 explains 23% of the genetic variation in androstenone. The autosomal heritability analyses also suggest that there is a large effect associated with androstenone on SSC2, not detected using GWAS.
Conclusions: Significant SNP associations were found for skatole on SSC14 and for androstenone on SSC5 in Landrace pigs. The study agrees with evidence that the CYP2E1 gene has effects on skatole breakdown in the liver. Autosomal heritability estimates can uncover clusters of smaller genetic effects that individually do not exceed the threshold for GWAS significance.
Affiliation(s)
- Suzanne J Rowe
- The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush, Midlothian, Scotland EH25 9RG, UK.
|
18
|
Quantitative trait analysis in sequencing studies under trait-dependent sampling. Proc Natl Acad Sci U S A 2013; 110:12247-52. [PMID: 23847208 DOI: 10.1073/pnas.1221713110] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
It is not economically feasible to sequence all study subjects in a large cohort. A cost-effective strategy is to sequence only the subjects with the extreme values of a quantitative trait. In the National Heart, Lung, and Blood Institute Exome Sequencing Project, subjects with the highest or lowest values of body mass index, LDL, or blood pressure were selected for whole-exome sequencing. Failure to account for such trait-dependent sampling can cause severe inflation of type I error and substantial loss of power in quantitative trait analysis, especially when combining results from multiple studies with different selection criteria. We present valid and efficient statistical methods for association analysis of sequencing data under trait-dependent sampling. We pay special attention to gene-based analysis of rare variants. Our methods can be used to perform quantitative trait analysis not only for the trait that is used to select subjects for sequencing but for any other traits that are measured. For a particular trait of interest, our approach properly combines the association results from all studies with measurements of that trait. This meta-analysis is substantially more powerful than the analysis of any single study. By contrast, meta-analysis of standard linear regression results (ignoring trait-dependent sampling) can be less powerful than the analysis of a single study. The advantages of the proposed methods are demonstrated through simulation studies and the National Heart, Lung, and Blood Institute Exome Sequencing Project data. The methods are applicable to other types of genetic association studies and nongenetic studies.
|
19
|
Barnett IJ, Lee S, Lin X. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 2012. [PMID: 23184518 DOI: 10.1002/gepi.21699] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. Although application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on the optimal Sequence Kernel Association Test that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
Affiliation(s)
- Ian J Barnett
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
|
20
|
Zheng G, Jinfeng X, Yuan A, Colin OW. Impact on modes of inheritance and relative risks of using extreme sampling when designing genetic association studies. Ann Hum Genet 2012; 77:80-4. [PMID: 23163532 DOI: 10.1111/j.1469-1809.2012.00733.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2012] [Accepted: 08/28/2012] [Indexed: 11/29/2022]
Abstract
Using extreme phenotypes for association studies can improve statistical power. We study the impact of using samples with extremely high or low traits on the alternative model space, the genotype relative risks, and the genetic models in association studies. We prove the following results: when the risk allele causes high-trait values, the more extreme the high traits, the larger the genotype relative risks, which is not always true when using extremely low traits; we also prove that a genetic model theoretically changes with more extreme traits, except for the recessive or dominant models. Practically, however, the impact of deviations from the true genetic model at a functional locus due to selective sampling is virtually negligible. The implications of our findings are discussed. Numerical values are reported for illustrations.
Affiliation(s)
- Gang Zheng
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA.
|
21
|
Zheng G, Wu CO, Kwak M, Jiang W, Joo J, Lima JAC. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling. Genet Epidemiol 2012; 36:263-73. [PMID: 22460626 DOI: 10.1002/gepi.21619] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2011] [Revised: 12/23/2011] [Accepted: 01/02/2012] [Indexed: 11/07/2022]
Abstract
We study the analysis of a joint association between a genetic marker with both binary (case-control) and quantitative (continuous) traits, where the quantitative trait values are only available for the cases due to data sharing and outcome-dependent sampling. Data sharing becomes common in genetic association studies, and the outcome-dependent sampling is the consequence of data sharing, under which a phenotype of interest is not measured for some subgroup. The trend test (or Pearson's test) and F-test are often, respectively, used to analyze the binary and quantitative traits. Because of the outcome-dependent sampling, the usual F-test can be applied using the subgroup with the observed quantitative traits. We propose a modified F-test by also incorporating the genotype frequencies of the subgroup whose traits are not observed. Further, a combination of this modified F-test and Pearson's test is proposed by Fisher's combination of their P-values as a joint analysis. Because of the correlation of the two analyses, we propose to use a Gamma (scaled chi-squared) distribution to fit the asymptotic null distribution for the joint analysis. The proposed modified F-test and the joint analysis can also be applied to test single trait association (either binary or quantitative trait). Through simulations, we identify the situations under which the proposed tests are more powerful than the existing ones. Application to a real dataset of rheumatoid arthritis is presented.
Affiliation(s)
- Gang Zheng
- National Heart, Lung and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892, USA.
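The abstract of Zheng et al. above combines two tests through Fisher's combination of their P-values. The textbook version of that step, which assumes independent tests, can be sketched as follows. This is not the authors' procedure: they note that their two analyses are correlated and fit a Gamma (scaled chi-squared) null distribution instead, which this sketch does not attempt. The function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's combination for INDEPENDENT p-values (textbook version).

    Returns (statistic, combined_p), where statistic = -2 * sum(log p_i) is
    referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")

    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the chi-square survival function has the
    # closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


# Example: two tests that are each nominally significant at 0.05.
stat, combined = fisher_combined_p([0.05, 0.05])
print(stat, combined)
```

For two p-values the closed form reduces to `p1 * p2 * (1 - log(p1 * p2))`, which is a convenient hand check. When the component tests are positively correlated, as in the joint analysis described above, this independence-based p-value would tend to be too small, which is the motivation for a fitted null distribution.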
|
22
|
Casals F, Idaghdour Y, Hussin J, Awadalla P. Next-generation sequencing approaches for genetic mapping of complex diseases. J Neuroimmunol 2012; 248:10-22. [PMID: 22285396 DOI: 10.1016/j.jneuroim.2011.12.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2011] [Revised: 11/30/2011] [Accepted: 12/15/2011] [Indexed: 01/12/2023]
Abstract
The advent of next generation sequencing technologies has opened new possibilities in the analysis of human disease. In this review we present the main next-generation sequencing technologies, with their major contributions and possible applications to the study of the genetic etiology of complex diseases.
Affiliation(s)
- Ferran Casals
- Centre de Recherche du Centre Hospitalier Universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada.
|
23
|
Li D, Lewinger JP, Gauderman WJ, Murcray CE, Conti D. Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies. Genet Epidemiol 2011; 35:790-9. [PMID: 21922541 DOI: 10.1002/gepi.20628] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2011] [Revised: 07/17/2011] [Accepted: 07/22/2011] [Indexed: 12/11/2022]
Abstract
Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the heritability of complex traits. Rare variants may, in part, explain some of the missing heritability. Here, we explored the advantage of the extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency (with the phenotyping cost included) perspective. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to the one-stage cross-sectional design or traditional two-stage design. We then discussed the analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address many practical concerns, for example measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, an extreme-based sampling can still be more efficient. Finally, we expanded the current analysis and design framework to accommodate the CMC approach where multiple rare variants in the same gene region are analyzed jointly.
Affiliation(s)
- Dalin Li
- Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
|
24
|
Paternoster L, Evans DM, Nohr EA, Holst C, Gaborieau V, Brennan P, Gjesing AP, Grarup N, Witte DR, Jørgensen T, Linneberg A, Lauritzen T, Sandbaek A, Hansen T, Pedersen O, Elliott KS, Kemp JP, St Pourcain B, McMahon G, Zelenika D, Hager J, Lathrop M, Timpson NJ, Smith GD, Sørensen TIA. Genome-wide population-based association study of extremely overweight young adults--the GOYA study. PLoS One 2011; 6:e24303. [PMID: 21935397 PMCID: PMC3174168 DOI: 10.1371/journal.pone.0024303] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2011] [Accepted: 08/08/2011] [Indexed: 12/12/2022] Open
Abstract
Background: Thirty-two common variants associated with body mass index (BMI) have been identified in genome-wide association studies, explaining ∼1.45% of BMI variation in general population cohorts. We performed a genome-wide association study in a sample of young adults enriched for extremely overweight individuals. We aimed to identify new loci associated with BMI and to ascertain whether using an extreme sampling design would identify the variants known to be associated with BMI in general populations.
Methodology/Principal Findings: From two large Danish cohorts we selected all extremely overweight young men and women (n = 2,633), and equal numbers of population-based controls (n = 2,740, drawn randomly from the same populations as the extremes, representing ∼212,000 individuals). We followed up novel (at the time of the study) association signals (p<0.001) from the discovery cohort in a genome-wide study of 5,846 Europeans, before attempting to replicate the most strongly associated 28 SNPs in an independent sample of Danish individuals (n = 20,917) and a population-based cohort of 15-year-old British adolescents (n = 2,418). Our discovery analysis identified SNPs at three loci known to be associated with BMI with genome-wide confidence (P<5×10⁻⁸; FTO, MC4R and FAIM2). We also found strong evidence of association at the known TMEM18, GNPDA2, SEC16B, TFAP2B, SH2B1 and KCTD15 loci (p<0.001), and nominal association (p<0.05) at a further 8 loci known to be associated with BMI. However, meta-analyses of our discovery and replication cohorts identified no novel associations.
Significance: Our results indicate that the detectable genetic variation associated with extreme overweight is very similar to that previously found for general BMI. This suggests that population-based study designs with enriched sampling of individuals with the extreme phenotype may be an efficient method for identifying common variants that influence quantitative traits and a valid alternative to genotyping all individuals in large population-based studies, which may require tens of thousands of subjects to achieve similar power.
|
25
|
A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection. Behav Genet 2011; 41:776-9. [PMID: 21626281 PMCID: PMC3162965 DOI: 10.1007/s10519-011-9475-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2011] [Accepted: 05/16/2011] [Indexed: 11/17/2022]
Abstract
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
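The bias mentioned in the abstract above (simple linear regression under two-tail extreme selection) can be seen in a small simulation. The sketch below only illustrates the bias; it does not implement the correction proposed in the cited paper. All parameter values (sample size, allele frequency, effect size, tail fraction, seed) and the function names are assumptions chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_two_tail_selection(n=20000, maf=0.3, beta=0.2, tail=0.1, seed=1):
    """Compare the OLS slope in the full cohort and in the two selected tails.

    Assumed model: additive genotype g in {0, 1, 2}, trait y = beta * g + N(0, 1).
    """
    rng = random.Random(seed)
    # Genotype = number of minor alleles out of two independent draws.
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]

    # Two-tail selection: keep the lowest and highest `tail` fraction of traits.
    order = sorted(range(n), key=lambda i: y[i])
    k = int(n * tail)
    selected = order[:k] + order[-k:]

    slope_full = ols_slope(g, y)
    slope_selected = ols_slope([g[i] for i in selected], [y[i] for i in selected])
    return slope_full, slope_selected


slope_full, slope_selected = simulate_two_tail_selection()
print("full cohort slope:   ", round(slope_full, 3))
print("selected tails slope:", round(slope_selected, 3))
```

In this setup the full-cohort slope should land near the simulated effect, while the slope fitted to the selected tails is noticeably larger, because selection on the trait inflates the trait variance in the analysed sample. That inflation is what a bias correction has to undo.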
|
26
|
Tang Y. Equivalence of three score tests for association mapping of quantitative trait loci under selective genotyping. Genet Epidemiol 2010; 34:522-7. [PMID: 20552655 DOI: 10.1002/gepi.20498] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Huang and Lin ([2007] Am J Hum Genet 80:567-572) proposed a conditional-likelihood approach for mapping quantitative trait loci (QTL) under selective genotyping, and demonstrated via simulation that their model tends to be more powerful than the prospective linear regression. However, we show that the three score tests based on the conditional, prospective and retrospective likelihoods are numerically identical in testing association between a quantitative trait and a candidate locus. Two approximations are derived for calculating power and sample size for the score test. Compared with random sampling, a single-tail selection generally reduces the power of the score test in mapping small-effect QTLs. A two-tail selection generally enhances the QTL heritability; however, in small samples, the power of the test may actually decrease if the sample sizes are highly unbalanced in the upper and lower tails of the trait distribution.
|
27
|
Fontanesi L, Speroni C, Buttazzoni L, Scotti E, Costa LN, Davoli R, Russo V. Association between cathepsin L (CTSL) and cathepsin S (CTSS) polymorphisms and meat production and carcass traits in Italian Large White pigs. Meat Sci 2010; 85:331-8. [DOI: 10.1016/j.meatsci.2010.01.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2009] [Revised: 01/25/2010] [Accepted: 01/29/2010] [Indexed: 10/19/2022]
|
28
|
Xing C, Xing G. Power of selective genotyping in genome-wide association studies of quantitative traits. BMC Proc 2009; 3 Suppl 7:S23. [PMID: 20018013 PMCID: PMC2795920 DOI: 10.1186/1753-6561-3-s7-s23] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
The selective genotyping approach in quantitative genetics means genotyping only individuals with extreme phenotypes. This approach is considered an efficient way to perform gene mapping, and can be applied in both linkage and association studies. Selective genotyping in association mapping of quantitative trait loci was proposed to increase the power of detecting rare alleles of large effect. However, using this approach, only common variants have been detected. Studies on selective genotyping have been limited to single-locus scenarios. In this study we aim to investigate the power of selective genotyping in a genome-wide association study scenario, and we specifically study the impact of minor allele frequency of variants on the power of this approach. We use the Genetic Analysis Workshop 16 rheumatoid arthritis whole-genome data from the North American Rheumatoid Arthritis Consortium. Two quantitative traits, anti-cyclic citrullinated peptide and rheumatoid factor immunoglobulin M, and one binary trait, rheumatoid arthritis affection status, are used in the analysis. The power of selective genotyping is explored as a function of three parameters: sampling proportion, minor allele frequency of single-nucleotide polymorphism, and test level. The results show that the selective genotyping approach is more efficient in detecting common variants than detecting rare variants, and it is efficient only when the level of declaring significance is not stringent. In summary, the selective genotyping approach is most suitable for detecting common variants in candidate gene-based studies.
Affiliation(s)
- Chao Xing
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
|
29
|
Abstract
Selective genotyping and phenotyping strategies are used to lower the cost of quantitative trait locus studies. Their efficiency has been studied primarily in simplified contexts--when a single locus contributes to the phenotype, and when the residual error (phenotype conditional on the genotype) is normally distributed. It is unclear how these strategies will perform in the context of complex traits where multiple loci, possibly linked or epistatic, may contribute to the trait. We also do not know what genotyping strategies should be used for nonnormally distributed phenotypes. For time-to-event phenotypes there is the additional question of choosing follow-up time duration. We use an information perspective to examine these experimental design issues in the broader context of complex traits and make recommendations on their use.
|
30
|
Liu C, Yang Q, Adrienne Cupples L, Meigs JB, Dupuis J. Selection of the most informative individuals from families with multiple siblings for association studies. Genet Epidemiol 2008; 33:299-307. [PMID: 19025786 DOI: 10.1002/gepi.20380] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Association analyses may follow an initial linkage analysis for mapping and identifying genes underlying complex quantitative traits and may be conducted on unrelated subsets of individuals where only one member of a family is included. We evaluate two methods to select one sibling per sibship when multiple siblings are available: (1) one sibling with the most extreme trait value (EV); and (2) one sibling using a combination score statistic based on extreme trait values and identity-by-descent sharing information (IBD-EV). We compare the type I error and power. Furthermore, we compare these selection strategies with a strategy that randomly selects one sibling per sibship and with an approach that includes all siblings, using both a simulation study and an application to fasting blood glucose in the Framingham Heart Study. When the genetic effect is homogeneous, we find that using the combination score can increase power by 30-40% compared to a random selection strategy, and loses only 8-13% of power compared to the full sibship analysis, across all additive models considered, but offers at least 50% genotyping cost saving. In the presence of genetic heterogeneity, the score offers a 50% increase in power over a random selection strategy, but there is substantial loss compared to the full sibship analysis. In the application to fasting blood glucose, two SNPs are found in common for the selection strategies and the full sample among the 10 highest ranked single nucleotide polymorphisms. The EV strategy tends to agree with the IBD-EV strategy and the analysis of the full sample.
Affiliation(s)
- Chunyu Liu
- Genetics and Genomics, Biogen Idec, Cambridge, Massachusetts, USA.
|
31
|
Genetic analyses in a sample of individuals with high or low BMD shows association with multiple Wnt pathway genes. J Bone Miner Res 2008; 23:499-506. [PMID: 18021006 DOI: 10.1359/jbmr.071113] [Citation(s) in RCA: 122] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Using a moderate-sized cohort selected with extreme BMD (n = 344; absolute value BMD Z-scores, 1.5-4.0), significant association of several members of the Wnt signaling pathway with bone densitometry measures was shown. This confirms that extreme truncate selection is a powerful design for quantitative trait association studies of bone phenotypes.
INTRODUCTION: Although the high heritability of BMD variation has long been established, few genes have been conclusively shown to affect the variation of BMD in the general population. Extreme truncate selection has been proposed as a more powerful alternative to unselected cohort designs in quantitative trait association studies. We sought to test these theoretical predictions in studies of the bone densitometry measures BMD, BMC, and femoral neck area, by investigating their association with members of the Wnt pathway, some of which have previously been shown to be associated with BMD in much larger cohorts, in a moderate-sized extreme truncate selected cohort (absolute value BMD Z-scores = 1.5-4.0; n = 344).
MATERIALS AND METHODS: Ninety-six tag single-nucleotide polymorphisms (SNPs) were selected to tag common genetic variation (minor allele frequency [MAF] > 5% with an r² > 0.8) within 5 kb of all exons of 13 Wnt signaling pathway genes. The genes studied included LRP1, LRP5, LRP6, Wnt3a, Wnt7b, Wnt10b, SFRP1, SFRP2, DKK1, DKK2, FZD7, WISP3, and SOST. Three hundred forty-four cases with either high or low BMD were genotyped by Illumina Goldengate microarray SNP genotyping methods. Association was tested either by the Cochran-Armitage test for dichotomous variables or by linear regression for quantitative traits.
RESULTS: Strong association was shown with LRP5, polymorphisms of which have previously been shown to influence total hip BMD (minimum p = 0.0006). In addition, polymorphisms of the Wnt antagonist, SFRP1, were significantly associated with BMD and BMC (minimum p = 0.00042). Previously reported associations of LRP1, LRP6, and SOST with BMD were confirmed. Two other Wnt pathway genes, Wnt3a and DKK2, also showed nominal association with BMD.
CONCLUSIONS: This study shows that polymorphisms of multiple members of the Wnt pathway are associated with BMD variation. Furthermore, this study shows in a practical trial that study designs involving extreme truncate selection and moderate sample sizes can robustly identify genes of relevant effect sizes involved in BMD variation in the general population. This has implications for the design of future genome-wide studies of quantitative bone phenotypes relevant to osteoporosis.
|
32
|
Li YM, Xiang Y, Sun ZQ. An entropy-based measure for QTL mapping using extreme samples of population. Hum Hered 2007; 65:121-8. [PMID: 17934315 DOI: 10.1159/000109729] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2007] [Accepted: 06/13/2007] [Indexed: 11/19/2022] Open
Abstract
Quantitative trait locus (QTL) mapping can be accomplished through the method of selective genotyping, which is based on the differences of frequencies between an upper sample and a lower sample in a population. However, amplifying the differences in marker allele frequencies in extreme samples may increase the probability for QTL mapping. Shannon entropy, which is a nonlinear function of allele frequencies, can be used to amplify the differences in marker allele frequencies. In this paper, we present a novel measure for linkage disequilibrium (LD) between a marker and a single QTL that is based on the comparison of the entropy and conditional entropy in a marker in extreme samples of a population. This measure of LD between the marker and the trait locus can be used when the marker allele frequencies are known in the extreme samples of a population. We investigate the mapping performance in both analytic and simulation scenarios of a single QTL linked to a single marker. Our results show that the measure has very reasonable performance. In addition, a simulation study is performed on the basis of the haplotype frequencies of 10 SNPs of angiotensin-I converting enzyme (ACE) genes.
Affiliation(s)
- Yu-Mei Li
- School of Public Health, Central South University, Changsha, PR China.
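The abstract of entry 32 describes Shannon entropy as a nonlinear function of allele frequencies. As a minimal sketch of that one ingredient only, the snippet below computes the entropy of a marker's allele frequencies in a hypothetical upper and lower sample. The function name and the example frequencies are assumptions made for illustration; the abstract does not give the form of the authors' LD measure, and this is not it.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (in nats) of a list of allele frequencies.

    The frequencies are normalised to sum to 1 before use; zero
    frequencies contribute nothing to the sum.
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("allele frequencies must have a positive sum")
    probs = [f / total for f in freqs]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical marker allele frequencies in the two extreme samples.
upper_sample = [0.7, 0.3]
lower_sample = [0.4, 0.6]

print("entropy, upper sample:", shannon_entropy(upper_sample))
print("entropy, lower sample:", shannon_entropy(lower_sample))
```

Because the entropy is nonlinear in the frequencies, equal shifts in allele frequency do not produce equal shifts in entropy, which is the property the abstract appeals to.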
|
33
|
Knowles JW, Assimes TL, Li J, Quertermous T, Cooke JP. Genetic susceptibility to peripheral arterial disease: a dark corner in vascular biology. Arterioscler Thromb Vasc Biol 2007; 27:2068-78. [PMID: 17656669 PMCID: PMC4321902 DOI: 10.1161/01.atv.0000282199.66398.8c] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]
Abstract
Peripheral arterial disease (PAD) is characterized by reduced blood flow to the limbs, usually as a consequence of atherosclerosis, and affects approximately 12 million Americans. It is a common cause of cardiovascular morbidity and an independent predictor of cardiovascular mortality. Similar to other atherosclerotic diseases, such as coronary artery disease, PAD is the result of the complex interplay between injurious environmental stimuli and genetic predisposing factors of the host. Genetic susceptibility to PAD is likely conferred by sequence variants in multiple genes, each with modest effects. Although many of these variants probably alter susceptibility both to PAD and to coronary artery disease, it is likely that there exists a set of variants that specifically alter susceptibility to PAD. Despite the prevalence of PAD and its high societal burden, relatively little is known about such genetic variants. This review summarizes our limited present knowledge and gives an overview of recent, more powerful approaches to elucidating the genetic basis of PAD. We discuss the advantages and limitations of genetic studies and highlight the need for collaborative networks of PAD investigators to shed light on this dark corner of vascular biology.
Affiliation(s)
- Joshua W Knowles
- Falk Cardiovascular Research Building, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, 94305-5406, USA.
|
34
|
Huang BE, Lin DY. Efficient association mapping of quantitative trait loci with selective genotyping. Am J Hum Genet 2007; 80:567-76. [PMID: 17273979 PMCID: PMC1821103 DOI: 10.1086/512727] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 10/17/2006] [Accepted: 01/09/2007] [Indexed: 11/03/2022]
Abstract
Selective genotyping (i.e., genotyping only those individuals with extreme phenotypes) can greatly improve the power to detect and map quantitative trait loci in genetic association studies. Because selection depends on the phenotype, the resulting data cannot be properly analyzed by standard statistical methods. We provide appropriate likelihoods for assessing the effects of genotypes and haplotypes on quantitative traits under selective-genotyping designs. We demonstrate that the likelihood-based methods are highly effective in identifying causal variants and are substantially more powerful than existing methods.
Affiliation(s)
- B E Huang
- Department of Biostatistics, University of North Carolina, Chapel Hill 27599-7420, USA
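The selective-genotyping design in the abstract of entry 34 (genotyping only individuals with extreme phenotypes) can be sketched with a small simulation. Everything in the snippet is an illustrative assumption: the function name, the additive model with standard normal noise, and the parameter values (sample size, minor allele frequency, effect size, tail fraction). It only shows how phenotype-based selection separates allele frequencies between the tails; it does not implement the likelihood-based methods the authors propose.

```python
import random


def simulate_selective_genotyping(n=2000, maf=0.3, effect=0.5, tail=0.1, seed=1):
    """Simulate a quantitative trait and 'genotype' only the extremes.

    Returns the allele frequency in the lower and upper phenotype tails.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Genotype: number of copies (0, 1 or 2) of the trait-raising allele.
        genotype = (rng.random() < maf) + (rng.random() < maf)
        # Phenotype: additive genetic effect plus standard normal noise.
        phenotype = effect * genotype + rng.gauss(0.0, 1.0)
        people.append((phenotype, genotype))

    # Rank by phenotype and keep only the two tails for genotyping.
    people.sort(key=lambda pair: pair[0])
    k = int(n * tail)
    lower, upper = people[:k], people[-k:]

    def allele_freq(group):
        return sum(g for _, g in group) / (2 * len(group))

    return allele_freq(lower), allele_freq(upper)


lower_freq, upper_freq = simulate_selective_genotyping()
print("allele frequency, lower tail:", lower_freq)
print("allele frequency, upper tail:", upper_freq)
```

Under these assumptions the allele is enriched in the upper tail and depleted in the lower tail. As the abstract notes, such phenotype-dependent selection means the selected data cannot be properly analyzed by standard methods that treat them as a random sample.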
|
35
|
Zheng G, Ghosh K, Chen Z, Li Z. Extreme Rank Selections for Linkage Analysis of Quantitative Trait Loci Using Selected Sib-Pairs. Ann Hum Genet 2006; 70:857-66. [PMID: 17044861 DOI: 10.1111/j.1469-1809.2006.00268.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
Abstract
It is well known that linkage analysis using simple random sib-pairs has relatively low power for detecting quantitative trait loci with small genetic effects. The power can be substantially increased by using samples selected based on their trait values. Usually, samples that are obtained by truncation selection consist of random samples from a truncated trait distribution. In this article we propose an alternative method using extreme ranks for linkage analysis with selected sib-pairs. This approach approximates the truncation selection. With similar screening sizes and the same sample size of selected sib-pairs, the extreme rank selection and truncation method have similar power performance, both of which are substantially more powerful than when using random sib-pairs. Simulation results on the comparison of powers between the truncation selection and the extreme rank selection and/or random selection for linkage analysis are reported.
Affiliation(s)
- G Zheng
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892, USA.
|
36
|
Wallace C, Chapman JM, Clayton DG. Improved power offered by a score test for linkage disequilibrium mapping of quantitative-trait loci by selective genotyping. Am J Hum Genet 2006; 78:498-504. [PMID: 16465623 PMCID: PMC1380292 DOI: 10.1086/500562] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 11/04/2005] [Accepted: 12/16/2005] [Indexed: 11/03/2022]
Abstract
Selective genotyping is used to increase efficiency in genetic association studies of quantitative traits by genotyping only those individuals who deviate from the population mean. However, selection distorts the conditional distribution of the trait given genotype, and such data sets are usually analyzed using case-control methods, quantitative analysis within selected groups, or a combination of both. We show that Hotelling's T² test, recently proposed for association studies of one or several tagging single-nucleotide polymorphisms in a prospective (i.e., trait given genotype) design, can also be applied to the retrospective (i.e., genotype given trait) selective-genotyping design, and we use simulation to demonstrate its improved power over existing methods.
Affiliation(s)
- Chris Wallace
- Department of Clinical Pharmacology, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, United Kingdom.
|