1
|
Cherukuri PF, Soe MM, Condon DE, Bartaria S, Meis K, Gu S, Frost FG, Fricke LM, Lubieniecki KP, Lubieniecka JM, Pyatt RE, Hajek C, Boerkoel CF, Carmichael L. Establishing analytical validity of BeadChip array genotype data by comparison to whole-genome sequence and standard benchmark datasets. BMC Med Genomics 2022; 15:56. [PMID: 35287663 PMCID: PMC8919546 DOI: 10.1186/s12920-022-01199-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/08/2021] [Accepted: 02/28/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Clinical use of genotype data requires high positive predictive value (PPV) and thorough understanding of the genotyping platform characteristics. BeadChip arrays, such as the Global Screening Array (GSA), potentially offer a high-throughput, low-cost clinical screen for known variants. We hypothesize that quality assessment and comparison to whole-genome sequence and benchmark data establish the analytical validity of GSA genotyping. METHODS To test this hypothesis, we selected 263 samples from Coriell, generated GSA genotypes in triplicate, generated whole genome sequence (rWGS) genotypes, assessed the quality of each set of genotypes, and compared each set of genotypes to each other and to the 1000 Genomes Phase 3 (1KG) genotypes, a performance benchmark. For 59 genes (MAP59), we also performed theoretical and empirical evaluation of variants deemed medically actionable predispositions. RESULTS Quality analyses detected sample contamination and increased assay failure along the chip margins. Comparison to benchmark data demonstrated that > 82% of the GSA assays had a PPV of 1. GSA assays targeting transitions, genomic regions of high complexity, and common variants performed better than those targeting transversions, regions of low complexity, and rare variants. Comparison of GSA data to rWGS and 1KG data showed > 99% performance across all measured parameters. Consistent with predictions from prior studies, the GSA detection of variation within the MAP59 genes was 3/261. CONCLUSION We establish the analytical validity of GSA assays using quality analytics and comparison to benchmark and rWGS data. GSA assays meet the standards of a clinical screen although assays interrogating rare variants, transversions, and variants within low-complexity regions require careful evaluation.
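As an illustration of the PPV metric reported in this abstract, the following minimal sketch computes a per-assay PPV against a benchmark call set. The genotype coding (alternate-allele counts 0/1/2, `None` for a no-call), the function name, and the rule that a true positive is a non-reference call matching the benchmark genotype are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from typing import Dict, Optional


def positive_predictive_value(
    calls: Dict[str, Optional[int]],
    benchmark: Dict[str, Optional[int]],
) -> Optional[float]:
    """Return TP / (TP + FP) for one assay, or None if no variant was called.

    Genotypes are alternate-allele counts (0, 1, 2); None means no-call.
    Only non-reference array calls count as positives (assumption).
    """
    true_pos = 0
    false_pos = 0
    for sample, called in calls.items():
        if called is None or called == 0:
            continue  # no-calls and reference calls are not positives
        expected = benchmark.get(sample)
        if expected is None:
            continue  # no benchmark genotype to compare against
        if called == expected:
            true_pos += 1
        else:
            false_pos += 1
    total = true_pos + false_pos
    return true_pos / total if total else None


# Hypothetical genotypes for a single assay across four samples.
array_calls = {"s1": 1, "s2": 2, "s3": 1, "s4": 0}
benchmark_calls = {"s1": 1, "s2": 2, "s3": 0, "s4": 0}
ppv = positive_predictive_value(array_calls, benchmark_calls)
```

Under this definition an assay reaches a PPV of 1 only when every non-reference call agrees with the benchmark, which is the condition the abstract reports for more than 82% of assays.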
Affiliation(s)
- Praveen F Cherukuri
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA; Sanford Research Center, Sioux Falls, SD, USA
- Melissa M Soe
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- David E Condon
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Shubhi Bartaria
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Kaitlynn Meis
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Shaopeng Gu
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Frederick G Frost
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Lindsay M Fricke
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Krzysztof P Lubieniecki
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA; Sanford Research Center, Sioux Falls, SD, USA
- Joanna M Lubieniecka
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA; Sanford Research Center, Sioux Falls, SD, USA
- Robert E Pyatt
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Catherine Hajek
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA; Sanford School of Medicine, University of South Dakota, Sioux Falls, SD, USA
- Cornelius F Boerkoel
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
- Lynn Carmichael
  - Imagenetics, Sanford Health, 1410 W 25th St. Room #302, Sioux Falls, SD, 57105, USA
|
2
|
Calingacion M, Mumm R, Tan K, Quiatchon-Baeza L, Concepcion JCT, Hageman JA, Prakash S, Fitzgerald M, Hall RD. A Multidisciplinary Phenotyping and Genotyping Analysis of a Mapping Population Enables Quality to Be Combined with Yield in Rice. Front Mol Biosci 2017; 4:32. [PMID: 28589124 PMCID: PMC5438996 DOI: 10.3389/fmolb.2017.00032] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/31/2017] [Accepted: 05/04/2017] [Indexed: 12/21/2022]
Abstract
In this study a mapping population (F8) of ca 200 progeny from a cross between the commercial rice varieties Apo and IR64 has been both genotyped and phenotyped. A genotyping-by-sequencing approach was first used to identify 2,681 polymorphic SNP markers which gave dense coverage of the genome with a good distribution across all 12 chromosomes. The coefficient of parentage was also low, at 0.13, confirming that the parents are genetically distant from each other. The progeny, together with both parents, were grown under irrigated and water restricted conditions in a randomised block design. All grain was harvested to determine variation in yield across the population. The grains were then polished following standard procedures prior to performing the phenotyping analyses. A Gas Chromatography—Mass Spectrometry approach was used to determine the volatile biochemical profiles of each line and after data curation and processing, discriminatory metabolites were putatively identified based on in-house and commercial spectral libraries. These data were used to predict the potential role of these metabolites in determining differences in aroma between genotypes. A number of QTLs for yield and for individual metabolites have been identified. Following these combined multi-disciplinary analyses, it proved possible to identify a number of lines which appeared to combine the favourable aroma attributes of IR64 with the favourable (higher) yield potential of Apo. As such, these lines are excellent candidates to assess further as potential genotypes to work up into a new variety of rice which has both good yield and good quality, thus meeting the needs of both farmer and consumer alike.
Affiliation(s)
- Mariafe Calingacion
  - Grain Quality and Nutrition Centre, International Rice Research Institute, Laguna, Philippines; Laboratory of Plant Physiology, Wageningen University and Research, Wageningen, Netherlands
- Roland Mumm
  - Wageningen Plant Research, Wageningen University and Research, Wageningen, Netherlands; Netherlands Metabolomics Centre, Leiden, Netherlands
- Kevin Tan
  - Department of Food Science and Technology, School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
- Lenie Quiatchon-Baeza
  - Grain Quality and Nutrition Centre, International Rice Research Institute, Laguna, Philippines
- Jeanaflor C T Concepcion
  - Department of Food Science and Technology, School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
- Jos A Hageman
  - Biometris, Wageningen University and Research, Wageningen, Netherlands
- Sangeeta Prakash
  - Department of Food Science and Technology, School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
- Melissa Fitzgerald
  - Department of Food Science and Technology, School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
- Robert D Hall
  - Laboratory of Plant Physiology, Wageningen University and Research, Wageningen, Netherlands; Wageningen Plant Research, Wageningen University and Research, Wageningen, Netherlands; Netherlands Metabolomics Centre, Leiden, Netherlands
|
3
|
Brand B, Scheinhardt MO, Friedrich J, Zimmer D, Reinsch N, Ponsuksili S, Schwerin M, Ziegler A. Adrenal cortex expression quantitative trait loci in a German Holstein × Charolais cross. BMC Genet 2016; 17:135. [PMID: 27716033 PMCID: PMC5053117 DOI: 10.1186/s12863-016-0442-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/09/2016] [Accepted: 09/28/2016] [Indexed: 12/30/2022]
Abstract
Background The importance of the adrenal gland in regard to lactation and reproduction in cattle has been recognized early. Caused by interest in animal welfare and the impact of stress on economically important traits in farm animals the adrenal gland and its function within the stress response is of increasing interest. However, the molecular mechanisms and pathways involved in stress-related effects on economically important traits in farm animals are not fully understood. Gene expression is an important mechanism underlying complex traits, and genetic variants affecting the transcript abundance are thought to influence the manifestation of an expressed phenotype. We therefore investigated the genetic background of adrenocortical gene expression by applying an adaptive linear rank test to identify genome-wide expression quantitative trait loci (eQTL) for adrenal cortex transcripts in cattle. Results A total of 10,986 adrenal cortex transcripts and 37,204 single nucleotide polymorphisms (SNPs) were analysed in 145 F2 cows of a Charolais × German Holstein cross. We identified 505 SNPs that were associated with the abundance of 129 transcripts, comprising 482 cis effects and 17 trans effects. These SNPs were located on all chromosomes but X, 16, 24 and 28. Associated genes are mainly involved in molecular and cellular functions comprising free radical scavenging, cellular compromise, cell morphology and lipid metabolism, including genes such as CYP27A1 and LHCGR that have been shown to affect economically important traits in cattle. Conclusions In this study we showed that adrenocortical eQTL affect the expression of genes known to contribute to the phenotypic manifestation in cattle. Furthermore, some of the identified genes and related molecular pathways were previously shown to contribute to the phenotypic variation of behaviour, temperament and growth at the onset of puberty in the same population investigated here. 
We conclude that eQTL analysis appears to be a useful approach providing insight into the molecular and genetic background of complex traits in cattle and will help to understand molecular networks involved.
Affiliation(s)
- Bodo Brand
  - Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee, Dummerstorf, Germany; Current affiliation: Institute for Farm Animal Research and Technology, University of Rostock, Justus-von-Liebig-Weg, 18059, Rostock, Germany
- Markus O Scheinhardt
  - Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee, Lübeck, Germany
- Juliane Friedrich
  - Institute for Farm Animal Research and Technology, University of Rostock, Justus-von-Liebig-Weg, Rostock, Germany
- Daisy Zimmer
  - Institute for Farm Animal Research and Technology, University of Rostock, Justus-von-Liebig-Weg, Rostock, Germany
- Norbert Reinsch
  - Institute for Genetics and Biometry, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee, Dummerstorf, Germany
- Siriluck Ponsuksili
  - Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee, Dummerstorf, Germany
- Manfred Schwerin
  - Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, Wilhelm-Stahl-Allee, Dummerstorf, Germany; Institute for Farm Animal Research and Technology, University of Rostock, Justus-von-Liebig-Weg, Rostock, Germany
- Andreas Ziegler
  - Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee, Lübeck, Germany; Center for Clinical Trials, University of Lübeck, Ratzeburger Allee, Lübeck, Germany; School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
|
4
|
Utsunomiya ATH, Santos DJA, Boison SA, Utsunomiya YT, Milanesi M, Bickhart DM, Ajmone-Marsan P, Sölkner J, Garcia JF, da Fonseca R, da Silva MVGB. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan. BMC Genomics 2016; 17:705. [PMID: 27595709 PMCID: PMC5011828 DOI: 10.1186/s12864-016-3049-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/11/2016] [Accepted: 08/27/2016] [Indexed: 11/21/2022]
Abstract
Background Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be detected by the unexpected behavior of marker linkage disequilibrium (LD) decay. We developed a heuristic process to identify misassembly signatures, applied it to the bovine reference genome assembly (UMDv3.1) and presented the consequences of misassemblies in two case studies. Results We identified 2,906 single nucleotide polymorphism (SNP) markers presenting unexpected LD decay behavior in 626 putative misassembled contigs, which comprised less than 1% of the whole genome. Although this represents a small fraction of the reference sequence, these poorly assembled segments can have severe implications for the local genome context. For instance, we showed that one of the misassembled regions mapped to the POLL locus, which affected the annotation of positional candidate genes in a GWAS case study for polledness in Nellore (Bos indicus beef cattle). Additionally, we found that poorly performing markers in imputation mapped to putative misassembled regions, and that correction of marker positions based on LD was able to recover imputation accuracy. Conclusions This heuristic approach can be useful to cross-validate reference assemblies and to filter out markers located in low-confidence genomic regions before conducting downstream analyses.
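The LD signal that this abstract relies on can be made concrete with the standard pairwise r² statistic. The sketch below computes r² for two biallelic markers from phased haplotypes; the input layout (a list of `(allele_a, allele_b)` tuples coded 0/1) and the function name are assumptions chosen for illustration, and the authors' heuristic for flagging unexpected LD decay is not reproduced here.

```python
from typing import List, Optional, Tuple


def ld_r_squared(haplotypes: List[Tuple[int, int]]) -> Optional[float]:
    """Return r^2 between two biallelic markers from phased haplotypes.

    Each haplotype is (allele at marker A, allele at marker B), coded 0/1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    Returns None when either marker is monomorphic (r^2 undefined).
    """
    n = len(haplotypes)
    if n == 0:
        return None
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom


# Hypothetical haplotypes: perfectly linked markers vs. independent markers.
linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
r2_linked = ld_r_squared(linked)
r2_independent = ld_r_squared(independent)
```

On a correctly ordered map, r² between a marker and its neighbours is expected to fall with physical distance; a marker whose r² is low with adjacent markers but high with distant ones is the kind of pattern the abstract describes as a misassembly signature.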
Affiliation(s)
- Adam T H Utsunomiya
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista - UNESP, Campus de Jaboticabal, São Paulo, Brasil
- Daniel J A Santos
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista - UNESP, Campus de Jaboticabal, São Paulo, Brasil
- Yuri T Utsunomiya
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista - UNESP, Campus de Jaboticabal, São Paulo, Brasil
- Marco Milanesi
  - Faculdade de Medicina Veterinária de Araçatuba, Universidade Estadual Paulista - UNESP, Campus de Araçatuba, São Paulo, Brasil
- Derek M Bickhart
  - Animal Genomics and Improvement Laboratory, ARS, USDA, Beltsville, MD, USA
- Paolo Ajmone-Marsan
  - Institute of Zootechnics and Biodiversity and Ancient DNA Research Center, Università Cattolica del Sacro Cuore, Piacenza, Italy; Nutrigenomics and Proteomics Research Center - PRONUTRIGEN, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Johann Sölkner
  - Department of Sustainable Agricultural Systems, Division of Livestock Sciences, BOKU - University of Natural Resources and Life Sciences, Vienna, Austria
- José F Garcia
  - Faculdade de Medicina Veterinária de Araçatuba, Universidade Estadual Paulista - UNESP, Campus de Araçatuba, São Paulo, Brasil; International Atomic Energy Agency (IAEA) Collaborating Centre on Animal Genomics and Bioinformatics, Araçatuba, São Paulo, Brasil
- Ricardo da Fonseca
  - Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista - UNESP, Campus de Jaboticabal, São Paulo, Brasil; Faculdade de Ciências Agrárias e Tecnológicas, Universidade Estadual Paulista - UNESP, Campus de Dracena, São Paulo, Brasil
|
5
|
Alyass A, Turcotte M, Meyre D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 2015; 8:33. [PMID: 26112054 PMCID: PMC4482045 DOI: 10.1186/s12920-015-0108-y] [Citation(s) in RCA: 240] [Impact Index Per Article: 24.0] [Received: 01/21/2015] [Accepted: 06/15/2015] [Indexed: 02/07/2023]
Abstract
Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate and analyze big data. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Affiliation(s)
- Akram Alyass
  - Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada
- Michelle Turcotte
  - Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada
- David Meyre
  - Department of Clinical Epidemiology and Biostatistics, McMaster University, 1280 Main Street West, Hamilton, ON, Canada
  - Department of Pathology and Molecular Medicine, McMaster University, 1280 Main Street West, Hamilton, ON, Canada
|
6
|
Mészáros G, Petautschnig E, Schwarzenbacher H, Sölkner J. Genomic regions influencing coat color saturation and facial markings in Fleckvieh cattle. Anim Genet 2014; 46:65-8. [PMID: 25515556 DOI: 10.1111/age.12249] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Accepted: 09/26/2014] [Indexed: 02/01/2023]
Abstract
Genomic regions associated with coat color and pigmented areas of the head were identified for Fleckvieh (dual-purpose Simmental), a red-spotted and white-headed cattle breed. Coat color was measured with a chromameter, implementing the CIELAB color space and resulting in numerical representation of lightness, color intensity, red/green and blue/yellow color components, rather than subjective classification. Single marker regression analyses with fixed effects of the sex and barn were applied, and significant regions were determined with the local false discovery rate methodology. The PMEL and ERBB3 genes on chromosome 5 were in the most significant region for the color measurements. In addition to the blue/yellow color component and color intensity, the AP3B2 gene on chromosome 21 was identified. Its function was confirmed for similar traits in a range of model species. The KIT gene on chromosome 6 was found to be strongly associated with the inhibition of circum-ocular pigmentation and pigmented spots on the cheek.
Affiliation(s)
- Gábor Mészáros
  - Division of Livestock Sciences, University of Natural Resources and Life Sciences, A-1180, Vienna, Austria
|
7
|
Abstract
The gene order on the X chromosome of eutherians is generally highly conserved, although an increase in the rate of rearrangement has been reported in the rodent lineage. Conservation of the X chromosome is thought to be caused by selection related to maintenance of dosage compensation. However, we herein reveal that the cattle (Btau4.0) lineage has experienced a strong increase in the rate of X-chromosome rearrangement, much stronger than that previously reported for rodents. We also show that this increase is not matched by a similar increase on the autosomes and cannot be explained by assembly errors. Furthermore, we compared the difference in two cattle genome assemblies: Btau4.0 and Btau6.0 (Bos taurus UMD3.1). The results showed a discrepancy between Btau4.0 and Btau6.0 cattle assembly version data, and we believe that Btau6.0 cattle assembly version data are not more reliable than Btau4.0. [BMB Reports 2013; 46(6): 310-315]
Affiliation(s)
- Woncheoul Park
  - Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-742, Korea
|
8
|
Manconi A, Orro A, Manca E, Armano G, Milanesi L. A tool for mapping Single Nucleotide Polymorphisms using Graphics Processing Units. BMC Bioinformatics 2014; 15 Suppl 1:S10. [PMID: 24564714 PMCID: PMC4015528 DOI: 10.1186/1471-2105-15-s1-s10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/23/2023]
Abstract
Background Single Nucleotide Polymorphism (SNP) genotyping analysis is very susceptible to errors in SNP chromosomal positions. SNP mapping data are provided with SNP arrays without the information needed to assess their accuracy in advance. Moreover, these mapping data are tied to a given build of a genome and need to be updated when a new build becomes available. As a consequence, researchers often plan to remap SNPs with the aim of obtaining more up-to-date SNP chromosomal positions. In this work, we present G-SNPM, a GPU (Graphics Processing Unit) based tool to map SNPs on a genome. Methods G-SNPM is a tool that maps a short sequence representative of a SNP against a reference DNA sequence in order to find the physical position of the SNP in that sequence. In G-SNPM each SNP is mapped on its related chromosome by means of an automatic three-stage pipeline. In the first stage, G-SNPM uses the GPU-based short-read mapping tool SOAP3-dp to align in parallel, against each reference chromosome, the sequences representing the SNPs related to that chromosome. In the second stage, G-SNPM uses another short-read mapping tool to remap the sequences left unaligned or ambiguously aligned by SOAP3-dp (in this stage SHRiMP2 is used, which exploits specialized vector computing hardware to speed up the Smith-Waterman dynamic programming algorithm). In the last stage, G-SNPM analyzes the alignments obtained by SOAP3-dp and SHRiMP2 to identify the absolute position of each SNP. Results and conclusions To assess G-SNPM, we used it to remap the SNPs of some commercial chips. Experimental results showed that G-SNPM was able to remap almost all SNPs without ambiguity. Based on modern GPUs, G-SNPM provides fast mappings without worsening the accuracy of the results. G-SNPM can be used in specialized Genome Wide Association Studies (GWAS), as well as in annotation tasks that require updating the SNP mapping probes.
|
9
|
Mészáros G, Eaglen S, Waldmann P, Sölkner J. A Genome Wide Association Study for Longevity in Cattle. Open Journal of Genetics 2014. [DOI: 10.4236/ojgen.2014.41007] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022]
|
10
|
Waldmann P, Mészáros G, Gredler B, Fuerst C, Sölkner J. Evaluation of the lasso and the elastic net in genome-wide association studies. Front Genet 2013; 4:270. [PMID: 24363662 PMCID: PMC3850240 DOI: 10.3389/fgene.2013.00270] [Citation(s) in RCA: 132] [Impact Index Per Article: 11.0] [Received: 06/19/2013] [Accepted: 11/18/2013] [Indexed: 01/23/2023]
Abstract
The number of publications performing genome-wide association studies (GWAS) has increased dramatically. Penalized regression approaches have been developed to overcome the challenges caused by the high dimensional data, but these methods are relatively new in the GWAS field. In this study we have compared the statistical performance of two methods (the least absolute shrinkage and selection operator—lasso and the elastic net) on two simulated data sets and one real data set from a 50 K genome-wide single nucleotide polymorphism (SNP) panel of 5570 Fleckvieh bulls. The first simulated data set displays moderate to high linkage disequilibrium between SNPs, whereas the second simulated data set from the QTLMAS 2010 workshop is biologically more complex. We used cross-validation to find the optimal value of regularization parameter λ with both minimum MSE and minimum MSE + 1SE of minimum MSE. The optimal λ values were used for variable selection. Based on the first simulated data, we found that the minMSE in general picked up too many SNPs. At minMSE + 1SE, the lasso didn't acquire any false positives, but selected too few correct SNPs. The elastic net provided the best compromise between few false positives and many correct selections when the penalty weight α was around 0.1. However, in our simulation setting, this α value didn't result in the lowest minMSE + 1SE. The number of selected SNPs from the QTLMAS 2010 data was after correction for population structure 82 and 161 for the lasso and the elastic net, respectively. In the Fleckvieh data set after population structure correction lasso and the elastic net identified from 1291 to 1966 important SNPs for milk fat content, with major peaks on chromosomes 5, 14, 15, and 20. Hence, we can conclude that it is important to analyze GWAS data with both the lasso and the elastic net and an alternative tuning criterion to minimum MSE is needed for variable selection.
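The two tuning criteria compared in this abstract (minimum MSE, and minimum MSE plus one standard error) can be sketched as a selection rule over cross-validation results. The sketch below assumes the per-λ mean CV error and its standard error have already been computed by some penalized-regression routine; the function name and the example numbers are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def select_lambda(
    lambdas: Sequence[float],
    mean_mse: Sequence[float],
    se_mse: Sequence[float],
) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimises the mean CV MSE.
    lambda_1se is the largest lambda (strongest penalty) whose mean CV MSE
    is within one standard error of that minimum.
    """
    if not (len(lambdas) == len(mean_mse) == len(se_mse)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    best = min(range(len(lambdas)), key=lambda i: mean_mse[i])
    threshold = mean_mse[best] + se_mse[best]
    eligible = [lam for lam, mse in zip(lambdas, mean_mse) if mse <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical cross-validation summary for four candidate penalties.
candidate_lambdas = [0.01, 0.1, 1.0, 10.0]
cv_mean_mse = [1.00, 0.90, 0.95, 1.50]
cv_se_mse = [0.05, 0.06, 0.05, 0.08]
lambda_min, lambda_1se = select_lambda(candidate_lambdas, cv_mean_mse, cv_se_mse)
```

Because a larger λ shrinks more coefficients to zero, the one-standard-error choice selects fewer SNPs than the minimum-MSE choice, which matches the abstract's observation that minMSE tends to pick up too many SNPs.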
Affiliation(s)
- Patrik Waldmann
  - Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Austria; Division of Statistics, Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Gábor Mészáros
  - Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Austria
- Johann Sölkner
  - Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences, Vienna, Austria
|
11
|
Imputation of high-density genotypes in the Fleckvieh cattle population. Genet Sel Evol 2013; 45:3. [PMID: 23406470 PMCID: PMC3598996 DOI: 10.1186/1297-9686-45-3] [Citation(s) in RCA: 77] [Impact Index Per Article: 6.4] [Received: 06/18/2012] [Accepted: 01/25/2013] [Indexed: 01/23/2023]
Abstract
Background Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density. Methods Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively. Results Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP. Conclusions Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. 
Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.
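The accuracy measure quoted in this abstract, the correlation between true and imputed genotypes, can be illustrated for a single animal or SNP as follows. This is a minimal sketch that assumes genotypes are coded as allele counts (0/1/2) and that masked true genotypes are available for comparison; the function names and example values are hypothetical, and the concordance rate is added only as a complementary measure.

```python
import math
from typing import Optional, Sequence


def genotype_correlation(
    true: Sequence[float], imputed: Sequence[float]
) -> Optional[float]:
    """Pearson correlation between true and imputed genotype dosages.

    Returns None when either vector has zero variance (undefined).
    """
    if len(true) != len(imputed) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true)
    mean_t = sum(true) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return None
    return cov / math.sqrt(var_t * var_i)


def concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of genotypes imputed exactly right."""
    if len(true) != len(imputed) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for t, i in zip(true, imputed) if t == i) / len(true)


# Hypothetical masked genotypes and their imputed values.
true_genotypes = [0, 1, 2, 1, 0]
imputed_genotypes = [0, 1, 1, 1, 0]
accuracy_r = genotype_correlation(true_genotypes, imputed_genotypes)
accuracy_conc = concordance(true_genotypes, imputed_genotypes)
```

Correlation is less sensitive to allele frequency than raw concordance, since a rare variant can show high concordance even when the minor allele is never imputed correctly.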
|