1
John M, Korte A, Grimm DG. The benefits of permutation-based genome-wide association studies. Journal of Experimental Botany 2024; 75:5377-5389. [PMID: 38954539 PMCID: PMC11389838 DOI: 10.1093/jxb/erae280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 07/01/2024] [Indexed: 07/04/2024]
Abstract
Linear mixed models (LMMs) are a commonly used method for genome-wide association studies (GWAS) that aim to detect associations between genetic markers and phenotypic measurements in a population of individuals while accounting for population structure and cryptic relatedness. In a standard GWAS, hundreds of thousands to millions of statistical tests are performed, requiring control for multiple hypothesis testing. Typically, static corrections that penalize the number of tests performed are used to control for the family-wise error rate, which is the probability of making at least one false positive. However, it has been shown that in practice this threshold is too conservative for normally distributed phenotypes and not stringent enough for non-normally distributed phenotypes. Therefore, permutation-based LMM approaches have recently been proposed to provide a more realistic threshold that takes phenotypic distributions into account. In this work, we discuss the advantages of permutation-based GWAS approaches, including new simulations and results from a re-analysis of all publicly available Arabidopsis phenotypes from the AraPheno database.
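The permutation idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain marker-wise Pearson correlation as the test statistic instead of a linear mixed model, so it does not account for population structure, and the names `abs_corr` and `permutation_threshold` are hypothetical. The phenotype is shuffled repeatedly, the largest statistic over all markers is recorded for each shuffle, and the (1 - alpha) quantile of those maxima serves as a family-wise significance threshold that reflects the phenotype's actual distribution.

```python
import math
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Estimate a family-wise threshold on |r| by permuting the phenotype.

    genotypes: list of markers, each a list of allele counts per individual.
    phenotype: list of phenotype values, one per individual.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_stats.append(max(abs_corr(marker, shuffled) for marker in genotypes))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the per-permutation maxima.
    k = max(1, min(n_perm, math.ceil((1 - alpha) * n_perm)))
    return max_stats[k - 1]
```

In this sketch a marker would be called significant when its observed `abs_corr` against the unpermuted phenotype exceeds the returned threshold. A permutation-based LMM would replace the correlation statistic with the mixed-model test.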
Affiliation(s)
- Maura John
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Arthur Korte
- University of Würzburg, Faculty of Biology, Julius-von-Sachs Institute, Julius-von-Sachs-Platz 3, 97082 Würzburg, Germany
- Dominik G Grimm
- Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Petersgasse 18, 94315 Straubing, Germany
- Technical University of Munich, TUM School of Computation, Information and Technology, Boltzmannstraße 3, 85748 Garching, Germany
2
Tajerian A. Longitudinal study investigating the influence of COMT gene polymorphism on cortical thickness changes in Parkinson's disease over four years. Sci Rep 2024; 14:9920. [PMID: 38689006 PMCID: PMC11061119 DOI: 10.1038/s41598-024-60828-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 04/27/2024] [Indexed: 05/02/2024] Open
Abstract
Parkinson's disease (PD) is a progressive neurodegenerative disorder affecting over 3% of those over 65. It's caused by reduced dopaminergic neurons and Lewy bodies, leading to motor and non-motor symptoms. The relationship between COMT gene polymorphisms and PD is complex and not fully elucidated. Some studies have reported associations between certain COMT gene variants and PD risk, while others have not found significant associations. This study investigates how COMT gene variations impact cortical thickness changes in PD patients over time, aiming to link genetic factors, especially COMT gene variations, with PD progression. This study analyzed data from 44 PD patients with complete 4-year imaging follow-up from the Parkinson Progression Marker Initiative (PPMI) database. Magnetic resonance imaging (MRI) scans were acquired using consistent methods across 9 different MRI scanners. COMT single-nucleotide polymorphisms (SNPs) were assessed based on whole genome sequencing data. Longitudinal image analysis was conducted using FreeSurfer's processing pipeline. Linear mixed-effect models were employed to examine the interaction effect of genetic variations and time on cortical thickness, while controlling for covariates and subject-specific variations. The rs165599 SNP stands out as a potential contributor to alterations in cortical thickness, showing a significant reduction in overall mean cortical thickness in both hemispheres in homozygotes (Left: P = 0.023, Right: P = 0.028). The supramarginal, precentral, and superior frontal regions demonstrated significant bilateral alterations linked to rs165599. Our findings suggest that the rs165599 variant leads to earlier manifestation of cortical thinning during the course of the disease. However, it does not result in more severe cortical thinning outcomes over time. 
There is a need for larger cohorts and control groups to validate these findings and consider genetic variant interactions and clinical features to elucidate the specific mechanisms underlying COMT-related neurodegenerative processes in PD.
Affiliation(s)
- Amin Tajerian
- School of Medicine, Arak University of Medical Sciences, Arak, Iran.
3
Ghosal S, Schatz MC, Venkataraman A. BEATRICE: Bayesian Fine-mapping from Summary Data using Deep Variational Inference. bioRxiv: the preprint server for biology 2023:2023.03.24.534116. [PMID: 36993396 PMCID: PMC10055416 DOI: 10.1101/2023.03.24.534116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
We introduce a novel framework BEATRICE to identify putative causal variants from GWAS summary statistics (https://github.com/sayangsep/Beatrice-Finemapping). Identifying causal variants is challenging due to their sparsity and to highly correlated variants in the nearby regions. To account for these challenges, our approach relies on a hierarchical Bayesian model that imposes a binary concrete prior on the set of causal variants. We derive a variational algorithm for this fine-mapping problem by minimizing the KL divergence between an approximate density and the posterior probability distribution of the causal configurations. Correspondingly, we use a deep neural network as an inference machine to estimate the parameters of our proposal distribution. Our stochastic optimization procedure allows us to simultaneously sample from the space of causal configurations. We use these samples to compute the posterior inclusion probabilities and determine credible sets for each causal variant. We conduct a detailed simulation study to quantify the performance of our framework across different numbers of causal variants and different noise paradigms, as defined by the relative genetic contributions of causal and non-causal variants. Using this simulated data, we perform a comparative analysis against two state-of-the-art baseline methods for fine-mapping. We demonstrate that BEATRICE achieves uniformly better coverage with comparable power and set sizes, and that the performance gain increases with the number of causal variants. Thus, BEATRICE is a valuable tool to identify causal variants from eQTL and GWAS summary statistics across complex diseases and traits.
Affiliation(s)
- Sayan Ghosal
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Archana Venkataraman
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA
4
Castro-Pearson S, Samorodnitsky S, Yang K, Lotfi-Emran S, Ingraham NE, Bramante C, Jones EK, Greising S, Yu M, Steffen BT, Svensson J, Åhlberg E, Österberg B, Wacker D, Guan W, Puskarich M, Smed-Sörensen A, Lusczek E, Safo SE, Tignanelli CJ. Development of a proteomic signature associated with severe disease for patients with COVID-19 using data from 5 multicenter, randomized, controlled, and prospective studies. Sci Rep 2023; 13:20315. [PMID: 37985892 PMCID: PMC10661735 DOI: 10.1038/s41598-023-46343-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 10/31/2023] [Indexed: 11/22/2023] Open
Abstract
Significant progress has been made in preventing severe COVID-19 disease through the development of vaccines. However, we still lack a validated baseline predictive biologic signature for the development of more severe disease in both outpatients and inpatients infected with SARS-CoV-2. The objective of this study was to develop and externally validate, via 5 international outpatient and inpatient trials and/or prospective cohort studies, a novel baseline proteomic signature, which predicts the development of moderate or severe (vs mild) disease in patients with COVID-19 from a proteomic analysis of 7000 + proteins. The secondary objective was exploratory, to identify (1) individual baseline protein levels and/or (2) protein level changes within the first 2 weeks of acute infection that are associated with the development of moderate/severe (vs mild) disease. For model development, samples collected from 2 randomized controlled trials were used. Plasma was isolated and the SomaLogic SomaScan platform was used to characterize protein levels for 7301 proteins of interest for all studies. We dichotomized 113 patients as having mild or moderate/severe COVID-19 disease. An elastic net approach was used to develop a predictive proteomic signature. For validation, we applied our signature to data from three independent prospective biomarker studies. We found 4110 proteins measured at baseline that significantly differed between patients with mild COVID-19 and those with moderate/severe COVID-19 after adjusting for multiple hypothesis testing. Baseline protein expression was associated with predicted disease severity with an error rate of 4.7% (AUC = 0.964). We also found that five proteins (Afamin, I-309, NKG2A, PRS57, LIPK) and patient age serve as a signature that separates patients with mild COVID-19 and patients with moderate/severe COVID-19 with an error rate of 1.77% (AUC = 0.9804). 
This panel was validated using data from 3 external studies with AUCs of 0.764 (Harvard University), 0.696 (University of Colorado), and 0.893 (Karolinska Institutet). In this study we developed and externally validated a baseline COVID-19 proteomic signature associated with disease severity for potential use in both outpatients and inpatients with COVID-19.
Affiliation(s)
- Sandra Castro-Pearson
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Sarah Samorodnitsky
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Kaifeng Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Sahar Lotfi-Emran
- Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Carolyn Bramante
- Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Emma K Jones
- Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- Sarah Greising
- School of Kinesiology, University of Minnesota, Minneapolis, MN, USA
- Meng Yu
- Division of Immunology and Allergy, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Brian T Steffen
- Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- Julia Svensson
- Division of Immunology and Allergy, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Eric Åhlberg
- Division of Immunology and Allergy, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Björn Österberg
- Division of Immunology and Allergy, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- David Wacker
- Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- Weihua Guan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Michael Puskarich
- Department of Emergency Medicine, University of Minnesota, Minneapolis, MN, USA
- Department of Emergency Medicine, Hennepin County Medical Center, Minneapolis, MN, USA
- Anna Smed-Sörensen
- Division of Immunology and Allergy, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet and Karolinska University Hospital, Stockholm, Sweden
- Elizabeth Lusczek
- Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- Sandra E Safo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Christopher J Tignanelli
- Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA.
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.
5
John M, Lencz T. Potential application of elastic nets for shared polygenicity detection with adapted threshold selection. Int J Biostat 2023; 19:417-438. [PMID: 36327464 PMCID: PMC10154439 DOI: 10.1515/ijb-2020-0108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2020] [Accepted: 10/05/2022] [Indexed: 11/06/2022]
Abstract
Current research suggests that hundreds to thousands of single nucleotide polymorphisms (SNPs) with small to modest effect sizes contribute to the genetic basis of many disorders, a phenomenon labeled as polygenicity. Additionally, many such disorders demonstrate polygenic overlap, in which risk alleles are shared at associated genetic loci. A simple strategy to detect polygenic overlap between two phenotypes is based on rank-ordering the univariate p-values from two genome-wide association studies (GWASs). Although high-dimensional variable selection strategies such as Lasso and elastic nets have been utilized in other GWAS analysis settings, they are yet to be utilized for detecting shared polygenicity. In this paper, we illustrate how elastic nets, with polygenic scores as the dependent variable and with appropriate adaptation in selecting the penalty parameter, may be utilized for detecting a subset of SNPs involved in shared polygenicity. We provide theory to better understand our approaches, and illustrate their utility using synthetic datasets. Results from extensive simulations are presented comparing the elastic net approaches with the rank ordering approach, in various scenarios. Results from simulations studies exhibit one of the elastic net approaches to be superior when the correlations among the SNPs are high. Finally, we apply the methods on two real datasets to illustrate further the capabilities, limitations and differences among the methods.
Affiliation(s)
- Majnu John
- Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
- Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
- Departments of Psychiatry and of Mathematics, Hofstra University, Hempstead, NY
- Todd Lencz
- Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY
- Division of Psychiatry Research, The Zucker Hillside Hospital, Northwell Health System, Glen Oaks, NY
- Departments of Psychiatry and of Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY
6
Yang K, Kang Z, Guan W, Lotfi-Emran S, Mayer ZJ, Guerrero CR, Steffen BT, Puskarich MA, Tignanelli CJ, Lusczek E, Safo SE. Developing A Baseline Metabolomic Signature Associated with COVID-19 Severity: Insights from Prospective Trials Encompassing 13 U.S. Centers. Metabolites 2023; 13:1107. [PMID: 37999202 PMCID: PMC10672920 DOI: 10.3390/metabo13111107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 10/14/2023] [Accepted: 10/16/2023] [Indexed: 11/25/2023] Open
Abstract
Metabolic disease is a significant risk factor for severe COVID-19 infection, but the contributing pathways are not yet fully elucidated. Using data from two randomized controlled trials across 13 U.S. academic centers, our goal was to characterize metabolic features that predict severe COVID-19 and define a novel baseline metabolomic signature. Individuals (n = 133) were dichotomized as having mild or moderate/severe COVID-19 disease based on the WHO ordinal scale. Blood samples were analyzed using the Biocrates platform, providing 630 targeted metabolites for analysis. Resampling techniques and machine learning models were used to determine metabolomic features associated with severe disease. Ingenuity Pathway Analysis (IPA) was used for functional enrichment analysis. To aid in clinical decision making, we created baseline metabolomics signatures of low-correlated molecules. Multivariable logistic regression models were fit to associate these signatures with severe disease on training data. A three-metabolite signature, lysophosphatidylcholine a C17:0, dihydroceramide (d18:0/24:1), and triacylglyceride (20:4_36:4), resulted in the best discrimination performance with an average test AUROC of 0.978 and F1 score of 0.942. Pathways related to amino acids were significantly enriched from the IPA analyses, and the mitogen-activated protein kinase kinase 5 (MAP2K5) was differentially activated between groups. In conclusion, metabolites related to lipid metabolism efficiently discriminated between mild vs. moderate/severe disease. SDMA and GABA demonstrated the potential to discriminate between these two groups as well. The mitogen-activated protein kinase kinase 5 (MAP2K5) regulator is differentially activated between groups, suggesting further investigation as a potential therapeutic pathway.
Affiliation(s)
- Kaifeng Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA (S.E.S.)
- Zhiyu Kang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA (S.E.S.)
- Weihua Guan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA (S.E.S.)
- Sahar Lotfi-Emran
- Department of Medicine, University of Minnesota, Minneapolis, MN 55455, USA
- Zachary J. Mayer
- Center for Metabolomics and Proteomics, University of Minnesota, Minneapolis, MN 55455, USA
- Candace R. Guerrero
- Center for Metabolomics and Proteomics, University of Minnesota, Minneapolis, MN 55455, USA
- Brian T. Steffen
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA (E.L.)
- Michael A. Puskarich
- Department of Emergency Medicine, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Emergency Medicine, Hennepin County Medical Center, Minneapolis, MN 55455, USA
- Christopher J. Tignanelli
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA (E.L.)
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, USA
- Elizabeth Lusczek
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA (E.L.)
- Sandra E. Safo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA (S.E.S.)
7
Urbut SM, Koyama S, Hornsby W, Bhukar R, Kheterpal S, Truong B, Selvaraj MS, Neale B, O’Donnell CJ, Peloso GM, Natarajan P. Bayesian multivariate genetic analysis improves translational insights. iScience 2023; 26:107854. [PMID: 37766997 PMCID: PMC10520309 DOI: 10.1016/j.isci.2023.107854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/15/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open
Abstract
While lipid traits are known essential mediators of cardiovascular disease, few approaches have taken advantage of their shared genetic effects. We apply a Bayesian multivariate size estimator, mash, to GWAS of four lipid traits in the Million Veterans Program (MVP) and provide posterior mean and local false sign rates for all effects. These estimates borrow information across traits to improve effect size accuracy. We show that controlling local false sign rates accurately and powerfully identifies replicable genetic associations and that multivariate control furthers the ability to explain complex diseases. Our application yields high concordance between independent datasets, more accurately prioritizes causal genes, and significantly improves polygenic prediction beyond state-of-the-art methods by up to 59% for lipid traits. The use of Bayesian multivariate genetic shrinkage has yet to be applied to human quantitative trait GWAS results, and we present a staged approach to prediction on a polygenic scale.
Affiliation(s)
- Sarah M. Urbut
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Satoshi Koyama
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Whitney Hornsby
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Rohan Bhukar
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Sumeet Kheterpal
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Buu Truong
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Margaret S. Selvaraj
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Benjamin Neale
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- Analytic Translational and Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Christopher J. O’Donnell
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
- VA Boston Department of Veterans Affairs, Boston, MA 02130, USA
- Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02218, USA
- Pradeep Natarajan
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Medicine Harvard Medical School, Boston, MA 02115, USA
8
Rani R, Raza G, Ashfaq H, Rizwan M, Razzaq MK, Waheed MQ, Shimelis H, Babar AD, Arif M. Genome-wide association study of soybean (Glycine max [L.] Merr.) germplasm for dissecting the quantitative trait nucleotides and candidate genes underlying yield-related traits. Frontiers in Plant Science 2023; 14:1229495. [PMID: 37636105 PMCID: PMC10450938 DOI: 10.3389/fpls.2023.1229495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 07/25/2023] [Indexed: 08/29/2023]
Abstract
Soybean (Glycine max [L.] Merr.) is one of the most significant crops in the world in terms of oil and protein. Owing to the rising demand for soybean products, there is an increasing need for improved varieties for more productive farming. However, complex correlation patterns among quantitative traits along with genetic interactions pose a challenge for soybean breeding. Association studies play an important role in the identification of accession with useful alleles by locating genomic sites associated with the phenotype in germplasm collections. In the present study, a genome-wide association study was carried out for seven agronomic and yield-related traits. A field experiment was conducted in 2015/2016 at two locations that include 155 diverse soybean germplasm. These germplasms were genotyped using SoySNP50K Illumina Infinium Bead-Chip. A total of 51 markers were identified for node number, plant height, pods per plant, seeds per plant, seed weight per plant, hundred-grain weight, and total yield using a multi-locus linear mixed model (MLMM) in FarmCPU. Among these significant SNPs, 18 were putative novel QTNs, while 33 co-localized with previously reported QTLs. A total of 2,356 genes were found in 250 kb upstream and downstream of significant SNPs, of which 17 genes were functional and the rest were hypothetical proteins. These 17 candidate genes were located in the region of 14 QTNs, of which ss715580365, ss715608427, ss715632502, and ss715620131 are novel QTNs for PH, PPP, SDPP, and TY respectively. Four candidate genes, Glyma.01g199200, Glyma.10g065700, Glyma.18g297900, and Glyma.14g009900, were identified in the vicinity of these novel QTNs, which encode lsd one like 1, Ergosterol biosynthesis ERG4/ERG24 family, HEAT repeat-containing protein, and RbcX2, respectively. 
Although further experimental validation of these candidate genes is required, several appear to be involved in growth and developmental processes related to the respective agronomic traits when compared with their homologs in Arabidopsis thaliana. This study supports the usefulness of association studies and provides valuable data for functional markers and investigating candidate genes within a diverse germplasm collection in future breeding programs.
Affiliation(s)
- Reena Rani
- Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
- Ghulam Raza
- Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
- Hamza Ashfaq
- Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
- Muhammad Rizwan
- Plant Breeding and Genetics Division, Nuclear Institute of Agriculture (NIA), Tando Jam, Pakistan
- Muhammad Khuram Razzaq
- Soybean Research Institute, National Center for Soybean Improvement, Nanjing Agricultural University, Nanjing, China
- Muhammad Qandeel Waheed
- Plant Breeding and Genetics Division, Nuclear Institute for Agriculture and Biology (NIAB), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
- Hussein Shimelis
- School of Agricultural, Earth and Environmental Sciences, African Centre for Crop Improvement, University of KwaZulu-Natal, Pietermaritzburg, South Africa
- Allah Ditta Babar
- Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
- Muhammad Arif
- Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Constituent College Pakistan Institute of Engineering and Applied Sciences (PIEAS), Faisalabad, Pakistan
9
Wainberg M, Andrews SJ, Tripathy SJ. Shared genetic risk loci between Alzheimer's disease and related dementias, Parkinson's disease, and amyotrophic lateral sclerosis. Alzheimers Res Ther 2023; 15:113. [PMID: 37328865 PMCID: PMC10273745 DOI: 10.1186/s13195-023-01244-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 02/19/2023] [Accepted: 05/16/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have indicated moderate genetic overlap between Alzheimer's disease (AD) and related dementias (ADRD), Parkinson's disease (PD) and amyotrophic lateral sclerosis (ALS), neurodegenerative disorders traditionally considered etiologically distinct. However, the specific genetic variants and loci underlying this overlap remain almost entirely unknown. METHODS We leveraged state-of-the-art GWAS for ADRD, PD, and ALS. For each pair of disorders, we examined each of the GWAS hits for one disorder and tested whether they were also significant for the other disorder, applying Bonferroni correction for the number of variants tested. This approach rigorously controls the family-wise error rate for both disorders, analogously to genome-wide significance. RESULTS Eleven loci with GWAS hits for one disorder were also associated with one or both of the other disorders: one with all three disorders (the MAPT/KANSL1 locus), five with ADRD and PD (near LCORL, CLU, SETD1A/KAT8, WWOX, and GRN), three with ADRD and ALS (near GPX3, HS3ST5/HDAC2/MARCKS, and TSPOAP1), and two with PD and ALS (near GAK/TMEM175 and NEK1). Two of these loci (LCORL and NEK1) were associated with an increased risk of one disorder but decreased risk of another. Colocalization analysis supported a shared causal variant between ADRD and PD at the CLU, WWOX, and LCORL loci, between ADRD and ALS at the TSPOAP1 locus, and between PD and ALS at the NEK1 and GAK/TMEM175 loci. To address the concern that ADRD is an imperfect proxy for AD and that the ADRD and PD GWAS have overlapping participants (nearly all of which are from the UK Biobank), we confirmed that all our ADRD associations had nearly identical odds ratios in an AD GWAS that excluded the UK Biobank, and all but one remained nominally significant (p < 0.05) for AD. 
CONCLUSIONS In one of the most comprehensive investigations to date of pleiotropy between neurodegenerative disorders, we identify eleven genetic risk loci shared among ADRD, PD, and ALS. These loci support lysosomal/autophagic dysfunction (GAK/TMEM175, GRN, KANSL1), neuroinflammation/immunity (TSPOAP1), oxidative stress (GPX3, KANSL1), and the DNA damage response (NEK1) as transdiagnostic processes underlying multiple neurodegenerative disorders.
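The cross-disorder test described in the METHODS of this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `shared_hits`, the variant identifiers, and the p-values are all hypothetical, and the sketch skips steps such as locus definition and colocalization. It takes the GWAS hits of one disorder, looks them up in the other disorder's results, and applies a Bonferroni threshold equal to alpha divided by the number of variants actually tested.

```python
def shared_hits(hits_a, pvalues_b, alpha=0.05):
    """Return hits from disorder A that are Bonferroni-significant in disorder B.

    hits_a: iterable of variant identifiers significant for disorder A.
    pvalues_b: dict mapping variant identifier -> p-value in the disorder B GWAS.
    """
    # Only variants present in both studies can be tested.
    tested = [variant for variant in hits_a if variant in pvalues_b]
    if not tested:
        return []
    threshold = alpha / len(tested)  # Bonferroni over the variants tested
    return [variant for variant in tested if pvalues_b[variant] < threshold]


# Hypothetical example: four hits for disorder A, three of them measured in B.
example_hits = ["rs1", "rs2", "rs3", "rs4"]
example_pvalues = {"rs1": 0.001, "rs2": 0.02, "rs3": 0.5}
print(shared_hits(example_hits, example_pvalues))
```

Because only the hits from the first disorder are tested in the second, the correction divides by a handful of variants instead of the whole genome, which is what makes this two-step design more powerful than requiring genome-wide significance twice.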
Affiliation(s)
- Michael Wainberg
- Centre for Addiction and Mental Health, 250 College Street, Toronto, M5T 1R8, Canada
| | - Shea J Andrews
- Department of Psychiatry & Behavioral Sciences, University of California San Francisco, San Francisco, 94143, USA
| | - Shreejoy J Tripathy
- Centre for Addiction and Mental Health, 250 College Street, Toronto, M5T 1R8, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, M5S 1A8, Canada
- Department of Psychiatry, University of Toronto, Toronto, M5T 1R8, Canada
- Department of Physiology, University of Toronto, Toronto, M5S 1A8, Canada
|
10
|
Obry L, Dalmasso C. Weighted multiple testing procedures in genome-wide association studies. PeerJ 2023; 11:e15369. [PMID: 37337586 PMCID: PMC10276986 DOI: 10.7717/peerj.15369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 04/17/2023] [Indexed: 06/21/2023] Open
Abstract
Multiple testing procedures controlling the false discovery rate (FDR) are increasingly used in the context of genome-wide association studies (GWAS), and weighted multiple testing procedures that incorporate covariate information can effectively improve the power to detect associations. In this work, we evaluate some recent weighted multiple testing procedures in the specific context of GWAS through a simulation study. We also present a new efficient procedure called wBHa that prioritizes the detection of genetic variants with low minor allele frequencies while maximizing the overall detection power. The results indicate good performance of our procedure compared to other weighted multiple testing procedures. In particular, in all simulated settings, wBHa tends to outperform other procedures in detecting rare variants while maintaining good overall power. The use of the different procedures is illustrated with a real dataset.
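For orientation, the generic weighted Benjamini-Hochberg step that procedures of this family build on can be sketched as below. This is not wBHa itself: the abstract does not say how wBHa derives its weights, so the weights here are supplied by the caller and the example values are hypothetical. The function name `weighted_bh` and the level `alpha=0.05` are likewise assumptions.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Generic weighted Benjamini-Hochberg; returns sorted indices of rejections.

    pvalues: list of p-values, one per hypothesis.
    weights: non-negative prior weights (rescaled internally to average 1).
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Rescale the weights so that they average to 1.
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]
    # Weighted p-values: a larger weight makes a hypothesis easier to reject.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, scaled)]
    order = sorted(range(m), key=lambda i: q[i])
    # Step-up rule: find the largest rank k with q_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values; equal weights reduce to the ordinary BH procedure.
pvals = [0.001, 0.012, 0.03, 0.5]
print(weighted_bh(pvals, [1, 1, 1, 1]))
# Down-weighting the third hypothesis removes it from the rejection set.
print(weighted_bh(pvals, [1, 1, 0.5, 1.5]))
```

In a GWAS setting the weights would typically come from a covariate such as minor allele frequency, chosen without looking at the p-values themselves.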
Affiliation(s)
- Ludivine Obry
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
| | - Cyril Dalmasso
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
|
11
|
Lyman GH, Msaouel P, Kuderer NM. Risk Model Development and Validation in Clinical Oncology: Lessons Learned. Cancer Invest 2023; 41:1-11. [PMID: 36254812 DOI: 10.1080/07357907.2022.2137914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]
Abstract
Reliable risk models can greatly facilitate patient-centered inferences and decisions. Herein we summarize key considerations related to risk modeling in clinical oncology. Often overlooked challenges include data quality, missing data, effective sample size estimation, and selecting the variables to be included in the risk model. The stability and quality of the model should be carefully interrogated with particular emphasis on rigorous internal validation.
Affiliation(s)
- Gary H Lyman
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Pavlos Msaouel
- Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
12
|
Bogomolov M. Testing partial conjunction hypotheses under dependency, with applications to meta-analysis. Electron J Stat 2023. [DOI: 10.1214/22-ejs2100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Marina Bogomolov
- Faculty of Data and Decision Sciences, Technion - Israel Institute of Technology, Haifa 3200003, Israel
|
13
|
Prioritized candidate causal haplotype blocks in plant genome-wide association studies. PLoS Genet 2022; 18:e1010437. [PMID: 36251695 PMCID: PMC9612827 DOI: 10.1371/journal.pgen.1010437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 10/27/2022] [Accepted: 09/20/2022] [Indexed: 11/05/2022] Open
Abstract
Genome-wide association studies (GWAS) can play an essential role in understanding the genetic basis of complex traits in plants and animals. Conventional SNP-based linear mixed models (LMM) that marginally test single nucleotide polymorphisms (SNPs) have successfully identified many loci with major and minor effects in many GWAS. In plants, the relatively small population size in GWAS and the high genetic diversity found in many plant species can impede mapping efforts on complex traits. Here we present a novel haplotype-based trait fine-mapping framework, HapFM, to supplement current GWAS methods. HapFM uses genotype data to partition the genome into haplotype blocks, identifies haplotype clusters within each block, and then performs genome-wide haplotype fine-mapping to prioritize the candidate causal haplotype blocks for a trait. We benchmarked HapFM, GEMMA, BSLMM, GMMAT, and BLINK in both simulated and real plant GWAS datasets. HapFM consistently resulted in higher mapping power than the other GWAS methods in the high-polygenicity simulation setting. Moreover, it resulted in smaller mapping intervals, especially in regions of high LD, achieved by prioritizing small candidate causal blocks in the larger haplotype blocks. In the Arabidopsis flowering time (FT10) datasets, HapFM identified four novel loci compared to GEMMA’s results, and the average mapping interval of HapFM was 9.6 times smaller than that of GEMMA. In conclusion, HapFM is tailored for plant GWAS to result in high mapping power on complex traits and improved mapping resolution to facilitate crop improvement.
Genome-wide association studies (GWAS) are commonly used in human and plant studies to identify genetic variants responsible for the phenotype of interest and provide foundations for studying disease mechanisms and crop improvement. Most GWAS models are developed and optimized using human datasets. However, the difference between human and plant datasets essentially limits their applications in plant studies, especially when mapping complex traits such as drought resistance and yield. In this study, we present a novel GWAS method, HapFM, tailored for plant datasets to overcome the difficulties of many conventional GWAS methods. HapFM resulted in higher statistical power than conventional GWAS methods for mapping complex traits in our simulation and real dataset analyses. In addition, HapFM reduced the mapping interval by prioritizing candidate causal regions in the genome, which benefits the downstream experimental studies. Last but not least, HapFM can incorporate biological annotations to increase statistical power further. Overall, HapFM balances statistical power, result interpretability, and downstream experimental verifiability.
|
14
|
Aboul-Naga AM, Alsamman AM, El Allali A, Elshafie MH, Abdelal ES, Abdelkhalek TM, Abdelsabour TH, Mohamed LG, Hamwieh A. Genome-wide analysis identified candidate variants and genes associated with heat stress adaptation in Egyptian sheep breeds. Front Genet 2022; 13:898522. [PMID: 36263427 PMCID: PMC9574253 DOI: 10.3389/fgene.2022.898522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] Open
Abstract
Heat stress caused by climatic changes is one of the most significant stresses on livestock in hot and dry areas. It has particularly adverse effects on the ability of the breed to maintain homeothermy. Developing countries are advised to protect and prepare their animal resources in the face of potential threats such as climate change. The current study was conducted in Egypt's three hot and dry agro-ecological zones. Three local sheep breeds (Saidi, Wahati, and Barki) were studied with a total of 206 ewes. The animals were exercised under natural heat stress. The heat tolerance index of the animals was calculated to identify animals with high and low heat tolerance based on their response to meteorological and physiological parameters. Genomic variation in these breeds was assessed using 64,756 single nucleotide polymorphic markers (SNPs). From the perspective of comparative adaptability to harsh conditions, our objective was to investigate the genomic structure that might control the adaptability of local sheep breeds to environmental stress under hot and dry conditions. In addition, indices of population structure and diversity of local breeds were examined. Measures of genetic diversity showed a significant influence of breed and location on populations. The standardized index of association (rbarD) ranged from 0.0012 (Dakhla) to 0.026 (Assuit) by location and from 0.004 (Wahati) to 0.0103 (Saidi) by breed. The index of association (Ia) ranged from 1.42 (Dakhla) to 35.88 (Assuit) by location and from 6.58 (Wahati) to 15.36 (Saidi) by breed. The most significant SNPs associated with heat tolerance were found in the MYO5A, PRKG1, GSTCD, and RTN1 genes (p ≤ 0.0001). MYO5A produces a protein widely distributed in the melanin-producing neural crest of the skin. Association analysis between genetic and phenotypic variation showed that OAR1_18300122.1, located in ST3GAL3, had the greatest positive effect on heat tolerance. Genome-wide association analysis identified SNPs associated with heat tolerance in the PLCB1, STEAP3, KSR2, UNC13C, PEBP4, and GPAT2 genes.
Affiliation(s)
- Adel M. Aboul-Naga
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | | | - Achraf El Allali
- African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
| | - Mohmed H. Elshafie
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | - Ehab S. Abdelal
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | - Tarek M. Abdelkhalek
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | - Taha H. Abdelsabour
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | - Layaly G. Mohamed
- Animal Production Research Institute, Agriculture Research Center (ARC), Cairo, Egypt
| | - Aladdin Hamwieh
- International Center For Agricultural Research in the Dry Areas (ICARDA), Giza, Egypt
|
15
|
Monti GS, Filzmoser P. A robust knockoff filter for sparse regression analysis of microbiome compositional data. Comput Stat 2022. [DOI: 10.1007/s00180-022-01268-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Microbiome data analysis often relies on the identification of a subset of potential biomarkers associated with a clinical outcome of interest. Robust ZeroSum regression, an elastic-net penalized compositional regression built on the least trimmed squares estimator, is a variable selection procedure capable of coping with the high dimensionality of these data and their compositional nature while guaranteeing robustness against the presence of outliers. The need to discover “true” effects and to improve clinical research quality and reproducibility has motivated us to propose a two-step robust compositional knockoff filter procedure, which allows selecting the set of relevant biomarkers, among the many measured features having a nonzero effect on the response, controlling the expected fraction of false positives. We demonstrate the effectiveness of our proposal in an extensive simulation study, and illustrate its usefulness in an application to intestinal microbiome analysis.
|
16
|
Detecting signatures of selection on gene expression. Nat Ecol Evol 2022; 6:1035-1045. [PMID: 35551249 DOI: 10.1038/s41559-022-01761-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 11/02/2021] [Accepted: 04/01/2022] [Indexed: 12/15/2022]
Abstract
A substantial amount of phenotypic diversity results from changes in gene expression levels and patterns. Understanding how the transcriptome evolves is therefore a key priority in identifying mechanisms of adaptive change. However, in contrast to powerful models of sequence evolution, we lack a consensus model of gene expression evolution. Furthermore, recent work has shown that many of the comparative approaches used to study gene expression are subject to biases that can lead to false signatures of selection. Here we first outline the main approaches for describing expression evolution and their inherent biases. Next, we bridge the gap between the fields of phylogenetic comparative methods and transcriptomics to reinforce the main pitfalls of inferring selection on expression patterns and use simulation studies to show that shifts in tissue composition can heavily bias inferences of selection. We close by highlighting the multi-dimensional nature of transcriptional variation and identifying major unanswered questions in disentangling how selection acts on the transcriptome.
|
17
|
Pudjihartono N, Fadason T, Kempa-Liehr AW, O'Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. FRONTIERS IN BIOINFORMATICS 2022; 2:927312. [PMID: 36304293 PMCID: PMC9580915 DOI: 10.3389/fbinf.2022.927312] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 04/24/2022] [Accepted: 06/03/2022] [Indexed: 01/14/2023] Open
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
Affiliation(s)
| | - Tayaza Fadason
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
| | - Andreas W. Kempa-Liehr
- Department of Engineering Science, The University of Auckland, Auckland, New Zealand
- *Correspondence: Andreas W. Kempa-Liehr; Justin M. O'Sullivan
| | - Justin M. O'Sullivan
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
- Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Australian Parkinson’s Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- *Correspondence: Andreas W. Kempa-Liehr; Justin M. O'Sullivan
|
18
|
Frommlet F, Szulc P, König F, Bogdan M. Selecting predictive biomarkers from genomic data. PLoS One 2022; 17:e0269369. [PMID: 35709188 PMCID: PMC9202896 DOI: 10.1371/journal.pone.0269369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/29/2021] [Accepted: 05/13/2022] [Indexed: 11/18/2022] Open
Abstract
Recently there have been tremendous efforts to develop statistical procedures that identify subgroups of patients for which certain treatments are effective. This article focuses on the selection of prognostic and predictive genetic biomarkers based on a relatively large number of candidate Single Nucleotide Polymorphisms (SNPs). We consider models which include prognostic markers as main effects and predictive markers as interaction effects with treatment. We compare different high-dimensional selection approaches including adaptive lasso, a Bayesian adaptive version of the Sorted L-One Penalized Estimator (SLOBE) and a modified version of the Bayesian Information Criterion (mBIC2). These are compared with classical multiple testing procedures for individual markers. Having identified predictive markers, we consider several different approaches to specifying subgroups susceptible to treatment. Our main conclusion is that selection based on mBIC2 and SLOBE has predictive performance similar to that of the adaptive lasso while including substantially fewer biomarkers.
Affiliation(s)
- Florian Frommlet
- Department of Medical Statistics, CEMSIIS, Medical University of Vienna, Vienna, Austria
| | - Piotr Szulc
- Institute of Mathematics, University of Wroclaw, Wroclaw, Poland
| | - Franz König
- Department of Medical Statistics, CEMSIIS, Medical University of Vienna, Vienna, Austria
| | - Malgorzata Bogdan
- Institute of Mathematics, University of Wroclaw, Wroclaw, Poland
- Department of Statistics, Lund University, Lund, Sweden
|
19
|
Sutherland J, Bell T, Trexler RV, Carlson JE, Lasky JR. Host genomic influence on bacterial composition in the switchgrass rhizosphere. Mol Ecol 2022; 31:3934-3950. [PMID: 35621390 PMCID: PMC10150372 DOI: 10.1111/mec.16549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Revised: 05/20/2022] [Accepted: 05/24/2022] [Indexed: 11/28/2022]
Abstract
Host genetic variation can shape the diversity and composition of associated microbiomes, which may reciprocally influence host traits and performance. While the genetic basis of phenotypic diversity of plant populations in nature has been studied, comparatively little research has investigated the genetics of host effects on their associated microbiomes. Switchgrass (Panicum virgatum) is a highly outcrossing, perennial grass species with substantial locally adaptive diversity across its native North American range. Here, we compared 383 switchgrass accessions in a common garden to determine the host genotypic influence on rhizosphere bacterial composition. We hypothesized that the composition and diversity of rhizosphere bacterial assemblages would differentiate due to genotypic differences between hosts (potentially due to root phenotypes and associated life history variation). We observed higher alpha diversity of bacteria associated with upland ecotypes and tetraploids, compared to lowland ecotypes and octoploids, respectively. Alpha diversity correlated negatively with flowering time and plant height, indicating that bacterial composition varies along switchgrass life history axes. Narrow-sense heritability (h²) of the relative abundance of twenty-one core bacterial families was observed. Overall compositional differences among tetraploids, due to genetic variation, support widespread genotypic influence on the rhizosphere microbiome. Only tetraploids were considered, due to complexities associated with the octoploid genomes. Lastly, a genome-wide association study identified 1,861 single-nucleotide polymorphisms associated with 110 families and genes containing them related to potential regulatory functions. Our findings suggest that switchgrass genomic and life-history variation influences bacterial composition in the rhizosphere, potentially due to host adaptation to local environments.
Affiliation(s)
- Jeremy Sutherland
- Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA, USA; Intercollege Graduate Degree Program in Bioinformatics and Genomics, The Pennsylvania State University, University Park, PA, USA; Department of Biology, The Pennsylvania State University, University Park, PA, USA
| | - Terrence Bell
- Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA, USA; Intercollege Graduate Degree Program in Bioinformatics and Genomics, The Pennsylvania State University, University Park, PA, USA; Intercollege Graduate Degree Program in Ecology, The Pennsylvania State University, University Park, PA, USA
| | - Ryan V Trexler
- Intercollege Graduate Degree Program in Ecology, The Pennsylvania State University, University Park, PA, USA; Department of Ecosystem Science and Management, The Pennsylvania State University, University Park, PA, USA
| | - John E Carlson
- Intercollege Graduate Degree Program in Bioinformatics and Genomics, The Pennsylvania State University, University Park, PA, USA; Department of Ecosystem Science and Management, The Pennsylvania State University, University Park, PA, USA
| | - Jesse R Lasky
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
|
20
|
Fanter C, Madelaire C, Genereux DP, van Breukelen F, Levesque D, Hindle A. Epigenomics as a paradigm to understand the nuances of phenotypes. J Exp Biol 2022; 225:274619. [PMID: 35258621 DOI: 10.1242/jeb.243411] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/20/2022]
Abstract
Quantifying the relative importance of genomic and epigenomic modulators of phenotype is a focal challenge in comparative physiology, but progress is constrained by availability of data and analytic methods. Previous studies have linked physiological features to coding DNA sequence, regulatory DNA sequence, and epigenetic state, but few have disentangled their relative contributions or unambiguously distinguished causative effects ('drivers') from correlations. Progress has been limited by several factors, including the classical approach of treating continuous and fluid phenotypes as discrete and static across time and environment, and difficulty in considering the full diversity of mechanisms that can modulate phenotype, such as gene accessibility, transcription, mRNA processing and translation. We argue that attention to phenotype nuance, progressing to association with epigenetic marks and then causal analyses of the epigenetic mechanism, will enable clearer evaluation of the evolutionary path. This would underlie an essential paradigm shift, and power the search for links between genomic and epigenomic features and physiology. Here, we review the growing knowledge base of gene-regulatory mechanisms and describe their links to phenotype, proposing strategies to address widely recognized challenges.
Affiliation(s)
- Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Carla Madelaire
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Diane P Genereux
- Vertebrate Genome Biology, Broad Institute, Cambridge, MA 02142, USA
| | - Frank van Breukelen
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Danielle Levesque
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
| | - Allyson Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
|
21
|
Sandoval-Castillo J, Beheregaray LB, Wellenreuther M. Genomic prediction of growth in a commercially, recreationally, and culturally important marine resource, the Australian snapper (Chrysophrys auratus). G3 (BETHESDA, MD.) 2022; 12:jkac015. [PMID: 35100370 PMCID: PMC8896003 DOI: 10.1093/g3journal/jkac015] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/05/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
Growth is one of the most important traits of an organism. For exploited species, this trait has ecological and evolutionary consequences as well as economic and conservation significance. Rapid changes in growth rate associated with anthropogenic stressors have been reported for several marine fishes, but little is known about the genetic basis of growth traits in teleosts. We used reduced genome representation data and genome-wide association approaches to identify growth-related genetic variation in the commercially, recreationally, and culturally important Australian snapper (Chrysophrys auratus, Sparidae). Based on 17,490 high-quality single-nucleotide polymorphisms and 363 individuals representing extreme growth phenotypes from 15,000 fish of the same age and reared under identical conditions in a sea pen, we identified 100 unique candidates that were annotated to 51 proteins. We documented a complex polygenic nature of growth in the species that included several loci with small effects and a few loci with larger effects. Overall heritability was high (75.7%), reflected in the high accuracy of the genomic prediction for the phenotype (small vs large). Although the single-nucleotide polymorphisms were distributed across the genome, most candidates (60%) clustered on chromosome 16, which also explains the largest proportion of heritability (16.4%). This study demonstrates that reduced genome representation single-nucleotide polymorphisms and the right bioinformatic tools provide a cost-efficient approach to identify growth-related loci and to describe genomic architectures of complex quantitative traits. Our results help to inform captive aquaculture breeding programs and are of relevance to monitor growth-related evolutionary shifts in wild populations in response to anthropogenic pressures.
Affiliation(s)
- Jonathan Sandoval-Castillo
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
| | - Luciano B Beheregaray
- Molecular Ecology Laboratory, College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
| | - Maren Wellenreuther
- School of Biological Sciences, The New Zealand Institute for Plant and Food Research Limited, Nelson 7010, New Zealand
- Seafood Production Group, The School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
|
22
|
Navigating the pitfalls of applying machine learning in genomics. Nat Rev Genet 2022; 23:169-181. [PMID: 34837041 DOI: 10.1038/s41576-021-00434-9] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Accepted: 10/28/2021] [Indexed: 11/08/2022]
Abstract
The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.
|
23
|
Wang J, Patel A, Wason JM, Newcombe PJ. Two-stage penalized regression screening to detect biomarker-treatment interactions in randomized clinical trials. Biometrics 2022; 78:141-150. [PMID: 33448327 PMCID: PMC7613856 DOI: 10.1111/biom.13424] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/09/2020] [Revised: 12/16/2020] [Accepted: 12/31/2020] [Indexed: 12/30/2022]
Abstract
High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is a growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction detecting procedures in the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy using ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove the asymptotic between-stage independence, required for familywise error rate control, under biomarker-treatment independence. Simulation results show that in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure in highly correlated data. We also exemplify our approach in two real clinical trial data applications.
Affiliation(s)
- Jixiong Wang
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
| | - Ashish Patel
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
| | - James M.S. Wason
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK; Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
|
24
|
SNP characteristics and validation success in genome wide association studies. Hum Genet 2022; 141:229-238. [PMID: 34981173 PMCID: PMC8855685 DOI: 10.1007/s00439-021-02407-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/06/2021] [Accepted: 11/27/2021] [Indexed: 02/03/2023]
Abstract
Genome wide association studies (GWASs) have identified tens of thousands of single nucleotide polymorphisms (SNPs) associated with human diseases and characteristics. A significant fraction of GWAS findings can be false positives. The gold standard for true positives is an independent validation. The goal of this study was to identify SNP features associated with validation success. Summary statistics from the Catalog of Published GWASs were used in the analysis. Since our goal was an analysis of reproducibility, we focused on the diseases/phenotypes targeted by at least 10 GWASs. GWASs were arranged in discovery-validation pairs based on the time of publication, with the discovery GWAS published before validation. We used four definitions of the validation success that differ by stringency. Associations of SNP features with validation success were consistent across the definitions. The strongest predictor of SNP validation was the level of statistical significance in the discovery GWAS. The magnitude of the effect size was associated with validation success in a non-linear manner. SNPs with risk allele frequencies in the range 30-70% showed a higher validation success rate compared to rarer or more common SNPs. Missense, 5'UTR, stop gained, and SNPs located in transcription factor binding sites had a higher validation success rate compared to intergenic, intronic and synonymous SNPs. There was a positive association between validation success and the level of evolutionary conservation of the sites. In addition, validation success was higher when discovery and validation GWASs targeted the same ethnicity. All predictors of validation success remained significant in a multivariate logistic regression model indicating their independent contribution. To conclude, we identified SNP features predicting validation success of GWAS hits. These features can be used to select SNPs for validation and downstream functional studies.
|
25
|
Colombo M, Montazeaud G, Viader V, Ecarnot M, Prosperi J, David J, Fort F, Violle C, Freville H. A genome‐wide analysis suggests pleiotropic effects of Green Revolution genes on shade avoidance in wheat. Evol Appl 2022; 15:1594-1604. [PMID: 36330302 PMCID: PMC9624089 DOI: 10.1111/eva.13349] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/31/2021] [Revised: 01/19/2022] [Accepted: 01/20/2022] [Indexed: 11/26/2022]
Abstract
A classic example of phenotypic plasticity in plants is the suite of phenotypic responses induced by a change in the ratio of red to far-red light (R:FR) as a result of shading, also known as the shade avoidance syndrome (SAS). While the adaptive consequences of this syndrome have been extensively discussed in natural ecosystems, how SAS varies within crop populations and how SAS evolved during crop domestication and breeding remain poorly known. In this study, we grew a panel of 180 durum wheat (Triticum turgidum ssp. durum) genotypes spanning diversity from wild, early domesticated, and elite genetic compartments under two light treatments: low R:FR light (shaded treatment) and high R:FR light (unshaded treatment). We first quantified the genetic variability of SAS, here measured as a change in plant height at the seedling stage. We then dissected the genetic basis of this variation through genome-wide association mapping. Genotypes grown in shaded conditions were taller than those grown under unshaded conditions. Interaction between light quality and genotype did not affect plant height. We found six QTLs affecting plant height. Three significantly interacted with light quality, among them the well-known Rht1 gene introgressed in elite germplasm during the Green Revolution. Interestingly, at three loci, short genotypes systematically expressed reduced SAS, suggesting a positive genetic correlation between plant height and plant height plasticity. Overall, our study sheds light on the evolutionary history of crops and illustrates the relevance of genetic approaches to tackle agricultural challenges.
Affiliation(s)
- Michel Colombo
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
- CEFE Univ. Montpellier Institut Agro CNRS EPHE, IRD Univ Valéry Montpellier France
| | - Germain Montazeaud
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
- CEFE Univ. Montpellier Institut Agro CNRS EPHE, IRD Univ Valéry Montpellier France
- Department of Ecology and Evolution University of Lausanne 1015 Lausanne Switzerland
| | - Veronique Viader
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
| | - Martin Ecarnot
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
| | | | - Jacques David
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
| | - Florian Fort
- CEFE Univ. Montpellier Institut Agro CNRS EPHE, IRD Univ Valéry Montpellier France
| | - Cyrille Violle
- CEFE Univ. Montpellier CNRS EPHE, IRD Univ Valéry Montpellier France
| | - Helene Freville
- AGAP Univ Montpellier CIRAD, INRAE Institut Agro Montpellier France
| |
|
26
|
Crosta M, Nazzicari N, Ferrari B, Pecetti L, Russi L, Romani M, Cabassi G, Cavalli D, Marocco A, Annicchiarico P. Pea Grain Protein Content Across Italian Environments: Genetic Relationship With Grain Yield, and Opportunities for Genome-Enabled Selection for Protein Yield. FRONTIERS IN PLANT SCIENCE 2022; 12:718713. [PMID: 35046967 PMCID: PMC8761899 DOI: 10.3389/fpls.2021.718713] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/01/2021] [Accepted: 11/18/2021] [Indexed: 06/14/2023]
Abstract
Wider pea (Pisum sativum L.) cultivation is of great interest for European agriculture, owing to its favorable environmental impact and provision of high-protein feedstuff. This work aimed to investigate the extent of genotype × environment interaction (GEI), genetically based trade-offs and polygenic control for crude protein content and grain yield of pea targeted to Italian environments, and to assess the efficiency of genomic selection (GS) as an alternative to phenotypic selection (PS) to increase protein yield per unit area. Some 306 genotypes belonging to three connected recombinant inbred line (RIL) populations derived from paired crosses between elite cultivars were genotyped through genotyping-by-sequencing and phenotyped for grain yield and protein content on a dry matter basis in three autumn-sown environments of northern or central Italy. Line variation for mean protein content ranged from 21.7 to 26.6%. Purely genetic effects, compared with GEI effects, were over two-fold larger for protein content, and over two-fold smaller for grain and protein yield per unit area. Grain yield and protein content exhibited no inverse genetic correlation. A genome-wide association study revealed a definite polygenic control not only for grain yield but also for protein content, with small amounts of trait variation accounted for by individual loci. On average, the GS predictive ability for individual RIL populations based on the rrBLUP model (which was selected out of four tested models), using by turns two environments for selection and one for validation, was moderately high for protein content (0.53) and moderate for grain yield (0.40) and protein yield (0.41). These values were about halved for inter-environment, inter-population predictions using one RIL population for model construction to predict data of the other populations.
The comparison between GS and PS for protein yield based on predicted gains per unit time and similar evaluation costs indicated an advantage of GS for model construction including the target RIL population and, in case of multi-year PS, even for model training based on data of a non-target population. In conclusion, protein content is less challenging than grain yield for phenotypic or genome-enabled improvement, and GS is promising for the simultaneous improvement of both traits.
Affiliation(s)
- Margherita Crosta
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Nelson Nazzicari
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Barbara Ferrari
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Luciano Pecetti
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Luigi Russi
- Department of Agricultural, Food and Environmental Science, University of Perugia, Perugia, Italy
| | - Massimo Romani
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Giovanni Cabassi
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Daniele Cavalli
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| | - Adriano Marocco
- Department of Sustainable Crop Production, Catholic University of Sacred Heart, Piacenza, Italy
| | - Paolo Annicchiarico
- Council for Agricultural Research and Economics (CREA), Research Centre for Animal Production and Aquaculture, Lodi, Italy
| |
|
27
|
Priyanatha C, Torkamaneh D, Rajcan I. Genome-Wide Association Study of Soybean Germplasm Derived From Canadian × Chinese Crosses to Mine for Novel Alleles to Improve Seed Yield and Seed Quality Traits. FRONTIERS IN PLANT SCIENCE 2022; 13:866300. [PMID: 35419011 PMCID: PMC8996715 DOI: 10.3389/fpls.2022.866300] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Accepted: 03/04/2022] [Indexed: 05/16/2023]
Abstract
Genome-wide association study (GWAS) has emerged in the past decade as a viable tool for identifying beneficial alleles from a genomic diversity panel. In an ongoing effort to improve soybean [Glycine max (L.) Merr.], which is the third largest field crop in Canada, a GWAS was conducted to identify novel alleles underlying seed yield, seed quality, and agronomic traits. The genomic panel consisted of 200 genotypes including lines derived from several generations of bi-parental crosses between modern Canadian × Chinese cultivars (CD-CH). The genomic diversity panel was field evaluated at two field locations in Ontario in 2019 and 2020. Genotyping-by-sequencing (GBS) was conducted and yielded almost 32 K high-quality SNPs. GWAS was conducted using the Fixed and random model Circulating Probability Unification (FarmCPU) model on the following traits: seed yield, seed protein concentration, seed oil concentration, plant height, 100 seed weight, days to maturity, and lodging score. This allowed the identification of five QTL regions controlling seed yield and seed oil and protein content. A candidate gene search identified a putative gene for each of the three traits. The results of this GWAS provide insight into potentially valuable genetic resources residing in Chinese modern cultivars that breeders may use to further improve soybean seed yield and seed quality traits.
Affiliation(s)
| | - Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
| | - Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- *Correspondence: Istvan Rajcan,
| |
|
28
|
Siekmann D, Jansen G, Zaar A, Kilian A, Fromme FJ, Hackauf B. A Genome-Wide Association Study Pinpoints Quantitative Trait Genes for Plant Height, Heading Date, Grain Quality, and Yield in Rye (Secale cereale L.). FRONTIERS IN PLANT SCIENCE 2021; 12:718081. [PMID: 34777409 PMCID: PMC8586073 DOI: 10.3389/fpls.2021.718081] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/31/2021] [Accepted: 09/22/2021] [Indexed: 06/03/2023]
Abstract
Rye is the only cross-pollinating Triticeae crop species. Knowledge of rye genes controlling complex-inherited traits is scarce, which, currently, largely disables the genomics assisted introgression of untapped genetic variation from self-incompatible germplasm collections in elite inbred lines for hybrid breeding. We report on the first genome-wide association study (GWAS) in rye based on the phenotypic evaluation of 526 experimental hybrids for plant height, heading date, grain quality, and yield in 2 years and up to 19 environments. We established a cross-validated NIRS calibration model as a fast, effective, and robust analytical method to determine grain quality parameters. We observed phenotypic plasticity in plant height and tiller number as a resource use strategy of rye under drought and identified increased grain arabinoxylan content as a striking phenotype in osmotically stressed rye. We used DArTseq™ as a genotyping-by-sequencing technology to reduce the complexity of the rye genome. We established a novel high-density genetic linkage map that describes the position of almost 19k markers and that allowed us to estimate a low genome-wide LD based on the assessed genetic diversity in elite germplasm. We analyzed the relationship between plant height, heading date, agronomic, as well as grain quality traits, and genotype based on 20k novel single-nucleotide polymorphism markers. In addition, we integrated the DArTseq™ markers in the recently established 'Lo7' reference genome assembly. We identified cross-validated SNPs in 'Lo7' protein-coding genes associated with all traits studied. These include associations of the WUSCHEL-related homeobox transcription factor DWT1 and grain yield, the DELLA protein gene SLR1 and heading date, the Ethylene overproducer 1-like protein gene ETOL1 and thousand-grain weight, protein and starch content, as well as the Lectin receptor kinase SIT2 and plant height. 
A Leucine-rich repeat receptor protein kinase and a Xyloglucan alpha-1,6-xylosyltransferase count among the cross-validated genes associated with water-extractable arabinoxylan content. This study demonstrates the power of GWAS, hybrid breeding, and the reference genome sequence in rye genetics research to dissect and identify the function of genes shaping genetic diversity in agronomic and grain quality traits of rye. The described links between genetic causes and phenotypic variation will accelerate genomics-enabled rye improvement.
Affiliation(s)
- Dörthe Siekmann
- Julius Kühn Institute, Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Agricultural Crops, Sanitz, Germany
- HYBRO Saatzucht GmbH & Co. KG, Schenkenberg, Germany
| | - Gisela Jansen
- Julius Kühn Institute, Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Sanitz, Germany
| | - Anne Zaar
- Julius Kühn Institute, Federal Research Centre for Cultivated Plants, Institute for Resistance Research and Stress Tolerance, Sanitz, Germany
| | | | | | - Bernd Hackauf
- Julius Kühn Institute, Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Agricultural Crops, Sanitz, Germany
| |
|
30
|
Sesia M, Bates S, Candès E, Marchini J, Sabatti C. False discovery rate control in genome-wide association studies with population structure. Proc Natl Acad Sci U S A 2021; 118:e2105841118. [PMID: 34580220 PMCID: PMC8501795 DOI: 10.1073/pnas.2105841118] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 08/18/2021] [Indexed: 12/25/2022]
Abstract
We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets.
Affiliation(s)
- Matteo Sesia
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA 90089;
| | - Stephen Bates
- Department of Statistics, University of California, Berkeley, CA 94720
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA 94305;
- Department of Mathematics, Stanford University, Stanford, CA 94305
| | | | - Chiara Sabatti
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305
| |
|
31
|
Wallin J, Bogdan M, Szulc PA, Doerge RW, Siegmund DO. Ghost QTL and hotspots in experimental crosses: novel approach for modeling polygenic effects. Genetics 2021; 217:6067404. [PMID: 33789342 DOI: 10.1093/genetics/iyaa041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 12/10/2020] [Indexed: 11/14/2022]
Abstract
Ghost quantitative trait loci (QTL) are false discoveries in QTL mapping that arise due to the "accumulation" of polygenic effects uniformly distributed over the genome. The locations on the chromosome that are strongly correlated with the total of the polygenic effects depend on a specific sample correlation structure determined by the genotypes at all loci. The problem is particularly severe when the same genotypes are used to study multiple QTL, e.g. using recombinant inbred lines or studying the expression QTL. In this case, the ghost QTL phenomenon can lead to false hotspots, where multiple QTL show apparent linkage to the same locus. We illustrate the problem using the classic backcross design and suggest that it can be solved by the application of the extended mixed effect model, where the random effects are allowed to have a nonzero mean. We provide formulas for estimating the thresholds for the corresponding t-test statistics and use them in the stepwise selection strategy, which allows for a simultaneous detection of several QTL. Extensive simulation studies illustrate that our approach eliminates ghost QTL/false hotspots, while preserving a high power of true QTL detection.
Affiliation(s)
- Jonas Wallin
- Department of Statistics, Lund University, 220 07 Lund, Sweden
| | - Małgorzata Bogdan
- Department of Statistics, Lund University, 220 07 Lund, Sweden
- Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
| | - Piotr A Szulc
- Department of Mathematics, Institute of Mathematics, University of Wroclaw, 50-137 Wroclaw, Poland
| | - R W Doerge
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - David O Siegmund
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
| |
|
32
|
Bogomolov M, Peterson CB, Benjamini Y, Sabatti C. Hypotheses on a tree: new error rates and testing strategies. Biometrika 2021; 108:575-590. [PMID: 36825068 PMCID: PMC9945647 DOI: 10.1093/biomet/asaa086] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.
Affiliation(s)
- Marina Bogomolov
- The William Davidson Faculty of Industrial Engineering and Management, Technion-Israel Institute of Technology, Technion City, Haifa 3200003, Israel
| | - Christine B. Peterson
- Department of Biostatistics, Division of Basic Science Research, The University of Texas, MD Anderson Cancer Center, Houston, Texas 77030, U.S.A
| | - Yoav Benjamini
- Department of Statistics and Operations Research, Tel-Aviv University, P.O. Box 39040, Tel-Aviv 6997801, Israel
| | - Chiara Sabatti
- Department of Statistics, Stanford University, 50 Governor’s Lane, Stanford, California 94305, U.S.A
| |
|
33
|
Panahabadi R, Ahmadikhah A, McKee LS, Ingvarsson PK, Farrokhi N. Genome-Wide Association Mapping of Mixed Linkage (1,3;1,4)-β-Glucan and Starch Contents in Rice Whole Grain. FRONTIERS IN PLANT SCIENCE 2021; 12:665745. [PMID: 34512678 PMCID: PMC8424012 DOI: 10.3389/fpls.2021.665745] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2021] [Accepted: 07/28/2021] [Indexed: 05/27/2023]
Abstract
The glucan content of rice is a key factor defining its nutritional and economic value. Starch and its derivatives have many industrial applications such as in fuel and material production. Non-starch glucans such as (1,3;1,4)-β-D-glucan (mixed-linkage β-glucan, MLG) have many benefits in human health, including lowering cholesterol, boosting the immune system, and modulating the gut microbiome. In this study, the genetic variability of MLG and starch contents was analyzed in rice (Oryza sativa L.) whole grain, by performing a new quantitative analysis of the polysaccharide content of rice grains. The 197 rice accessions investigated had an average MLG content of 252 μg/mg, which was negatively correlated with the grain starch content. A new genome-wide association study revealed seven significant quantitative trait loci (QTLs) associated with the MLG content and two QTLs associated with the starch content in rice whole grain. Novel genes associated with the MLG content were a hexose transporter and anthocyanidin 5,3-O-glucosyltransferase. Also, the novel gene associated with the starch content was a nodulin-like domain. The data pave the way for a better understanding of the genes involved in determining both MLG and starch contents in rice grains and should facilitate future plant breeding programs.
Affiliation(s)
- Rahele Panahabadi
- Department of Plant Science and Biotechnology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Division of Glycoscience, Department of Chemistry, KTH Royal Institute of Technology, AlbaNova University Centre, Stockholm, Sweden
| | - Asadollah Ahmadikhah
- Department of Plant Science and Biotechnology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
| | - Lauren S. McKee
- Division of Glycoscience, Department of Chemistry, KTH Royal Institute of Technology, AlbaNova University Centre, Stockholm, Sweden
- Wallenberg Wood Science Centre, Stockholm, Sweden
| | - Pär K. Ingvarsson
- Linnean Centre for Plant Biology, Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Naser Farrokhi
- Department of Plant Science and Biotechnology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
| |
|
34
|
Katsevich E, Sabatti C, Bogomolov M. Filtering the rejection set while preserving false discovery rate control. J Am Stat Assoc 2021; 118:165-176. [PMID: 37346227 PMCID: PMC10281705 DOI: 10.1080/01621459.2021.1920958] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/22/2020] [Revised: 04/14/2021] [Accepted: 04/18/2021] [Indexed: 12/28/2022]
Abstract
Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the p-values are positively dependent. We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.
Affiliation(s)
| | - Chiara Sabatti
- Departments of Statistics and Biomedical Data Science, Stanford University
| | - Marina Bogomolov
- Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology
| |
|
35
|
Abstract
It is clear, based on a deep scientific literature base, that genetic and genomic factors play significant roles in determining a wide range of sport and exercise characteristics including exercise endurance capacity, strength, daily physical activity levels, and trainability of both endurance and strength. Although the research field of exercise systems genetics has rapidly expanded over the past two decades, many researchers publishing in this field are not extensively trained in molecular biology or genomics techniques, sometimes creating gaps in generating high-quality and cutting-edge research for publication. As current or former Associate Editors for Medicine and Science in Sports and Exercise who have handled the majority of exercise genetics articles for Medicine and Science in Sports and Exercise in the past 15 yr, we have observed a large number of scientific manuscripts submitted for publication review that have exhibited significant flaws preventing their publication; flaws that often directly stem from a lack of knowledge regarding the "state-of-the-art" methods and accepted literature base that is rapidly changing as the field evolves. The purpose of this commentary is to provide researchers (especially those coming from a nongenetics background attempting to publish in the exercise systems genetics area) with recommendations regarding best-practice research standards and data analysis in the field of exercise systems genetics, to strengthen the overall literature in this important and evolving field of research.
Affiliation(s)
- J Timothy Lightfoot
- Department of Health and Kinesiology and the Sydney and JL Huffines Institute for Sports Medicine and Human Performance, Texas A&M University, College Station, TX
| | - Stephen M Roth
- Department of Kinesiology, University of Maryland, College Park, MD
| | - Monica J Hubal
- Department of Kinesiology, Indiana University-Purdue University at Indianapolis, Indianapolis, IN
| |
|
36
|
Kafle OP, Cheng S, Ma M, Li P, Cheng B, Zhang L, Wen Y, Liang C, Qi X, Zhang F. Identifying insomnia-related chemicals through integrative analysis of genome-wide association studies and chemical-genes interaction information. Sleep 2021; 43:5805199. [PMID: 32170308 DOI: 10.1093/sleep/zsaa042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2019] [Revised: 03/02/2020] [Indexed: 12/30/2022]
Abstract
STUDY OBJECTIVES: Insomnia is a common sleep disorder and constitutes a major issue in modern society. We provide new clues for revealing the association between environmental chemicals and insomnia. METHODS: Three genome-wide association studies (GWAS) summary datasets of insomnia (n = 113,006, n = 1,331,010, and n = 453,379, respectively) were derived from the UK Biobank, 23andMe, and deCODE. The chemical-gene interaction dataset was downloaded from the Comparative Toxicogenomics Database. First, we conducted a meta-analysis of the three datasets of insomnia using the METAL software. Using the result of the meta-analysis, transcriptome-wide association studies were performed to calculate the expression association testing statistics of insomnia. Then chemical-related gene set enrichment analysis (GSEA) was used to explore the association between chemicals and insomnia. RESULTS: For the GWAS meta-analysis dataset of insomnia, we identified 42 chemicals associated with insomnia in brain tissue (p < 0.05) by GSEA. We detected five important chemicals, namely pinosylvin (p = 0.0128), bromobenzene (p = 0.0134), clonidine (p = 0.0372), gabapentin (p = 0.0372), and melatonin (p = 0.0404), which are directly associated with insomnia. CONCLUSION: Our study results provide new clues for revealing the roles of environmental chemicals in the development of insomnia.
Affiliation(s)
- Om Prakash Kafle
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Shiqiang Cheng
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Mei Ma
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Ping Li
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Bolun Cheng
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Lu Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Yan Wen
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Chujun Liang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Xin Qi
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
| | - Feng Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, P. R. China
|
37
|
Mai TT, Turner P, Corander J. Boosting heritability: estimating the genetic component of phenotypic variation with multiple sample splitting. BMC Bioinformatics 2021; 22:164. [PMID: 33773584 PMCID: PMC8004405 DOI: 10.1186/s12859-021-04079-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Accepted: 03/15/2021] [Indexed: 11/29/2022] Open
Abstract
Background Heritability is a central measure in genetics quantifying how much of the variability observed in a trait is attributable to genetic differences. Existing methods for estimating heritability are most often based on random-effect models, typically for computational reasons. The alternative of using a fixed-effect model has received much more limited attention in the literature. Results In this paper, we propose a generic strategy for heritability inference, termed “boosting heritability”, by combining the advantageous features of different recent methods to produce an estimate of the heritability with a high-dimensional linear model. Boosting heritability uses in particular a multiple sample splitting strategy which leads in general to a stable and accurate estimate. We use both simulated data and real antibiotic resistance data from a major human pathogen, Streptococcus pneumoniae, to demonstrate the attractive features of our inference strategy. Conclusions Boosting is shown to offer a reliable and practically useful tool for inference about heritability.
Affiliation(s)
- The Tien Mai
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.
| | - Paul Turner
- Cambodia-Oxford Medical Research Unit, Angkor Hospital for Children, Siem Reap, Cambodia; Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Jukka Corander
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway; Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
38
|
Dang JT, Dang TT, Wine E, Dicken B, Madsen K, Laffin M. The Genetics of Postoperative Recurrence in Crohn Disease: A Systematic Review, Meta-analysis, and Framework for Future Work. CROHN'S & COLITIS 360 2021; 3:otaa094. [PMID: 36778938 PMCID: PMC9802308 DOI: 10.1093/crocol/otaa094] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/23/2020] [Indexed: 12/12/2022] Open
Abstract
Background Recurrence following abdominal surgery in Crohn disease is over 50%. The impact of genetics on postoperative recurrence is not well defined. Methods A literature search was conducted where inclusion required an assessment, by genotype, of postoperative recurrence. The primary endpoint was odds of surgical recurrence. Results Twenty-eight studies identified a total of 6715 patients. Thirteen loci were identified as modifying the risk of recurrence. NOD2 was identified as a risk factor for recurrence by multiple works (cumulative odds ratio: 1.64, P = 0.003). Conclusions A NOD2 risk allele is associated with recurrence following surgery in Crohn disease. Progress in this area will require standardized reporting in future works.
Affiliation(s)
- Jerry T Dang
- Department of Surgery, University of Alberta, Edmonton, Alberta, Canada
| | - ThucNhi T Dang
- Department of Medicine, University of Alberta, Edmonton, Alberta, Canada
| | - Eytan Wine
- Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
| | - Bryan Dicken
- Department of Surgery, University of Alberta, Edmonton, Alberta, Canada
| | - Karen Madsen
- Department of Medicine, University of Alberta, Edmonton, Alberta, Canada
| | - Michael Laffin
- Department of Surgery, University of Alberta, Edmonton, Alberta, Canada. Address correspondence to: Michael Laffin, MD, PhD, Department of Surgery, University of Alberta, University of Alberta Hospital, 8440 112 Street NW, Edmonton, AB T6G 2B7, Canada
|
39
|
Tibbs Cortes L, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. THE PLANT GENOME 2021; 14:e20077. [PMID: 33442955 DOI: 10.1002/tpg2.20077] [Citation(s) in RCA: 138] [Impact Index Per Article: 46.0] [Received: 08/05/2020] [Accepted: 11/18/2020] [Indexed: 05/22/2023]
Abstract
Genome-wide association studies (GWAS) have developed into a powerful and ubiquitous tool for the investigation of complex traits. In large part, this was fueled by advances in genomic technology, enabling us to examine genome-wide genetic variants across diverse genetic materials. The development of the mixed model framework for GWAS dramatically reduced the number of false positives compared with naïve methods. Building on this foundation, many methods have since been developed to increase computational speed or improve statistical power in GWAS. These methods have allowed the detection of genomic variants associated with either traditional agronomic phenotypes or biochemical and molecular phenotypes. In turn, these associations enable applications in gene cloning and in accelerated crop breeding through marker assisted selection or genetic engineering. Current topics of investigation include rare-variant analysis, synthetic associations, optimizing the choice of GWAS model, and utilizing GWAS results to advance knowledge of biological processes. Ongoing research in these areas will facilitate further advances in GWAS methods and their applications.
Affiliation(s)
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
| | - Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA, 50010, USA
|
40
|
Chen Z, Boehnke M, Wen X, Mukherjee B. Revisiting the genome-wide significance threshold for common variant GWAS. G3 (BETHESDA, MD.) 2021; 11:jkaa056. [PMID: 33585870 PMCID: PMC8022962 DOI: 10.1093/g3journal/jkaa056] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 06/18/2020] [Accepted: 11/05/2020] [Indexed: 11/23/2022]
Abstract
Over the last decade, GWAS meta-analyses have used a strict P-value threshold of 5 × 10⁻⁸ to classify associations as significant. Here, we use our current understanding of frequently studied traits including lipid levels, height, and BMI to revisit this genome-wide significance threshold. We compare the performance of studies using the P = 5 × 10⁻⁸ threshold in terms of true and false positive rate to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedure, and (3) controlling the Bayesian FDR with posterior probabilities. We applied these procedures to re-analyze results from the Global Lipids and GIANT GWAS meta-analysis consortia and supported them with extensive simulation that mimics the empirical data. We observe in simulated studies with sample sizes ∼20,000 and >120,000 that relaxing the P-value threshold to 5 × 10⁻⁷ increased discovery at the cost of 18% and 8% of additional loci being false positive results, respectively. FDR and Bayesian FDR are well controlled for both sample sizes with a few exceptions that disappear under a less stringent definition of true positives and the two approaches yield similar results. Our work quantifies the value of using a relaxed P-value threshold in large studies to increase their true positive discovery but also shows the excess false positive rates due to such actions in modest-sized studies. These results may guide investigators considering different thresholds in replication studies and downstream work such as gene-set enrichment or pathway analysis. Finally, we demonstrate the viability of FDR-controlling procedures in GWAS.
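To make the contrast between a fixed genome-wide threshold and FDR control concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure next to a fixed cutoff of 5 × 10⁻⁸. It is not the authors' code; the function names and the seven toy p-values are hypothetical, and a real GWAS would involve millions of tests.

```python
def fixed_threshold(p_values, threshold=5e-8):
    """Indices of tests whose p-value falls below a fixed cutoff."""
    return [i for i, p in enumerate(p_values) if p < threshold]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up rule.

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            largest_k = rank
    return sorted(order[:largest_k])


# Hypothetical toy p-values, for illustration only.
toy_p = [1e-9, 3e-8, 2e-7, 4e-4, 0.02, 0.3, 0.8]

print("fixed 5e-8 cutoff:", fixed_threshold(toy_p))
print("BH at FDR 0.05:   ", benjamini_hochberg(toy_p))
```

On this toy input the fixed cutoff keeps only the two smallest p-values, while the step-up rule also rejects the next three, which mirrors the trade-off between discovery and false positives that the abstract describes.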
Affiliation(s)
- Zhongsheng Chen
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109-2029, USA
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109-2029, USA
| | - Xiaoquan Wen
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109-2029, USA
| | - Bhramar Mukherjee
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109-2029, USA
|
41
|
Identification and Characterization of Serum microRNAs as Biomarkers for Human Disc Degeneration: An RNA Sequencing Analysis. Diagnostics (Basel) 2020; 10:diagnostics10121063. [PMID: 33302347 PMCID: PMC7762572 DOI: 10.3390/diagnostics10121063] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2020] [Revised: 12/01/2020] [Accepted: 12/02/2020] [Indexed: 12/03/2022] Open
Abstract
Circulating microRNAs (miRNAs) have been associated with various degenerative diseases, including intervertebral disc (IVD) degeneration. Lumbar disc herniation (LDH) often occurs in young patients, although the underlying mechanisms are poorly understood. The aim of this work was to generate RNA deep sequencing data of peripheral blood samples from patients suffering from LDH, identify circulating miRNAs, and analyze them using bioinformatics applications. Serum was collected from 10 patients with LDH (Disc Degeneration Group); 10 patients without LDH served as the Control Group. RNA sequencing analysis identified 73 differential circulating miRNAs (p < 0.05) between the Disc Degeneration Group and Control Group. Gene ontology enrichment analysis (p < 0.05) showed that these differentially expressed miRNAs were associated with extracellular matrix, damage reactions, inflammatory reactions, and regulation of apoptosis. Kyoto Encyclopedia of Genes and Genomes analysis showed that the differentially expressed genes were involved in diverse signaling pathways. The profile of miR-766-3p, miR-6749-3p, and miR-4632-5p serum miRNAs was significantly enriched (p < 0.05) in multiple pathways associated with IVD degeneration. The miR-766-3p, miR-6749-3p, and miR-4632-5p signature from serum may serve as a noninvasive diagnostic biomarker for the LDH manifestation of IVD degeneration. Furthermore, several dysregulated miRNAs may be involved in the pathogenesis of IVD degeneration. Further study is needed to confirm the functional role of the identified miRNAs.
|
42
|
Nunes JRS, Pértille F, Andrade SCS, Perazza CA, Villela PMS, Almeida-Val VMF, Gao ZX, Coutinho LL, Hilsdorf AWS. Genome-wide association study reveals genes associated with the absence of intermuscular bones in tambaqui (Colossoma macropomum). Anim Genet 2020; 51:899-909. [PMID: 33006182 DOI: 10.1111/age.13001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 08/24/2020] [Indexed: 01/21/2023]
Abstract
The presence of intermuscular bones in fisheries products limits the consumption and commercialization potential of many fish species, including tambaqui (Colossoma macropomum). These bones have caused medical emergencies and are an undesirable characteristic for fish farming because their removal is labor-intensive during fish processing. Despite the difficulty in identifying genes related to the lack of intermuscular bone in diverse species of fish, the discovery of individuals lacking intermuscular bones in a Neotropical freshwater characiform fish has provided a unique opportunity to delve into the genetic mechanisms underlying the pathways of intermuscular bone formation. In this study, we carried out a GWAS among boneless and wild-type tambaqui populations to identify markers associated with a lack of intermuscular bone. After analyzing 11 416 SNPs in 360 individuals (12 boneless and 348 bony), we report 675 significant (Padj < 0.003) associations for this trait. Of those, 13 associations were located near candidate genes related to the reduction of bone mass, promotion of bone formation, inhibition of bone resorption, central control of bone remodeling, bone mineralization and other related functions. To the best of our knowledge, for the first time, we have successfully identified genes related to a lack of intermuscular bones using GWAS in a non-model species.
Affiliation(s)
- J R S Nunes
- Nature and Culture Institute, Federal University of Amazon (UFAM), Benjamin Constant, Amazonas, 69630-000, Brazil; Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, 13418-900, Brazil
| | - F Pértille
- Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, 13418-900, Brazil; Avian Behavioural Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, 58 183, Sweden
| | - S C S Andrade
- Genetics and Evolutionary Biology Department, University of São Paulo (USP)/Bioscience Institute (IB), São Paulo, São Paulo, 05508-090, Brazil
| | - C A Perazza
- Unit of Biotechnology, University of Mogi das Cruzes, Mogi das Cruzes, São Paulo, 08780-911, Brazil
| | - P M S Villela
- Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, 13418-900, Brazil
| | - V M F Almeida-Val
- Brazilian National Institute for Research of the Amazon, Laboratory of Ecophysiology and Molecular Evolution, Manaus, Amazonas, 69067-375, Brazil
| | - Z-X Gao
- College of Fisheries, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education/Key Laboratory of Freshwater Animal Breeding, Ministry of Agriculture, Huazhong Agricultural University, Hongshan District, Wuhan, 430070, China
| | - L L Coutinho
- Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, 13418-900, Brazil
| | - A W S Hilsdorf
- Unit of Biotechnology, University of Mogi das Cruzes, Mogi das Cruzes, São Paulo, 08780-911, Brazil
|
43
|
Powell Doherty RD, Liao H, Satsangi JJ, Ternette N. Extended Analysis Identifies Drug-Specific Association of 2 Distinct HLA Class II Haplotypes for Development of Immunogenicity to Adalimumab and Infliximab. Gastroenterology 2020; 159:784-787. [PMID: 32275970 DOI: 10.1053/j.gastro.2020.03.073] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/26/2020] [Revised: 03/26/2020] [Accepted: 03/30/2020] [Indexed: 01/07/2023]
Affiliation(s)
- Rebecca D Powell Doherty
- Centre for Cellular and Molecular Physiology, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Translational Gastroenterology Unit, Experimental Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Hanqing Liao
- Centre for Cellular and Molecular Physiology, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Jenner Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Jack J Satsangi
- Translational Gastroenterology Unit, Experimental Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Nicola Ternette
- Centre for Cellular and Molecular Physiology, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Jenner Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK.
|
44
|
Lees JA, Mai TT, Galardini M, Wheeler NE, Horsfield ST, Parkhill J, Corander J. Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions. mBio 2020; 11:e01344-20. [PMID: 32636251 PMCID: PMC7343994 DOI: 10.1128/mbio.01344-20] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/22/2020] [Accepted: 06/05/2020] [Indexed: 12/19/2022] Open
Abstract
Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially. IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis.
These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
Affiliation(s)
- John A Lees
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - T Tien Mai
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Marco Galardini
- Biological Design Center, Boston University, Boston, Massachusetts, USA
| | - Nicole E Wheeler
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Samuel T Horsfield
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Jukka Corander
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
45
|
Shi X, Jiao Y, Yang Y, Cheng CY, Yang C, Lin X, Liu J. VIMCO: variational inference for multiple correlated outcomes in genome-wide association studies. Bioinformatics 2020; 35:3693-3700. [PMID: 30851102 DOI: 10.1093/bioinformatics/btz167] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/26/2018] [Revised: 12/22/2018] [Accepted: 03/08/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION In genome-wide association studies (GWASs) where multiple correlated traits have been measured on participants, a joint analysis strategy, whereby the traits are analyzed jointly, can improve statistical power over a single-trait analysis strategy. There are two questions of interest to be addressed when conducting a joint GWAS analysis with multiple traits. The first question examines whether a genetic locus is significantly associated with any of the traits being tested. The second question focuses on identifying the specific trait(s) associated with the genetic locus. Since existing methods primarily focus on the first question, this article seeks to provide a complementary method that addresses the second question. RESULTS We propose a novel method, Variational Inference for Multiple Correlated Outcomes (VIMCO), that focuses on identifying the specific trait that is associated with the genetic locus when performing a joint GWAS analysis of multiple traits, while accounting for correlation among the multiple traits. We performed extensive numerical studies and also applied VIMCO to analyze two datasets. The numerical studies and real data analysis demonstrate that VIMCO improves statistical power over single-trait analysis strategies when the multiple traits are correlated and has comparable performance when the traits are not correlated. AVAILABILITY AND IMPLEMENTATION The VIMCO software can be downloaded from: https://github.com/XingjieShi/VIMCO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xingjie Shi
- Department of Statistics, Nanjing University of Finance and Economics, Nanjing, China; Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Yuling Jiao
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
| | - Yi Yang
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Ching-Yu Cheng
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Can Yang
- Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong
| | - Xinyi Lin
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
| | - Jin Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore
|
46
|
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
| | | | | | - Anita Pandit
- University of Michigan, Department of Biostatistics
| | - Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
| | - Chad Brummett
- University of Michigan, Department of Anesthesiology
| | - Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
|
47
|
Sesia M, Katsevich E, Bates S, Candès E, Sabatti C. Multi-resolution localization of causal variants across the genome. Nat Commun 2020; 11:1093. [PMID: 32107378 PMCID: PMC7046731 DOI: 10.1038/s41467-020-14791-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/17/2019] [Accepted: 02/01/2020] [Indexed: 01/07/2023] Open
Abstract
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
Affiliation(s)
- Matteo Sesia
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
| | - Eugene Katsevich
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
| | - Stephen Bates
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Departments of Mathematics and of Statistics, Stanford University, Stanford, CA, 94305, USA.
| | - Chiara Sabatti
- Departments of Biomedical Data Science and of Statistics, Stanford University, Stanford, CA, 94305, USA.
|
48
|
Potential of Genome-Wide Association Studies and Genomic Selection to Improve Productivity and Quality of Commercial Timber Species in Tropical Rainforest, a Case Study of Shorea platyclados. FORESTS 2020. [DOI: 10.3390/f11020239] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/28/2022]
Abstract
Shorea platyclados (Dark Red Meranti) is a commercially important timber tree species in Southeast Asia. However, its stocks have dramatically declined due, inter alia, to excessive logging, insufficient natural regeneration and a slow recovery rate. Thus, there is a need to promote enrichment planting and develop effective techniques to support its rehabilitation and improve timber production through implementation of Genome-Wide Association Studies (GWAS) and Genomic Selection (GS). To assist such efforts, plant materials were collected from a half-sib progeny population in Sari Bumi Kusuma forest concession, Kalimantan, Indonesia. Using 5900 markers in sequences obtained from 356 individuals, we detected high linkage disequilibrium (LD) extending up to >145 kb, suggesting that associations between phenotypic traits and markers in LD can be more easily and feasibly detected with GWAS than with analysis of quantitative trait loci (QTLs). However, the detection power of GWAS seems low, since few single nucleotide polymorphisms linked to any focal traits were detected with a stringent false discovery rate, indicating that the species’ phenotypic traits are mostly under polygenic quantitative control. Furthermore, Machine Learning provided higher prediction accuracies than Bayesian methods. We also found that stem diameter, branch diameter ratio and wood density were more predictable than height, clear bole, branch angle and wood stiffness traits. Our study suggests that GS has potential for improving the productivity and quality of S. platyclados, and our genomic heritability estimates may improve the selection of traits to target in future breeding of this species.
|
49
|
Renaux C, Buzdugan L, Kalisch M, Bühlmann P. Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput Stat 2020. [DOI: 10.1007/s00180-019-00939-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/28/2022]
|
50
|
Becker GM, Davenport KM, Burke JM, Lewis RM, Miller JE, Morgan JLM, Notter DR, Murdoch BM. Genome-wide association study to identify genetic loci associated with gastrointestinal nematode resistance in Katahdin sheep. Anim Genet 2020; 51:330-335. [PMID: 31900974 PMCID: PMC7064973 DOI: 10.1111/age.12895] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Accepted: 11/28/2019] [Indexed: 12/11/2022]
Abstract
Resistance to gastrointestinal nematodes has previously been shown to be a moderately heritable trait in some breeds of sheep, but the mechanisms of resistance are not well understood. Selection for resistance currently relies upon faecal egg counts (FEC), blood packed cell volumes and FAMACHA visual indicator scores of anaemia. Identifying genomic markers associated with disease resistance would potentially improve the selection process and provide a more reliable means of classifying and understanding the biology behind resistant and susceptible sheep. A GWAS was conducted to identify possible genetic loci associated with resistance to Haemonchus contortus in Katahdin sheep. Forty animals were selected from the top and bottom 10% of estimated breeding values for FEC from a total pool of 641 sires and ram lambs. Samples were genotyped using the Applied Biosystems™ Axiom™ Ovine Genotyping Array (50K) consisting of 51 572 SNPs. Following quality control, 46 268 SNPs were included in subsequent analyses. Analyses were conducted using a linear regression model in PLINK v1.90 and a single‐locus mixed model in SNP & Variation Suite. Genome‐wide significance was determined by a Bonferroni correction for multiple testing. Using linear regression, loci on chromosomes 2, 3, 16, 23 and 24 were significantly associated at the genome level with FEC estimated breeding values, and we identified a region on chromosome 2 that was significant using both statistical analyses. We suggest a potential role for the gene DIS3L2 in gastrointestinal nematode resistance in Katahdin sheep, although further research is needed to validate these findings.
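As a rough illustration of the Bonferroni correction mentioned in this abstract, the sketch below computes a genome-wide significance threshold for the 46 268 SNPs retained after quality control. The family-wise error rate of 0.05 is an assumption made here for illustration (the abstract does not state the value used), and the function names are hypothetical rather than taken from PLINK or any other tool.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that bounds the family-wise error rate by alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a single-marker p-value passes the Bonferroni cutoff."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Number of SNPs after quality control, as reported in the abstract.
n_snps = 46268
# ASSUMPTION: a conventional family-wise error rate; not stated in the abstract.
alpha = 0.05

threshold = bonferroni_threshold(alpha, n_snps)
# The same cutoff on the -log10(p) scale commonly used in Manhattan plots.
neg_log10_threshold = -math.log10(threshold)

print(f"Bonferroni threshold: {threshold:.3e}")
print(f"-log10 threshold: {neg_log10_threshold:.2f}")
```

Under this assumed error rate the cutoff is about 1.1 × 10⁻⁶ per marker, so a marker with p = 1 × 10⁻⁷ would pass while one with p = 1 × 10⁻⁵ would not.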
Affiliation(s)
- G M Becker: Department of Animal and Veterinary Science, University of Idaho, Moscow, ID, 83844, USA
- K M Davenport: Department of Animal and Veterinary Science, University of Idaho, Moscow, ID, 83844, USA
- J M Burke: USDA, ARS, Dale Bumpers Small Farms Research Center, Booneville, AR, 72927, USA
- R M Lewis: Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA
- J E Miller: Department of Pathobiological Sciences, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA, 70803, USA
- J L M Morgan: Katahdin Hair Sheep International, Fayetteville, AR, 72701, USA
- D R Notter: Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
- B M Murdoch: Department of Animal and Veterinary Science, University of Idaho, Moscow, ID, 83844, USA
|