1. Li HF, Wang JT, Zhao Q, Zhang YM. BLUPmrMLM: A Fast mrMLM Algorithm in Genome-wide Association Studies. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae020. [PMID: 39348630 PMCID: PMC12016565 DOI: 10.1093/gpbjnl/qzae020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Revised: 12/13/2023] [Accepted: 01/10/2024] [Indexed: 10/02/2024]
Abstract
Multilocus genome-wide association study has become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, in this study, we proposed a fast mrMLM method, namely, best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances in BLUPmrMLM. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome to reduce computational time when estimating marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy for estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes, compared with the aforementioned methods. This study provides an excellent multilocus model method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
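The Wald-test step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that per-marker effect estimates and their variances are already available, and it compares each statistic against a chi-square distribution with 1 degree of freedom. The function name `wald_pvalues` and the input numbers are hypothetical; this is not the BLUPmrMLM implementation, which obtains effects and variances from BLUP and operates on whole vectors at once.

```python
import math


def wald_pvalues(effects, variances):
    """Return (statistic, p-value) pairs for per-marker Wald tests.

    Each statistic W = effect**2 / variance is compared against a
    chi-square distribution with 1 degree of freedom.
    """
    results = []
    for b, v in zip(effects, variances):
        if v <= 0:
            raise ValueError("variances must be positive")
        w = b * b / v
        # Survival function of chi-square(1): P(X > w) = erfc(sqrt(w / 2))
        p = math.erfc(math.sqrt(w / 2.0))
        results.append((w, p))
    return results


# Hypothetical effect estimates and variances for three markers
demo = wald_pvalues([0.0, 0.5, 1.2], [0.04, 0.04, 0.04])
```

Markers with small p-values would then be passed on as candidates to the later selection and estimation steps; the threshold used for that is not stated in the abstract and is left out here.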
Affiliation(s)
- Hong-Fu Li
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Jing-Tian Wang
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Qiong Zhao
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yuan-Ming Zhang
  - College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
2. Zhang J, Shen B, Zhou Z, Cai M, Wu X, Han L, Wen Y. An Extended Application of the Fast Multi-Locus Ridge Regression Algorithm in Genome-Wide Association Studies of Categorical Phenotypes. PLANTS (BASEL, SWITZERLAND) 2024; 13:2520. [PMID: 39274004 PMCID: PMC11397509 DOI: 10.3390/plants13172520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2024] [Revised: 09/02/2024] [Accepted: 09/05/2024] [Indexed: 09/16/2024]
Abstract
Categorical (binary or ordinal) quantitative traits are widely used to measure counts and resistance in plants. Unlike continuous traits, categorical traits often provide less detailed insight into genetic variation and have a more complex underlying genetic architecture, which presents additional challenges for genome-wide association studies. Meanwhile, methods designed for binary or continuous phenotypes are commonly, and inappropriately, used to analyze ordinal traits, which leads to loss of the original phenotype information and of power to detect quantitative trait nucleotides (QTNs). To address these issues, fast multi-locus ridge regression (FastRR), which was originally designed for continuous traits, is used in this study to analyze binary or ordinal traits directly. FastRR comprises three stages (continuous transformation, variable reduction, and parameter estimation) and can handle categorical phenotype data computationally, without introducing link functions or resorting to inappropriate methods. A series of simulation studies demonstrates that, compared with four other approaches for continuous, binary, or ordinal traits (logistic regression, FarmCPU, FaST-LMM, and POLMM), FastRR performs better in the detection of small-effect QTNs, the accuracy of estimated effects, and computation speed. We applied FastRR to 14 binary or ordinal phenotypes in the Arabidopsis real dataset and identified 479 significant loci and 76 known genes, at least seven times as many as detected by other algorithms. These findings underscore the potential of FastRR as a very useful tool for genome-wide association studies and novel gene mining of binary and ordinal traits.
Affiliation(s)
- Jin Zhang
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Bolin Shen
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Ziyang Zhou
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Mingzhi Cai
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Xinyi Wu
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Le Han
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
- Yangjun Wen
  - College of Science, Nanjing Agricultural University, Nanjing 210095, China
3. Guo H, Li T, Shi Y, Wang X. MTML: An Efficient Multitrait Multilocus GWAS Method Based on the Cauchy Combination Test. Biom J 2024; 66:e202300130. [PMID: 39076046 DOI: 10.1002/bimj.202300130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/19/2023] [Accepted: 11/27/2023] [Indexed: 07/31/2024]
Abstract
Genome-wide association studies (GWAS) that measure the joint effect of multiple loci on multiple traits have recently attracted interest, owing to the decreased costs of high-throughput genotyping and phenotyping technologies. Previous studies mainly focused on either multilocus models that identify associations with a single trait or multitrait models that scan a single marker at a time. Since these types of models cannot fully utilize the association information, the power of the tests is usually low. To address this problem, we present here a multitrait multilocus (MTML) modeling framework that is implemented in three steps: (1) simplify the complex calculation; (2) reduce the model dimension; (3) integrate the joint contribution of single markers to multiple traits by Cauchy combination. The performance of MTML is evaluated and compared with that of three other published methods by Monte Carlo simulations. Simulation results show that MTML is more powerful for quantitative trait nucleotide detection and is robust to the number of traits. Meanwhile, MTML can effectively control the type I error rate at a reasonable level. Real data analysis of Arabidopsis thaliana shows that MTML identifies more pleiotropic genetic associations. Therefore, we conclude that MTML is an efficient GWAS method for joint analysis of multiple quantitative traits. The R package MTML, which facilitates the implementation of the proposed method, is publicly available on GitHub at https://github.com/Guohongping/MTML.
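Step (3) relies on the Cauchy combination test, whose general form can be sketched briefly. The example below combines several p-values (for instance, one per trait for a single marker) into one p-value using the standard Cauchy transformation. The function name `cauchy_combination`, the equal default weights, and the input values are assumptions for illustration; the MTML package itself is written in R and may weight or pre-process p-values differently.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to
    sum to 1; the combined p-value is 0.5 - atan(T) / pi.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights (an assumption)
    total = sum(weights)
    t = sum(w / total * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-trait p-values for one marker
combined = cauchy_combination([0.01, 0.5, 0.9])
```

Because the transformed statistic is dominated by the smallest p-values, one strong signal among several traits still yields a small combined p-value, which is why this kind of test suits the detection of pleiotropic markers.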
Affiliation(s)
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Tong Li
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, China
- Yao Shi
  - School of Mathematics and Statistics, Qingdao University, Qingdao, China
- Xiao Wang
  - School of Mathematics and Statistics, Qingdao University, Qingdao, China
4. Xu C, Zhang R, Duan M, Zhou Y, Bao J, Lu H, Wang J, Hu M, Hu Z, Zhou F, Zhu W. A polygenic stacking classifier revealed the complicated platelet transcriptomic landscape of adult immune thrombocytopenia. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 28:477-487. [PMID: 35505964 PMCID: PMC9046129 DOI: 10.1016/j.omtn.2022.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2021] [Accepted: 04/01/2022] [Indexed: 01/19/2023]
Abstract
Immune thrombocytopenia (ITP) is an autoimmune disease whose typical symptom is a low platelet count in the blood. ITP shows age and sex biases in both occurrence and prognosis, and adult ITP is mainly induced by the living environment. The current diagnosis guideline lacks the integration of molecular heterogeneity. This study recruited the largest cohort of platelet transcriptome samples. A comprehensive procedure of feature selection, feature engineering, and stacking classification was carried out to detect ITP biomarkers using RNA sequencing (RNA-seq) transcriptomes. The 40 detected biomarkers were used to train the final ITP detection model, which achieved an overall accuracy of 0.974. The biomarkers suggested that ITP onset may be associated with various transcribed components, including protein-coding genes, long intergenic non-coding RNA (lincRNA) genes, and pseudogenes with apparent transcription. The delivered ITP detection model may also be used as a complementary ITP diagnosis tool. The code and the example dataset are freely available at http://www.healthinformaticslab.org/supp/resources.php
Affiliation(s)
- Chengfeng Xu
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Ruochi Zhang
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Meiyu Duan
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yongming Zhou
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jizhang Bao
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Hao Lu
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jie Wang
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Minghui Hu
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Zhaoyang Hu
  - Fun-Med Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China
  - Corresponding author Zhaoyang Hu, PhD, Fengneng Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China.
- Fengfeng Zhou
  - College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
  - Corresponding author Fengfeng Zhou, PhD, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.
- Wenwei Zhu
  - Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
  - Corresponding author Wenwei Zhu, PhD, Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China.
5. Wang M, Fang Z, Yoo B, Bejerano G, Peltz G. The Effect of Population Structure on Murine Genome-Wide Association Studies. Front Genet 2021; 12:745361. [PMID: 34589118 PMCID: PMC8475632 DOI: 10.3389/fgene.2021.745361] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 08/25/2021] [Indexed: 12/14/2022]
Abstract
The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it should also be used for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess the impact of PS on murine GWAS, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains and little impact on most datasets characterizing 21-40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs, or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by murine GWAS (the use of inbred strains and small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are evaluated.
Affiliation(s)
- Meiyue Wang
  - Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
- Zhuoqing Fang
  - Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
- Boyoung Yoo
  - Department of Computer Science, Stanford University School of Engineering, Stanford, CA, United States
- Gill Bejerano
  - Department of Computer Science, Stanford University School of Engineering, Stanford, CA, United States
  - Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, United States
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, United States
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, United States
- Gary Peltz
  - Department of Anesthesia, Stanford University School of Medicine, Stanford, CA, United States
6. Misztal I, Aguilar I, Lourenco D, Ma L, Steibel JP, Toro M. Emerging issues in genomic selection. J Anim Sci 2021; 99:skab092. [PMID: 33773494 PMCID: PMC8186541 DOI: 10.1093/jas/skab092] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/23/2021] [Accepted: 03/26/2021] [Indexed: 12/22/2022]
Abstract
Genomic selection (GS) is now practiced successfully across many species. However, many questions remain, such as long-term effects, estimations of genomic parameters, robustness of genome-wide association study (GWAS) with small and large datasets, and stability of genomic predictions. This study summarizes presentations from the authors at the 2020 American Society of Animal Science (ASAS) symposium. The focus of many studies until now has been on linkage disequilibrium between two loci. Ignoring higher-level equilibrium may lead to phantom dominance and epistasis. The Bulmer effect leads to a reduction of the additive variance; however, selection for increased recombination rate can release new genetic variance. With genomic information, estimates of genetic parameters may be biased by genomic preselection, but costs of estimation can increase drastically due to the dense form of the genomic information. To make the computation of estimates feasible, genotypes could be retained only for the most important animals, and methods of estimation should use algorithms that can recognize dense blocks in sparse matrices. GWASs using small genomic datasets frequently find many marker-trait associations, whereas studies using much bigger datasets find only a few. Most of the current tools use very simple models for GWAS, possibly causing artifacts. These models are adequate for large datasets where pseudo-phenotypes such as deregressed proofs indirectly account for important effects for traits of interest. Artifacts arising in GWAS with small datasets can be minimized by using data from all animals (whether genotyped or not), realistic models, and methods that account for population structure. Recent developments permit the computation of P-values from genomic best linear unbiased prediction (GBLUP), where models can be arbitrarily complex but restricted to genotyped animals only, and single-step GBLUP that also uses phenotypes from ungenotyped animals.
Stability was an important part of nongenomic evaluations, where genetic predictions were stable in the absence of new data even with low prediction accuracies. Unfortunately, genomic evaluations for such animals change because all animals with genotypes are connected. A top-ranked animal can easily drop in the next evaluation, causing a crisis of confidence in genomic evaluations. While correlations between consecutive genomic evaluations are high, outliers can have differences as high as 1 SD. A solution to fluctuating genomic evaluations is to base selection decisions on groups of animals. Although many issues in GS have been solved, many new issues that require additional research continue to surface.
Affiliation(s)
- Ignacy Misztal
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
  - Instituto Nacional de Investigación Agropecuaria (INIA), 90200 Canelones, Uruguay
- Daniela Lourenco
  - Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Juan Pedro Steibel
  - Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Miguel Toro
  - Departamento de Producción Agraria, Universidad Politécnica de Madrid, Madrid, Spain
7. Zhang J, Liu F, Reif JC, Jiang Y. On the use of GBLUP and its extension for GWAS with additive and epistatic effects. G3-GENES GENOMES GENETICS 2021; 11:6237487. [PMID: 33871030 PMCID: PMC8495923 DOI: 10.1093/g3journal/jkab122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/15/2021] [Accepted: 04/04/2021] [Indexed: 11/29/2022]
Abstract
Genomic best linear unbiased prediction (GBLUP) is the most widely used model for genome-wide predictions. Interestingly, it is also possible to perform genome-wide association studies (GWAS) based on GBLUP. Although the estimated marker effects in GBLUP are shrunken and the conventional test based on such effects has low power, it was observed that a modified test statistic can be produced and that the result of the test is identical to that of a standard GWAS model. Later, a mathematical proof was given for the special case in which there is no fixed covariate in GBLUP. Since then, the new approach has been called “GWAS by GBLUP”. Nevertheless, covariates such as environmental and subpopulation effects are very common in GBLUP. Thus, it is necessary to confirm the equivalence in the general case. Recently, the concept was generalized to GWAS for epistatic effects, and the new approach was termed rapid epistatic mixed-model association analysis (REMMA) because it greatly improved the computational efficiency. However, the relationship between REMMA and the standard GWAS model has not been investigated. In this study, we first provided a general mathematical proof of the equivalence between “GWAS by GBLUP” and the standard GWAS model for additive effects. Then, we compared REMMA with the standard GWAS model for epistatic effects by a theoretical investigation and by empirical data analyses. We hypothesized that the similarity of the two models is influenced by the relative contribution of additive and epistatic effects to the phenotypic variance, which was verified by empirical and simulation studies.
Affiliation(s)
- Jie Zhang
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Fang Liu
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Jochen C Reif
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
- Yong Jiang
  - Department of Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466 Stadt Seeland, Germany
8. Zhang J, Chen M, Wen Y, Zhang Y, Lu Y, Wang S, Chen J. A Fast Multi-Locus Ridge Regression Algorithm for High-Dimensional Genome-Wide Association Studies. Front Genet 2021; 12:649196. [PMID: 33854527 PMCID: PMC8041068 DOI: 10.3389/fgene.2021.649196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2021] [Accepted: 03/01/2021] [Indexed: 11/13/2022]
Abstract
The mixed linear model (MLM) has been widely used in genome-wide association studies (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies consider all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework and fail to detect the joint minor effect of multiple genetic markers on a trait. Therefore, polygenes with minor effects remain largely unexplored in today’s big data era. In this study, we developed a new algorithm under the MLM framework, called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects, from the large-scale markers, potentially related SNPs that have a high correlation with the target trait, and finally analyzes the selected subset using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for detecting both large and small QTNs, more accurate in QTN effect estimation, and more stable under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative algorithm for multi-locus GWAS in high-dimensional genomic datasets.
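The variable-reduction stage described in this abstract, keeping only the SNPs most correlated with the trait, can be sketched in a few lines. The example ranks markers by absolute Pearson correlation with the phenotype and keeps the top few. The names `pearson`, `screen_markers`, and `top_k`, as well as the toy data, are hypothetical; the whitening step and the deshrinking ridge regression of FastRR are deliberately omitted, and the published algorithm may use a different selection rule.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no association signal
    return sxy / math.sqrt(sxx * syy)


def screen_markers(genotypes, phenotype, top_k):
    """Return the names of the top_k markers by |correlation| with the trait."""
    scores = {name: abs(pearson(codes, phenotype))
              for name, codes in genotypes.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Toy data: genotype codes (0/1/2) for six individuals at three markers
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [2, 2, 0, 0, 1, 1],
    "snp3": [1, 1, 1, 1, 1, 1],
}
phenotype = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
selected = screen_markers(genotypes, phenotype, top_k=2)
```

Reducing the marker set in this way keeps the subsequent multi-locus regression small, which is where the speed advantage claimed in the abstract would come from.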
Affiliation(s)
- Jin Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, China
  - Postdoctoral Research Station of Crop Science, Nanjing Agricultural University, Nanjing, China
- Min Chen
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Yangjun Wen
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Yin Zhang
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Yunan Lu
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Shengmeng Wang
  - College of Science, Nanjing Agricultural University, Nanjing, China
- Juncong Chen
  - College of Finance, Nanjing Agricultural University, Nanjing, China