1. Nonlinear sufficient dimension reduction for distribution-on-distribution regression. J Multivariate Anal 2024; 202:105302. PMID: 38525479; PMCID: PMC10956811; DOI: 10.1016/j.jmva.2024.105302.
Abstract
We introduce a new approach to nonlinear sufficient dimension reduction for cases where both the predictor and the response are distributional data, modeled as members of a metric space. Our key step is to build universal kernels (cc-universal) on the metric spaces, which yields reproducing kernel Hilbert spaces for the predictor and response that are rich enough to characterize the conditional independence that determines sufficient dimension reduction. For univariate distributions, we construct the universal kernel using the Wasserstein distance, while for multivariate distributions we resort to the sliced Wasserstein distance. The sliced Wasserstein distance ensures that the metric space possesses topological properties similar to those of the Wasserstein space, while also offering significant computational benefits. Numerical results on synthetic data show that our method outperforms competing methods. The method is also applied to several data sets, including fertility and mortality data and Calgary temperature data.
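The sliced Wasserstein distance used above is straightforward to sketch: project both samples onto random directions, compute 1-D Wasserstein distances along each direction, and average. A minimal NumPy sketch follows; the number of projections, the bandwidth `gamma`, and the Gaussian-type kernel built on top are illustrative assumptions, not the paper's exact universal-kernel construction.

```python
import numpy as np

def wasserstein_1d(u, v):
    # 1-D 2-Wasserstein distance between equal-size empirical samples:
    # sort both samples and average squared differences of order statistics.
    u, v = np.sort(u), np.sort(v)
    return np.sqrt(np.mean((u - v) ** 2))

def sliced_wasserstein(X, Y, n_projections=200, seed=0):
    # Monte Carlo sliced 2-Wasserstein distance between two d-dimensional
    # empirical distributions with the same number of atoms.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # directions on the sphere
    dists = [wasserstein_1d(X @ t, Y @ t) for t in theta]
    return np.sqrt(np.mean(np.square(dists)))

def sw_kernel(X, Y, gamma=1.0):
    # A Gaussian-type kernel on distributions built from the sliced distance
    # (illustrative; the paper's cc-universal construction is more involved).
    return np.exp(-gamma * sliced_wasserstein(X, Y) ** 2)
```

Such a kernel matrix over a collection of distributional observations is the raw material for the RKHS-based dimension reduction described in the abstract.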
2. Genomic dissection of additive and non-additive genetic effects and genomic prediction in an open-pollinated family test of Japanese larch. BMC Genomics 2024; 25:11. PMID: 38166605; PMCID: PMC10759612; DOI: 10.1186/s12864-023-09891-4.
Abstract
Genomic dissection of genetic effects on desirable traits and the subsequent use of genomic selection hold great promise for accelerating the rate of genetic improvement of forest tree species. In this study, a total of 661 offspring trees from 66 open-pollinated families of Japanese larch (Larix kaempferi (Lam.) Carrière) were sampled at a test site. The contributions of additive and non-additive effects (dominance, imprinting and epistasis) to nine valuable traits related to growth, wood physical and chemical properties, and competitive ability were evaluated using three pedigree-based and four genomic best linear unbiased prediction (GBLUP) models, and these estimates were used to determine the genetic model. The predictive ability (PA) of two genomic prediction methods, GBLUP and reproducing kernel Hilbert spaces (RKHS) regression, was compared. The traits fell into two types with different quantitative genetic architectures: for type I, including wood chemical properties and Pilodyn penetration, the additive effect is the main source of variation (38.20-67.46%); for type II, including growth, competitive ability and acoustic velocity, epistasis plays a significant role (50.76-91.26%). Dominance and imprinting showed low to moderate contributions (< 36.26%). GBLUP was more suitable for type I traits (PA = 0.37-0.39 vs. 0.14-0.25), and RKHS was more suitable for type II traits (PA = 0.23-0.37 vs. 0.07-0.23). Non-additive effects made no meaningful contribution to the PA of the GBLUP method for any trait. These findings enhance our current understanding of the architecture of quantitative traits and lay the foundation for the development of genomic selection strategies in Japanese larch.
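The GBLUP models compared above all rest on a genomic relationship matrix built from marker data. A minimal sketch of VanRaden's G matrix and a kernel-ridge formulation of GBLUP prediction is given below; the shrinkage parameter `lam` and the allele-count simulation in the test are illustrative assumptions, not this study's settings.

```python
import numpy as np

def vanraden_G(M):
    # M: n x m matrix of allele counts in {0, 1, 2}. VanRaden's first method:
    # centre columns by twice the allele frequency, then normalise.
    p = M.mean(axis=0) / 2.0                 # estimated allele frequencies
    W = M - 2.0 * p
    denom = 2.0 * np.sum(p * (1.0 - p))
    return W @ W.T / denom

def gblup_predict(G, y, train, test, lam=1.0):
    # Kernel-ridge form of GBLUP: alpha = (G_tt + lam*I)^-1 y_t, then
    # predicted genetic values for the test set are G_st @ alpha.
    Gtt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(Gtt + lam * np.eye(len(train)), y[train])
    return G[np.ix_(test, train)] @ alpha
```

Replacing G by a non-linear kernel matrix in `gblup_predict` gives the RKHS-style predictors the abstract compares against GBLUP.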
3. Value iteration for streaming data on a continuous space with gradient method in an RKHS. Neural Netw 2023; 166:437-445. PMID: 37566954; DOI: 10.1016/j.neunet.2023.07.036.
Abstract
The classical theory of reinforcement learning focuses on the tabular setting, where states and actions are finite, or on linear representations of the value function in a finite-dimensional approximation. Establishing theory for general continuous state and action spaces requires a careful treatment of the complexity of appropriately chosen function spaces and of the iterative updates of the value function when stochastic gradient descent (SGD) is used. For the classical prediction problem in reinforcement learning, based on i.i.d. streaming data in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity that takes into account the smoothness of the value function. In particular, we prove that the gradient descent algorithm efficiently computes the value function with appropriately chosen step sizes, with a convergence rate that can be close to 1/N, the best possible rate for parametric SGD. The advantages of the gradient descent algorithm include its computational convenience and its natural handling of streaming data.
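The functional SGD updates analyzed above can be sketched for a plain streaming regression problem: each incoming sample adds one kernel atom to the current function estimate. This is a generic RKHS-SGD sketch, not the paper's value-iteration analysis; the RBF kernel, `gamma`, and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_sgd(stream, gamma=1.0, step=lambda t: 0.5 / (1 + 0.01 * t)):
    # Functional stochastic gradient descent in the RKHS of an RBF kernel:
    # for squared loss, each sample (x, y) adds one kernel atom centred at x
    # with coefficient -eta * (f(x) - y), where f is the current estimate.
    centers, coefs = [], []
    def f(x):
        return sum(c * rbf(x, z, gamma) for c, z in zip(coefs, centers))
    for t, (x, y) in enumerate(stream):
        eta = step(t)
        resid = f(x) - y            # prequential residual before the update
        centers.append(x)
        coefs.append(-eta * resid)
    return centers, coefs, f
```

The decreasing step sizes mirror the appropriately chosen schedules the abstract refers to; in the RL prediction setting, y would be a Bellman target rather than a regression label.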
4. Implementation of supervised principal component analysis for global sensitivity analysis of models with correlated inputs. Stochastic Environmental Research and Risk Assessment 2022; 36:2789-2818. PMID: 35095342; PMCID: PMC8787458; DOI: 10.1007/s00477-021-02158-y.
Abstract
Global sensitivity analysis (GSA) plays a significant role in quantifying the impact of model inputs on the uncertainty of the response variable. Because GSA results are strongly affected by correlated inputs, several studies have addressed this issue, but most of the resulting approaches are computationally expensive, labor-intensive, and difficult to implement. Accordingly, this paper puts forward a novel regression-based strategy built on supervised principal component analysis (SPCA) and benefiting from the reproducing kernel Hilbert space. By conducting a variance-based sensitivity analysis, a well-known method originally designed for models with orthogonal inputs, on the SPCA regression, the impact of the correlation structure of the input variables is taken into account. The ability of the suggested technique is evaluated on five test cases as well as three hydrologic and hydraulic models, and the results are compared with those obtained from the correlation ratio method, taken as the benchmark solution; this is a robust but rather complicated approach in terms of programming. The proposed method satisfactorily identifies the sensitivity ordering of the model inputs. Furthermore, its performance is supported by the total contribution index in the derived covariance decomposition equation. Compared with the correlation ratio method, the proposed method is computationally efficient and easy to implement. Overall, the proposed scheme is appropriate for high-dimensional, strongly nonlinear, or expensive models with correlated inputs, provided the coefficient of determination between the original model and the regression-based SPCA model is larger than 0.33.
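The variance-based sensitivity analysis invoked above can be sketched with the classical pick-freeze Monte Carlo estimator for independent inputs; the paper's contribution is to apply such an analysis to an SPCA regression surrogate so that correlated inputs can be handled. The sample size and the uniform input distribution below are illustrative assumptions.

```python
import numpy as np

def first_order_sobol(f, d, n=4096, seed=0):
    # Monte Carlo first-order Sobol' indices via Saltelli's pick-freeze
    # scheme, assuming independent U(0,1) inputs: S_i estimates
    # Var(E[Y | X_i]) / Var(Y).
    rng = np.random.default_rng(seed)
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]              # freeze column i from B into A
        S[i] = np.mean(fB * (f(ABi) - fA)) / var
    return S
```

For the linear model y = 2*x1 + x2 with independent uniform inputs, the analytic first-order indices are 0.8, 0.2 and 0 for an inert third input, which the estimator recovers.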
5. Predictive assessment of single-step BLUP with linear and non-linear similarity RKHS kernels: A case study in chickens. J Anim Breed Genet 2021; 139:247-258. PMID: 34931377; DOI: 10.1111/jbg.12665.
Abstract
Single-step GBLUP (ssGBLUP) for genomic prediction was proposed in 2009. Many studies have investigated ssGBLUP for genomic selection in animals and plants using a standard linear kernel (similarity matrix) called the genomic relationship matrix (G). More general kernels should also capture non-additive effects, whereas GBLUP is based on additive gene action. In this study, we generalized ssGBLUP to accommodate two non-linear kernels, the averaged Gaussian kernel (AK) and the recently developed arc-cosine deep kernel (DK). We evaluated the methodology using body weight (BW) and hen-housing production (HHP) traits recorded on a sample of phenotyped and genotyped commercial broiler chickens. There were thus different ssGBLUP models corresponding to G, AK and DK. We used random replications of training (TRN) and testing (TST) layouts at different genotyping rates (20%, 40%, 60% and 80% of all birds) in three selective genotyping scenarios: genotyping the youngest individuals in the pedigree (YS), random genotyping (RS) and genotyping based on parent average (PA). Predictive abilities were measured using rank correlations between the observed and predicted phenotypic values in TST for each random partition. Prediction accuracy was influenced by the type of kernel when a large proportion of birds was genotyped. The advantage of the non-linear kernels (AK and DK) was most apparent when 60% and 80% of the birds had been genotyped. For BW, the lowest rank correlation was obtained with G (0.093 ± 0.015 using RS with 20% genotyped individuals) and the highest with DK (0.320 ± 0.016 in the PA setting with 80% genotyped individuals). For HHP, the lowest and highest rank correlations were obtained by AK with 20% and 80% genotyped individuals, 0.071 ± 0.016 (in RS) and 0.23 ± 0.016 (in PA), respectively. Our results indicate that AK and DK are more effective than G when a large proportion of the target population is genotyped. We expect that ssGBLUP with AK or DK can perform even better than with G when non-additive genetic effects influence the underlying variability of complex traits.
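The two non-linear kernels compared above can be sketched directly. The arc-cosine kernel of degree 1 (Cho & Saul, 2009) composed over several layers gives a "deep" kernel, and an averaged Gaussian kernel is simply a mean over a bandwidth grid; the depth and the bandwidth grid below are illustrative assumptions, not this study's tuned values.

```python
import numpy as np

def arccos_deep_kernel(X, depth=3):
    # Degree-1 arc-cosine kernel composed `depth` times (DK). Each layer:
    # k(x,y) = (1/pi) * ||x|| ||y|| * (sin(theta) + (pi - theta) cos(theta)),
    # where theta is the angle between x and y under the previous layer.
    K = X @ X.T
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        cos = np.clip(K / norm, -1.0, 1.0)   # guard against rounding
        theta = np.arccos(cos)
        K = (norm / np.pi) * (np.sin(theta) + (np.pi - theta) * cos)
    return K

def averaged_gaussian_kernel(X, bandwidths=(0.5, 1.0, 2.0)):
    # AK: average of Gaussian kernels over a grid of bandwidths.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.mean([np.exp(-D / (2.0 * h ** 2)) for h in bandwidths], axis=0)
```

Either kernel matrix can replace G in a single-step mixed-model or kernel-ridge predictor, which is how the ssGBLUP comparison in the abstract is set up.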
6. Comparative Analytical Study of SCMA Detection Methods for PA Nonlinearity Mitigation. Sensors 2021; 21:8408. PMID: 34960511; PMCID: PMC8706374; DOI: 10.3390/s21248408.
Abstract
Non-orthogonal multiple access (NOMA) has emerged as a promising technology that allows for multiplexing several users over limited time-frequency resources. Among existing NOMA methods, sparse code multiple access (SCMA) is especially attractive; not only for its coding gain using suitable codebook design methodologies, but also for the guarantee of optimal detection using message passing algorithm (MPA). Despite SCMA’s benefits, the bit error rate (BER) performance of SCMA systems is known to degrade due to nonlinear power amplifiers at the transmitter. To mitigate this degradation, two types of detectors have recently emerged, namely, the Bussgang-based approaches and the reproducing kernel Hilbert space (RKHS)-based approaches. This paper presents analytical results on the error-floor of the Bussgang-based MPA, and compares it with a universally optimal RKHS-based MPA using random Fourier features (RFF). Although the Bussgang-based MPA is computationally simpler, it attains a higher BER floor compared to its RKHS-based counterpart. This error floor and the BER’s performance gap are quantified analytically and validated via computer simulations.
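The random Fourier features (RFF) named above are a standard way to approximate a Gaussian RKHS with an explicit finite-dimensional feature map (Rahimi & Recht); the SCMA detector details are not reproduced here. A minimal sketch, with the feature count and kernel width as illustrative assumptions:

```python
import numpy as np

def rff_features(X, n_features=256, gamma=1.0, seed=0):
    # Random Fourier features: z(x) = sqrt(2/D) * cos(W x + b) with
    # W ~ N(0, 2*gamma*I) and b ~ U(0, 2*pi), so that z(x).z(y) approximates
    # the Gaussian kernel exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```

In an RKHS-based detector, such features let kernel computations run in time linear in the feature count rather than quadratic in the number of samples.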
7. Genomic Prediction of Additive and Non-additive Effects Using Genetic Markers and Pedigrees. G3 Genes Genomes Genetics 2019; 9:2739-2748. PMID: 31263059; PMCID: PMC6686920; DOI: 10.1534/g3.119.201004.
Abstract
The genetic merit of individuals can be estimated using models with dense markers and pedigree information. Early genomic models accounted only for additive effects. However, the prediction of non-additive effects is important for forest breeding systems in which the whole genotypic value can be captured through clonal propagation. In this study, we evaluated the integration of marker data with pedigree information in models that included or ignored non-additive effects. We tested reproducing kernel Hilbert spaces (RKHS) and BayesA models, in additive and additive-dominance frameworks. Model performance was assessed for tree height (HT), diameter at breast height (DBH) and rust resistance, measured in 923 pine individuals from a structured population of 71 full-sib families. We also simulated a population with similar genetic properties and evaluated the performance of the models for six simulated traits with distinct genetic architectures. Different cross-validation strategies were evaluated, and the highest accuracies were achieved using within-family cross-validation. The inclusion of pedigree information in genomic prediction models did not yield higher accuracies. The different RKHS models resulted in similar prediction accuracies, and RKHS and BayesA generated substantially better predictions than pedigree-only models. Additive BayesA resulted in higher accuracies than RKHS for rust incidence and for simulated additive oligogenic traits. For DBH, HT and additive-dominance polygenic traits, the RKHS-based models showed slightly higher accuracies than BayesA. Our results indicate that BayesA performs best for traits governed by a few genes with major effects, while RKHS-based models best predict genotypic effects for clonal selection of complex traits.
8. Bayesian analysis and prediction of hybrid performance. Plant Methods 2019; 15:14. PMID: 30774704; PMCID: PMC6366084; DOI: 10.1186/s13007-019-0388-x.
Abstract
BACKGROUND The selection of hybrids is an essential step in maize breeding, but evaluating a large number of hybrids in field trials can be extremely costly. Genomic models, however, can be used to predict the expected performance of untested genotypes. Bayesian models offer a very flexible framework for hybrid prediction: the Bayesian methodology can be used with parametric and semi-parametric assumptions for additive and non-additive effects, and samples from the posterior distribution can be used to estimate the variance due to general and specific combining abilities even when additive and non-additive effects are not mutually orthogonal. Nevertheless, the use of Bayesian models for the analysis and prediction of hybrid performance has remained fairly limited. RESULTS We provide an overview of Bayesian parametric and semi-parametric genomic models for the prediction of agronomic traits in maize hybrids and discuss how these models can be used to decompose the genotypic variance into components due to general and specific combining ability. We applied the methodology to data from 906 single-cross tropical maize hybrids derived from a convergent population. Our results show that: (1) non-additive effects make a sizable contribution to the genetic variance of grain yield, although their relative importance is much smaller for ear and plant height; and (2) genomic prediction can achieve relatively high accuracy in predicting the phenotypes of untested hybrids and in pre-screening. CONCLUSIONS Genomic prediction can be a useful tool for pre-screening hybrids and could improve the efficiency and efficacy of maize hybrid breeding programs. The Bayesian framework offers a great deal of flexibility in modeling hybrid performance, and the methodology can be used to estimate important genetic parameters and to render predictions of expected hybrid performance along with measures of uncertainty about such predictions.
9. Accuracy of genomic selection for growth and wood quality traits in two control-pollinated progeny trials using exome capture as the genotyping platform in Norway spruce. BMC Genomics 2018; 19:946. PMID: 30563448; PMCID: PMC6299659; DOI: 10.1186/s12864-018-5256-y.
Abstract
BACKGROUND Genomic selection (GS) can increase genetic gain by shortening the breeding cycle in forest trees. Here we genotyped 1370 control-pollinated progeny trees from 128 full-sib families of Norway spruce (Picea abies (L.) Karst.), using exome capture as the genotyping platform. We used 116,765 high-quality SNPs to develop genomic prediction models for tree height and wood quality traits. We assessed the impact of different genomic prediction methods, genotype-by-environment interaction (G × E), genetic composition, size of the training and validation sets, relatedness, and number of SNPs on the accuracy and predictive ability (PA) of GS. RESULTS Using the G matrix slightly altered heritability estimates relative to the pedigree-based method. GS accuracies were about 11-14% lower than those based on pedigree-based selection. The efficiency of GS per year was 1.71 to 1.78 times that of the pedigree-based model when the breeding cycle length was halved using GS. GS accuracy for height decreased by more than 30% when one site was used to train the model and the model was used to predict the second site, indicating that G × E for tree height should be accommodated in model fitting. Using a half-sib family structure instead of a full-sib structure led to a significant reduction in GS accuracy and PA. The full-sib family structure needed only 750 markers to reach an accuracy and PA similar to those requiring 100,000 markers for the half-sib family, indicating that maintaining high relatedness in the model improves accuracy and PA. Using 4000-8000 markers in the full-sib family structure was sufficient to obtain GS model accuracy and PA for tree height and wood quality traits almost equivalent to those obtained with all markers. CONCLUSIONS The study indicates that GS would be efficient in shortening the breeding cycle in conifer tree breeding programs that require long-term progeny testing. A sufficient number of trees within each family (16 for growth and 12 for wood quality traits) and of SNPs (8000) is required for GS with a full-sib family relationship. The GS method itself had little impact on GS efficiency for growth and wood quality traits, and the GS model should incorporate a G × E effect when a strong G × E is detected.
11.
Abstract
Personalized medicine has received increasing attention among statisticians, computer scientists, and clinical practitioners. A major component of personalized medicine is the estimation of individualized treatment rules (ITRs). Recently, Zhao et al. (2012) proposed outcome weighted learning (OWL) to construct ITRs that directly optimize the clinical outcome. Although OWL opens the door to introducing machine learning techniques to optimal treatment regimes, it still has some performance problems: (1) the estimated ITR of OWL is affected by a simple shift of the outcome; (2) the rule from OWL tends to retain the treatment assignments that subjects actually received; and (3) OWL has no variable selection mechanism. All of these issues weaken the finite-sample performance of OWL. In this article, we propose a general framework, called residual weighted learning (RWL), to alleviate these problems and hence improve finite-sample performance. Unlike OWL, which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding the treatment assignment. We utilize the smoothed ramp loss function in RWL and provide a difference-of-convex (d.c.) algorithm to solve the corresponding non-convex optimization problem. By estimating residuals with linear models or generalized linear models, RWL can effectively deal with different types of outcomes, such as continuous, binary and count outcomes. We also propose variable selection methods for linear and nonlinear rules, respectively, to further improve performance. We show that the resulting estimator of the treatment rule is consistent, and we obtain a rate of convergence for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed RWL methods is illustrated in simulation studies and in an analysis of cystic fibrosis clinical trial data.
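The residual-weighting step at the heart of RWL can be sketched in a few lines: regress the outcome on covariates (excluding treatment), weight each subject by the absolute residual, and flip the classification label when the residual is negative. This is a sketch of that one step only, with a plain least-squares fit standing in for the paper's choice of regression model.

```python
import numpy as np

def residual_weights(X, y, treatment):
    # Core step of residual weighted learning: fit y ~ X with an intercept
    # (treatment excluded), then use |residual| as the subject weight and
    # sign(residual) * treatment as the classification label.
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    weights = np.abs(resid)
    labels = np.sign(resid) * treatment      # labels in {-1, 0, +1}
    return weights, labels
```

Because the intercept absorbs any constant shift of the outcome, the weights and labels are invariant to such shifts, which is exactly the robustness property motivating RWL over OWL.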
12. Large-scale signal detection: A unified perspective. Biometrics 2015; 72:325-334. PMID: 26433744; DOI: 10.1111/biom.12423.
Abstract
There is an overwhelmingly large literature, with many algorithms already available, on "large-scale inference problems" based on different modeling techniques and cultures. Our primary goal in this article is not to add one more methodology to the existing toolbox but rather (i) to clarify how these different simultaneous inference methods are connected, (ii) to provide an alternative, more intuitive derivation of the formulas that leads to simpler expressions, and (iii) to develop a unified algorithm for practitioners. A detailed discussion of representation, estimation, inference, and model selection is given. Applications to a variety of real and simulated datasets show promise. We end with several future research directions.
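For concreteness, the canonical baseline in this large-scale testing literature is the Benjamini-Hochberg step-up procedure for false discovery rate control; it is not the unified algorithm of this article, just the standard point of comparison.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # Classical BH step-up procedure: sort p-values, find the largest k with
    # p_(k) <= q*k/m, and reject the hypotheses with the k smallest p-values.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Under independence (and certain dependence structures), this controls the expected proportion of false discoveries at level q.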
13. Dimensionality reduction of RKHS model parameters. ISA Trans 2015; 57:205-210. PMID: 25765957; DOI: 10.1016/j.isatra.2015.02.003.
Abstract
This paper proposes a new method to reduce the number of parameters of models developed in a reproducing kernel Hilbert space (RKHS). This number equals the number of observations used in the learning phase, which is assumed to be large. The proposed method, entitled reduced kernel partial least squares (RKPLS), consists in approximating the retained latent components determined by the kernel partial least squares (KPLS) method by their closest observation vectors. The paper presents the design of the proposed RKPLS method and a comparative study with the support vector regression (SVR) technique. The method is applied to identify a nonlinear Process Trainer PT326, a physical process available in our laboratory; as a thermal process with a large time response, it makes it easy to record effective observations that contribute to model identification. Compared with the SVR technique, the results from the proposed RKPLS method are satisfactory.
14.
Abstract
We consider model selection and estimation for partial spline models and propose a new regularization method in the context of smoothing splines. The regularization method has a simple yet elegant form, consisting of a roughness penalty on the nonparametric component and a shrinkage penalty on the parametric components, which achieves function smoothing and sparse estimation simultaneously. We establish the convergence rate and oracle properties of the estimator under weak regularity conditions. Remarkably, the estimated parametric components are sparse and efficient, and the nonparametric component can be estimated at the optimal rate. The procedure also has attractive computational properties. Using the representer theorem for smoothing splines, we reformulate the objective function as a LASSO-type problem, enabling us to use the LARS algorithm to compute the solution path. We then extend the procedure to the setting where the number of predictors increases with the sample size and investigate its asymptotic properties in that context. Finite-sample performance is illustrated by simulations.
15. Channel identification machines for multidimensional receptive fields. Front Comput Neurosci 2014; 8:117. PMID: 25309413; PMCID: PMC4176398; DOI: 10.3389/fncom.2014.00117.
Abstract
We present algorithms for identifying multidimensional receptive fields directly from spike trains produced by biophysically-grounded neuron models. We demonstrate that only the projection of a receptive field onto the input stimulus space may be perfectly identified and derive conditions under which this identification is possible. We also provide detailed examples of identification of neural circuits incorporating spatiotemporal and spectrotemporal receptive fields.
16. Genomic prediction in maize breeding populations with genotyping-by-sequencing. G3 Genes Genomes Genetics 2013; 3:1903-1926. PMID: 24022750; PMCID: PMC3815055; DOI: 10.1534/g3.113.008227.
Abstract
Genotyping-by-sequencing (GBS) technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, the use of GBS data poses important challenges, and the accuracy of genomic prediction using GBS is currently undergoing investigation in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information and compare them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Given that GBS data come with a large percentage of uncalled genotypes, we evaluated methods using nonimputed, imputed, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either genomic best linear unbiased prediction (GBLUP) or reproducing kernel Hilbert spaces (RKHS) regression, and prediction accuracy was quantified using cross-validation.
The following results were found: (1) relative to pedigree-only or marker-only models, there were consistent gains in prediction accuracy from combining pedigree and GBS data; (2) predictive ability increased when using imputed or nonimputed GBS data over inferred haplotypes in experiment 1, and when using nonimputed GBS data and information-based imputed short and long haplotypes, compared with the other methods, in experiment 2; (3) the prediction accuracy achieved using GBS data in experiment 2 is comparable to that reported by previous authors who analyzed this data set using SNP arrays; and (4) GBLUP and RKHS models combining pedigree with nonimputed and imputed GBS data provided the best prediction correlations for the three traits in experiment 1, whereas in experiment 2 RKHS provided slightly better predictions than GBLUP in drought-stressed environments and similar predictions in well-watered environments.
17.
Abstract
There is increasing interest in discovering individualized treatment rules for patients who have heterogeneous responses to treatment. In particular, one aims to find an optimal individualized treatment rule, a deterministic function of patient-specific characteristics that maximizes the expected clinical outcome. In this paper, we first show that estimating such an optimal treatment rule is equivalent to a classification problem in which each subject is weighted proportionally to his or her clinical outcome. We then propose an outcome weighted learning approach based on the support vector machine framework. We show that the resulting estimator of the treatment rule is consistent, and we obtain a finite-sample bound for the difference between the expected outcome using the estimated individualized treatment rule and that of the optimal treatment rule. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
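The reduction described above — ITR estimation as outcome-weighted classification — can be sketched with a linear rule and a weighted hinge loss. The paper uses a support vector machine solver; the plain subgradient descent, the randomization probability `pi`, and the hyperparameters below are illustrative assumptions (and the outcomes are assumed nonnegative, as OWL requires after shifting).

```python
import numpy as np

def owl_linear_rule(X, A, R, pi=0.5, lam=0.1, lr=0.05, epochs=200):
    # Outcome weighted learning with a linear rule d(x) = sign(x'w + b):
    # minimize the weighted hinge loss with subject weights R / pi, where
    # A in {-1,+1} is the received treatment, R >= 0 the observed outcome,
    # and pi the treatment randomization probability.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    wts = R / pi
    for _ in range(epochs):
        margins = A * (X @ w + b)
        active = margins < 1.0                      # hinge subgradient support
        gw = lam * w - (wts * active * A) @ X / n   # ridge term + weighted hinge
        gb = -np.mean(wts * active * A)
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Subjects with large outcomes pull the classifier toward reproducing their received treatment, so the fitted sign rule approximates the outcome-maximizing assignment.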