1
Glasgow NG, Chen Y, Korngreen A, Kass RE, Urban NN. A biophysical and statistical modeling paradigm for connecting neural physiology and function. J Comput Neurosci 2023; 51:263-282. [PMID: 37140691 PMCID: PMC10182162 DOI: 10.1007/s10827-023-00847-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 02/10/2023] [Accepted: 02/16/2023] [Indexed: 05/05/2023]
Abstract
To understand single neuron computation, it is necessary to know how specific physiological parameters affect neural spiking patterns that emerge in response to specific stimuli. Here we present a computational pipeline combining biophysical and statistical models that provides a link between variation in functional ion channel expression and changes in single neuron stimulus encoding. More specifically, we create a mapping from biophysical model parameters to stimulus encoding statistical model parameters. Biophysical models provide mechanistic insight, whereas statistical models can identify associations between spiking patterns and the stimuli they encode. We used publicly available biophysical models of two morphologically and functionally distinct projection neuron cell types: mitral cells (MCs) of the main olfactory bulb, and layer V cortical pyramidal cells (PCs). We first simulated sequences of action potentials in response to defined stimuli while scaling individual ion channel conductances. We then fitted point process generalized linear models (PP-GLMs) to the simulated spike trains and constructed a mapping between the parameters of the two types of models. This framework lets us detect how changing an ion channel conductance affects stimulus encoding. The computational pipeline combines models across scales and can be applied as a screen of channels, in any cell type of interest, to identify ways that channel properties influence single neuron computation.
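PP-GLMs are commonly fitted as Poisson regressions on binned spike counts with a log link. The sketch below assumes exactly that simplified form: one bin per time step, a single stimulus covariate, and a one-bin spike-history term. The function names, the bin structure, and the simulated parameters are illustrative assumptions rather than the authors' pipeline, which would typically use richer stimulus and history bases.

```python
import numpy as np

def design_matrix(stimulus, spikes):
    """Columns: intercept, current stimulus value, spike count in the previous bin."""
    history = np.concatenate([[0.0], spikes[:-1]])  # one-bin spike history (assumption)
    return np.column_stack([np.ones(len(stimulus)), stimulus, history])

def fit_poisson_glm(X, y, iters=25):
    """Maximize the Poisson log-likelihood with log link by Newton's method (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)              # expected spike count per bin
        grad = X.T @ (y - mu)              # score vector
        fisher = X.T @ (X * mu[:, None])   # information matrix (canonical link)
        beta = beta + np.linalg.solve(fisher, grad)
    return beta

# Usage: simulate spikes from known parameters, then recover them.
rng = np.random.default_rng(0)
T = 50000
stim = rng.standard_normal(T)
beta_true = np.array([-1.0, 0.8, -1.0])    # baseline, stimulus gain, refractory-like history
spikes = np.zeros(T)
for t in range(T):
    prev = spikes[t - 1] if t > 0 else 0.0
    rate = np.exp(beta_true[0] + beta_true[1] * stim[t] + beta_true[2] * prev)
    spikes[t] = rng.poisson(rate)
beta_hat = fit_poisson_glm(design_matrix(stim, spikes), spikes)
```

In a pipeline like the one described, each conductance scaling would yield its own fitted coefficient vector, and those vectors are what a mapping between the two model types would relate.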
Affiliation(s)
- Nathan G Glasgow
- Department of Neurobiology and Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
- Yu Chen
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Alon Korngreen
- The Leslie and Susan Gonda Interdisciplinary Brain Research Centre, Bar-Ilan University, Ramat Gan, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Robert E Kass
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
- Nathan N Urban
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
2
Barmak R, Stefanec M, Hofstadler DN, Piotet L, Schönwetter-Fuchs-Schistek S, Mondada F, Schmickl T, Mills R. A robotic honeycomb for interaction with a honeybee colony. Sci Robot 2023; 8:eadd7385. [PMID: 36947600 DOI: 10.1126/scirobotics.add7385] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/24/2023]
Abstract
Robotic technologies have shown the capability to interact with living organisms and even to form integrated mixed societies composed of living and artificial agents. Biocompatible robots, incorporating sensing and actuation capable of generating and responding to relevant stimuli, can be a tool to study collective behaviors previously unattainable with traditional techniques. To investigate collective behaviors of the western honeybee (Apis mellifera), we designed a robotic system capable of observing and modulating the bee cluster using an array of thermal sensors and actuators. We initially integrated the system into a beehive populated with about 4000 bees for several months. The robotic system was able to observe the colony by continuously collecting spatiotemporal thermal profiles of the winter cluster. Furthermore, we found that our robotic device reliably modulated the superorganism's response to dynamic thermal stimulation, influencing its spatiotemporal reorganization. In addition, after identifying the thermal collapse of a colony, we used the robotic system in a "life-support" mode via its thermal actuators. Ultimately, we demonstrated a robotic device capable of autonomous closed-loop interaction with a cluster comprising thousands of individual bees. Such biohybrid societies open the door to the investigation of collective behaviors that require observing and interacting with the animals within a complete social context, as well as to potential applications in improving the survival of these pollinators, which are crucial to our ecosystems and our food supply.
Affiliation(s)
- Rafael Barmak
- Mobile Robotic Systems Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Martin Stefanec
- Artificial Life Lab, Department of Zoology, Institute of Biology, University of Graz, Graz, Austria
- Daniel N Hofstadler
- Artificial Life Lab, Department of Zoology, Institute of Biology, University of Graz, Graz, Austria
- Louis Piotet
- Mobile Robotic Systems Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Francesco Mondada
- Mobile Robotic Systems Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Thomas Schmickl
- Artificial Life Lab, Department of Zoology, Institute of Biology, University of Graz, Graz, Austria
- Rob Mills
- Mobile Robotic Systems Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
3
Heng Q, Zhou H, Chi EC. Bayesian Trend Filtering via Proximal Markov Chain Monte Carlo. J Comput Graph Stat 2023; 32:938-949. [PMID: 37822489 PMCID: PMC10564381 DOI: 10.1080/10618600.2023.2170089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Accepted: 01/09/2023] [Indexed: 01/21/2023]
Abstract
Proximal Markov chain Monte Carlo (MCMC) is a novel construct at the intersection of Bayesian computation and convex optimization that has helped popularize the use of nondifferentiable priors in Bayesian statistics. Existing formulations of proximal MCMC, however, require hyperparameters and regularization parameters to be prespecified. In this work, we extend the paradigm of proximal MCMC by introducing a new class of nondifferentiable priors called epigraph priors. As a proof of concept, we place trend filtering, which was originally a nonparametric regression problem, in a parametric setting to provide a posterior median fit along with credible intervals as measures of uncertainty. The key idea is to replace the nonsmooth term in the posterior density with its Moreau-Yosida envelope, which enables the application of the gradient-based MCMC sampler Hamiltonian Monte Carlo. The proposed method identifies the appropriate amount of smoothing in a data-driven way, thereby automating regularization parameter selection. Compared with conventional proximal MCMC methods, our method is mostly tuning free, achieving simultaneous calibration of the mean, scale and regularization parameters in a fully Bayesian framework.
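For intuition about the Moreau-Yosida step, the scalar function f(x) = |x| has a closed-form envelope with parameter γ (the Huber function) whose gradient is (x - prox(x))/γ, where the proximal map is soft thresholding. The sketch below covers only this scalar case, as an assumption for illustration; the trend-filtering penalty in the paper acts on differences of the signal, and the epigraph-prior construction is not reproduced. Function names are hypothetical.

```python
import numpy as np

def prox_abs(x, gamma):
    """Proximal map of gamma * |x|: soft thresholding at gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def moreau_envelope_abs(x, gamma):
    """Moreau-Yosida envelope of |x|: x^2 / (2 gamma) if |x| <= gamma, else |x| - gamma / 2."""
    p = prox_abs(x, gamma)
    return np.abs(p) + (x - p) ** 2 / (2.0 * gamma)

def grad_moreau_envelope_abs(x, gamma):
    """Gradient of the envelope, (x - prox(x)) / gamma: Lipschitz, so usable inside HMC."""
    return (x - prox_abs(x, gamma)) / gamma

x = np.array([-2.0, 0.0, 0.5, 3.0])
env = moreau_envelope_abs(x, 1.0)        # smooth surrogate for |x|
grad = grad_moreau_envelope_abs(x, 1.0)  # what a gradient-based sampler would consume
```

As γ shrinks, the envelope approaches |x| itself, so γ trades off smoothness for the sampler against fidelity to the original nonsmooth prior.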
Affiliation(s)
- Qiang Heng
- Department of Statistics, North Carolina State University
- Hua Zhou
- Departments of Biostatistics and Computational Medicine, UCLA
4
Liu X, Yeo K. Inverse Models for Estimating the Initial Condition of Spatio-Temporal Advection-Diffusion Processes. Technometrics 2023. [DOI: 10.1080/00401706.2023.2181222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Affiliation(s)
- Xiao Liu
- Department of Industrial Engineering, University of Arkansas
5
Fu A, Taasti VT, Zarepisheh M. Distributed and scalable optimization for robust proton treatment planning. Med Phys 2023; 50:633-642. [PMID: 35907245 PMCID: PMC10249339 DOI: 10.1002/mp.15897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 06/29/2022] [Accepted: 07/09/2022] [Indexed: 01/25/2023]
Abstract
BACKGROUND: The importance of robust proton treatment planning to mitigate the impact of uncertainty is well understood. However, its computational cost grows with the number of uncertainty scenarios, prolonging the treatment planning process. PURPOSE: We developed a fast and scalable distributed optimization platform that parallelizes the robust proton treatment plan computation over the uncertainty scenarios. METHODS: We modeled the robust proton treatment planning problem as a weighted least-squares problem. To solve it, we employed an optimization technique called the alternating direction method of multipliers with Barzilai-Borwein step size (ADMM-BB). We reformulated the problem so as to split it into smaller subproblems, one for each proton therapy uncertainty scenario. The subproblems can be solved in parallel, allowing the computational load to be distributed across multiple processors (e.g., CPU threads/cores). We evaluated ADMM-BB on four head-and-neck proton therapy patients, each with 13 scenarios accounting for 3 mm setup and 3.5% range uncertainties. We then compared the performance of ADMM-BB with projected gradient descent (PGD) applied to the same problem. RESULTS: For each patient, ADMM-BB generated a robust proton treatment plan that satisfied all clinical criteria with comparable or better dosimetric quality than the plan generated by PGD. Moreover, ADMM-BB was on average about 6 to 7 times faster in total runtime, and this speedup increased with the number of scenarios. CONCLUSIONS: ADMM-BB is a powerful distributed optimization method that leverages parallel processing platforms, such as multicore CPUs, GPUs, and cloud servers, to accelerate the computationally intensive work of robust proton treatment planning. This results in (1) a shorter treatment planning process and (2) the ability to consider more uncertainty scenarios, which improves plan quality.
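The scenario-splitting idea can be illustrated with consensus ADMM. The sketch below assumes (our assumption, not necessarily the paper's exact formulation) that the robust objective is a sum of per-scenario least-squares terms ||A_s x - b_s||^2 over nonnegative beamlet intensities x. Voxel weights can be folded in by scaling the rows of A_s and b_s by their square roots. It uses a fixed penalty ρ; the Barzilai-Borwein step-size adaptation of ADMM-BB is not shown, and all names are hypothetical.

```python
import numpy as np

def consensus_admm_ls(As, bs, rho=1.0, iters=500):
    """Minimize sum_s ||A_s x - b_s||^2 subject to x >= 0 by consensus ADMM.

    Each scenario s keeps a local copy x_s; z is the shared (consensus) plan
    and u_s are scaled dual variables. The x_s updates are independent, so in
    a distributed setting they could run on separate cores or machines.
    """
    n = As[0].shape[1]
    z = np.zeros(n)
    us = [np.zeros(n) for _ in As]
    # Local x-update solves (2 A_s^T A_s + rho I) x = 2 A_s^T b_s + rho (z - u_s).
    lhs = [2.0 * A.T @ A + rho * np.eye(n) for A in As]
    rhs0 = [2.0 * A.T @ b for A, b in zip(As, bs)]
    for _ in range(iters):
        # 1) Scenario subproblems (parallelizable).
        xs = [np.linalg.solve(M, r + rho * (z - u)) for M, r, u in zip(lhs, rhs0, us)]
        # 2) Consensus step: average, then project onto the constraint x >= 0.
        z = np.maximum(np.mean([x + u for x, u in zip(xs, us)], axis=0), 0.0)
        # 3) Dual update on the consensus constraints x_s = z.
        us = [u + x - z for x, u in zip(xs, us)]
    return z

# Usage: three synthetic "scenarios" that share one nonnegative solution.
rng = np.random.default_rng(0)
As = [rng.standard_normal((10, 4)) for _ in range(3)]
x_true = rng.uniform(0.5, 2.0, size=4)
bs = [A @ x_true for A in As]
z = consensus_admm_ls(As, bs)
```

Only the averaging step needs communication between workers, which is why adding scenarios mostly adds parallel work rather than serial work.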
Affiliation(s)
- Anqi Fu
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vicki T. Taasti
- Department of Radiation Oncology, Maastricht University Medical Center, Maastricht, NL
- Masoud Zarepisheh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6
Ko S, Zhou H, Zhou JJ, Won JH. High-Performance Statistical Computing in the Computing Environments of the 2020s. Stat Sci 2022; 37:494-518. [PMID: 37168541 PMCID: PMC10168006 DOI: 10.1214/21-sts835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type 2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
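As a small illustration of the kind of algorithm reviewed (not the paper's code or its distributed matrix type), the sketch below runs proximal gradient descent (ISTA) for a least-squares lasso, used here in place of the Cox model. It computes the gradient term X^T r block by block, the way row-partitioned data held on several devices would be combined with a sum (an all-reduce). NumPy stands in for a GPU or distributed array library; the block count, penalty, and function names are assumptions.

```python
import numpy as np

def blockwise_xt_r(X_blocks, r_blocks):
    """X^T r as a sum of per-block products; each block could live on a different device."""
    return sum(Xb.T @ rb for Xb, rb in zip(X_blocks, r_blocks))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_blocks=4, iters=2000):
    """Minimize (1 / 2n) ||y - X beta||^2 + lam ||beta||_1 by ISTA with step 1/L."""
    n, p = X.shape
    X_blocks = np.array_split(X, n_blocks, axis=0)  # row partition of the data
    y_blocks = np.array_split(y, n_blocks)
    step = n / np.linalg.norm(X, 2) ** 2            # 1/L, L = largest eigenvalue of X^T X / n
    beta = np.zeros(p)
    for _ in range(iters):
        r_blocks = [yb - Xb @ beta for Xb, yb in zip(X_blocks, y_blocks)]
        grad = -blockwise_xt_r(X_blocks, r_blocks) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Usage on synthetic data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
beta_true = np.array([3.0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.5 * rng.standard_normal(200)
lam = 0.1
beta = lasso_ista(X, y, lam)
```

The only cross-block operation per iteration is the summed gradient, which is what makes this family of algorithms a good fit for multi-GPU or multi-node hardware.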
Affiliation(s)
- Seyoon Ko
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, California 90095, USA
- Hua Zhou
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, California 90095, USA
- Jin J Zhou
- Department of Medicine, UCLA David Geffen School of Medicine, Los Angeles, California 90095, USA
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona 85724, USA
- Joong-Ho Won
- Department of Statistics, Seoul National University, Seoul, Korea
7
Chen Y, Jewell S, Witten D. More Powerful Selective Inference for the Graph Fused Lasso. J Comput Graph Stat 2022; 32:577-587. [PMID: 38250478 PMCID: PMC10798806 DOI: 10.1080/10618600.2022.2097246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 06/28/2022] [Indexed: 10/17/2022]
Abstract
The graph fused lasso, which includes the one-dimensional fused lasso as a special case, is widely used to reconstruct signals that are piecewise constant on a graph, meaning that nodes connected by an edge tend to have identical values. We consider testing for a difference in the means of two connected components estimated using the graph fused lasso. A naive procedure such as a z-test for a difference in means will not control the selective Type I error, since the hypothesis that we are testing is itself a function of the data. In this work, we propose a new test for this task that controls the selective Type I error and conditions on less information than existing approaches, leading to substantially higher power. We illustrate our approach in simulation and on datasets of drug overdose death rates and teenage birth rates in the contiguous United States. Our approach yields more discoveries on both datasets. Supplementary materials for this article are available online.
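To see why a naive z-test fails once the compared groups are chosen from the data, the simulation below uses a simplified one-dimensional stand-in for fused-lasso segmentation (an assumption, not the paper's setup). Under a global null with known unit variance, it picks the split of y that maximizes |z| and then tests that split as if it had been fixed in advance. The function name and settings are hypothetical, and the selective test proposed in the paper is not implemented.

```python
import math
import numpy as np

def naive_split_rejection_rate(n=100, reps=2000, alpha=0.05, selected=True, seed=0):
    """Fraction of null datasets in which a naive two-sample z-test rejects.

    selected=True : the split point is chosen from the data (largest |z|),
                    mimicking a test of segments that were estimated from y.
    selected=False: the split point is fixed in advance at n // 2.
    """
    rng = np.random.default_rng(seed)
    k = np.arange(1, n)                    # size of the left segment
    se = np.sqrt(1.0 / k + 1.0 / (n - k))  # noise standard deviation known to be 1
    rejections = 0
    for _ in range(reps):
        y = rng.standard_normal(n)         # global null: every mean is 0
        csum = np.cumsum(y)
        left = csum[:-1] / k
        right = (csum[-1] - csum[:-1]) / (n - k)
        z = (left - right) / se
        idx = int(np.argmax(np.abs(z))) if selected else n // 2 - 1
        p_value = math.erfc(abs(z[idx]) / math.sqrt(2.0))  # two-sided normal p-value
        rejections += p_value < alpha
    return rejections / reps

rate_selected = naive_split_rejection_rate(selected=True)
rate_fixed = naive_split_rejection_rate(selected=False)
```

With a prespecified split the rejection rate stays near the nominal 5%, while selecting the split from the same data inflates it far above that level, which is the failure that selective inference corrects.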
Affiliation(s)
- Yiqun Chen
- Department of Biostatistics, University of Washington, Seattle, WA
- Sean Jewell
- Department of Statistics, University of Washington, Seattle, WA
- Daniela Witten
- Department of Biostatistics, University of Washington, Seattle, WA
- Department of Statistics, University of Washington, Seattle, WA
8
Raiho AM, Paciorek CJ, Dawson A, Jackson ST, Mladenoff DJ, Williams JW, McLachlan JS. 8000-year doubling of Midwestern forest biomass driven by population- and biome-scale processes. Science 2022. [DOI: 10.1126/science.abk3126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Changes in woody biomass over centuries to millennia are poorly known, leaving unclear the magnitude of terrestrial carbon fluxes before industrial-era disturbance. Here, we statistically reconstructed changes in woody biomass across the upper Midwestern region of the United States over the past 10,000 years using a Bayesian model calibrated to preindustrial forest biomass estimates and fossil pollen records. After an initial postglacial decline, woody biomass nearly doubled during the past 8000 years, sequestering 1800 teragrams. This steady accumulation of carbon was driven by two separate ecological responses to regionally changing climate: the spread of forested biomes and the population expansion of high-biomass tree species within forests. What took millennia to accumulate took less than two centuries to remove: Industrial-era logging and agriculture have erased this carbon accumulation.
Affiliation(s)
- A. M. Raiho
- Department of Biological Sciences, University of Notre Dame, South Bend, IN, USA
- Earth System Science Interdisciplinary Center, University of Maryland, College Park, College Park, MD, USA
- C. J. Paciorek
- Department of Statistics, University of California, Berkeley, Berkeley, CA, USA
- A. Dawson
- Department of General Education, Mount Royal University, Calgary, Alberta, Canada
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- S. T. Jackson
- US Geological Survey, Southwest and South Central Climate Adaptation Centers, Tucson, AZ, USA
- Department of Geosciences, University of Arizona, Tucson, AZ, USA
- D. J. Mladenoff
- Department of Forest and Wildlife Ecology, University of Wisconsin–Madison, Madison, WI, USA
- J. W. Williams
- Department of Geography, University of Wisconsin–Madison, Madison, WI, USA
- Center for Climatic Research, University of Wisconsin–Madison, Madison, WI, USA
- J. S. McLachlan
- Department of Biological Sciences, University of Notre Dame, South Bend, IN, USA
9
Dallakyan A, Pourahmadi M. Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Replicated Time Series. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2090367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
10
Jahja M, Chin A, Tibshirani RJ. Real-Time Estimation of COVID-19 Infections: Deconvolution and Sensor Fusion. Stat Sci 2022. [DOI: 10.1214/22-sts856] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
Affiliation(s)
- Maria Jahja
- Ph.D. Candidate, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Andrew Chin
- Statistical Developer, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Ryan J. Tibshirani
- Professor, Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
11
Shimmura R, Suzuki J. Converting ADMM to a proximal gradient for efficient sparse estimation. Japanese Journal of Statistics and Data Science 2022. [DOI: 10.1007/s42081-022-00150-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
12
DeWitt WS, Harris KD, Ragsdale AP, Harris K. Nonparametric coalescent inference of mutation spectrum history and demography. Proc Natl Acad Sci U S A 2021; 118:e2013798118. [PMID: 34016747 PMCID: PMC8166128 DOI: 10.1073/pnas.2013798118] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022]
Abstract
As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.
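As a baseline for what mutation-spectrum dynamics perturb: under the standard neutral Kingman coalescent with constant population size, the expected number of sites where the derived allele appears in i of n sampled chromosomes is θ/i. The sketch below computes that baseline and splits it across mutation types, assuming a time-constant mutation spectrum, in which case every type's spectrum is proportional to the same curve. Departures from that proportionality are the kind of signal a time-varying spectrum would leave. This is a textbook baseline, not mushi's inference procedure, and the function names are hypothetical.

```python
import numpy as np

def expected_sfs(n, theta):
    """Expected site frequency spectrum E[xi_i] = theta / i for i = 1..n-1
    (constant-size Kingman coalescent, infinite-sites mutation)."""
    i = np.arange(1, n)
    return theta / i

def expected_type_sfs(n, theta, spectrum):
    """Split the expected SFS across mutation types, assuming the relative
    rates in `spectrum` do not change over time (rows: frequency, columns: type)."""
    spectrum = np.asarray(spectrum, dtype=float)
    return np.outer(expected_sfs(n, theta), spectrum / spectrum.sum())

sfs = expected_sfs(5, 2.0)                      # theta / i for i = 1..4
type_sfs = expected_type_sfs(5, 2.0, [1, 3])    # two hypothetical mutation types
```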
Affiliation(s)
- William S DeWitt
- Department of Genome Sciences, University of Washington, Seattle, WA 98195;
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Kameron Decker Harris
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195
- Department of Biology, University of Washington, Seattle, WA 98195
- Aaron P Ragsdale
- National Laboratory of Genomics for Biodiversity, Unit of Advanced Genomics, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Irapuato, Mexico 36821
- Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA 98195;
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
13
Jensen AK, Ekstrøm CT. Quantifying the trendiness of trends. J R Stat Soc Ser C Appl Stat 2020; 70:98-121. [DOI: 10.1111/rssc.12451] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/26/2019] [Accepted: 09/24/2020] [Indexed: 11/27/2022]
Affiliation(s)
- Andreas Kryger Jensen
- Biostatistics, Institute of Public Health, University of Copenhagen, Copenhagen, Denmark
- Claus Thorn Ekstrøm
- Biostatistics, Institute of Public Health, University of Copenhagen, Copenhagen, Denmark
14
Amato U, Antoniadis A, De Feis I. Flexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functions. Statistics Surveys 2020. [DOI: 10.1214/20-ss128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
15
16
Yen TJ. Solving Fused Penalty Estimation Problems via Block Splitting Algorithms. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1660178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Tso-Jung Yen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
17
Waldmann P, Ferenčaković M, Mészáros G, Khayatzadeh N, Curik I, Sölkner J. AUTALASSO: an automatic adaptive LASSO for genome-wide prediction. BMC Bioinformatics 2019; 20:167. [PMID: 30940067 PMCID: PMC6444607 DOI: 10.1186/s12859-019-2743-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/01/2018] [Accepted: 03/18/2019] [Indexed: 01/30/2023]
Abstract
Background: Genome-wide prediction (GWP) has become the method of choice in animal and plant breeding. Prediction of breeding values and phenotypes is routinely performed using large genomic data sets with the number of markers on the order of several thousands to millions. The number of evaluated individuals is usually smaller, which results in problems where model sparsity is of major concern. The LASSO technique has proven to be very well suited for sparse problems, often providing excellent prediction accuracy. Several computationally efficient LASSO algorithms have been developed, but optimization of hyper-parameters can be demanding. Results: We have developed a novel automatic adaptive LASSO (AUTALASSO) based on the alternating direction method of multipliers (ADMM) optimization algorithm. The two major hyper-parameters of ADMM are the learning rate and the regularization factor. The learning rate is automatically tuned with line search and the regularization factor optimized using golden section search. Results show that AUTALASSO provides superior prediction accuracy when evaluated on simulated and real bull data compared to the adaptive LASSO, LASSO and ridge regression implemented in the popular glmnet software. Conclusions: The AUTALASSO provides a very flexible and computationally efficient approach to GWP, especially when it is important to obtain high prediction accuracy and genetic gain. The AUTALASSO also has the capability to perform genome-wide association studies (GWAS) of both additive and dominance effects with smaller prediction error than the ordinary LASSO.
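Golden section search, used here for the regularization factor, needs only function evaluations and shrinks the bracket by a factor of about 0.618 per step while reusing one interior point. The generic sketch below assumes only that the objective is unimodal on the search interval. In AUTALASSO the objective would be a prediction-error criterion evaluated by refitting the model, whereas here a toy quadratic stands in for it; the function name and tolerance are illustrative assumptions.

```python
import math

def golden_section_search(f, a, b, tol=1e-8, max_iter=200):
    """Minimize a unimodal function f on [a, b]; returns the midpoint of the final bracket."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Usage with a stand-in objective whose minimizer is 2.0.
best = golden_section_search(lambda t: (t - 2.0) ** 2, 0.0, 5.0)
```

Because each step costs a single new evaluation, the method suits objectives like cross-validated error, where every evaluation means refitting the model.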
Affiliation(s)
- Patrik Waldmann
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, Uppsala, 750 07, Sweden.
- Maja Ferenčaković
- Department of Animal Science, Faculty of Agriculture, University of Zagreb, Svetosimunska 25, Zagreb, 10000, Croatia
- Gábor Mészáros
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Gregor Mendel Str. 33, Vienna, A-1180, Austria
- Negar Khayatzadeh
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Gregor Mendel Str. 33, Vienna, A-1180, Austria
- Ino Curik
- Department of Animal Science, Faculty of Agriculture, University of Zagreb, Svetosimunska 25, Zagreb, 10000, Croatia
- Johann Sölkner
- Division of Livestock Sciences, Department of Sustainable Agricultural Systems, University of Natural Resources and Life Sciences Vienna, Gregor Mendel Str. 33, Vienna, A-1180, Austria
18
Petersen A, Witten D. Data-adaptive additive modeling. Stat Med 2019; 38:583-600. [PMID: 30010200 PMCID: PMC6335202 DOI: 10.1002/sim.7859] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2017] [Revised: 05/18/2018] [Accepted: 06/05/2018] [Indexed: 11/10/2022]
Abstract
In this paper, we consider fitting a flexible and interpretable additive regression model in a data-rich setting. We wish to avoid pre-specifying the functional form of the conditional association between each covariate and the response, while still retaining interpretability of the fitted functions. A number of recent proposals in the literature for nonparametric additive modeling are data adaptive, in the sense that they can adjust the level of flexibility in the functional fits to the data at hand. For instance, the sparse additive model makes it possible to adaptively determine which features should be included in the fitted model, the sparse partially linear additive model allows each feature in the fitted model to take either a linear or a nonlinear functional form, and the recent fused lasso additive model and additive trend filtering proposals allow the knots in each nonlinear function fit to be selected from the data. In this paper, we combine the strengths of each of these recent proposals into a single proposal that uses the data to determine which features to include in the model, whether to model each feature linearly or nonlinearly, and what form to use for the nonlinear functions. We establish connections between our approach and recent proposals from the literature, and we demonstrate its strengths in a simulation study.
Affiliation(s)
- Ashley Petersen
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Daniela Witten
- Departments of Biostatistics and Statistics, University of Washington, Seattle, WA, USA
19
[Received: January 2018] [Revised: February 2019]
Abstract
As datasets continue to increase in size, there is growing interest in methods for prediction that are both flexible and interpretable. A flurry of recent work on this topic has focused on additive modeling in the regression setting, and in particular, on the use of data-adaptive nonlinear functions that can be used to flexibly model each covariate's effect, conditional on the other features in the model. In this article, we extend this recent line of work to the survival setting. We develop an additive Cox proportional hazards model, in which each additive function is obtained by trend filtering, so that the fitted functions are piecewise polynomial with adaptively chosen knots. An efficient proximal gradient descent algorithm is used to fit the model. We demonstrate its performance in simulations and in application to a primary biliary cirrhosis data set, as well as a dataset consisting of time to publication for clinical trials in the biomedical literature. Supplementary materials for this article are available online.
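Trend filtering of order k penalizes the ℓ1 norm of the (k+1)-th discrete differences of the fitted values. For evenly spaced inputs, a fit whose (k+1)-th differences are mostly zero is therefore a piecewise polynomial of degree k, with knots where those differences are nonzero. The sketch below assumes evenly spaced inputs and shows only the penalty and its sparsity pattern; the additive Cox model and its proximal gradient fit are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def difference_matrix(n, order):
    """Discrete difference operator of the given order, shape (n - order, n)."""
    return np.diff(np.eye(n), n=order, axis=0)

def trend_filter_penalty(theta, k, lam):
    """k-th order trend filtering penalty: lam * ||D^(k+1) theta||_1."""
    D = difference_matrix(len(theta), k + 1)
    return lam * np.abs(D @ theta).sum()

# A piecewise-linear signal (k = 1) that rises with slope +1, then falls with slope -1.
theta = np.array([0, 1, 2, 3, 4, 3, 2, 1, 0, -1], dtype=float)
second_diffs = difference_matrix(len(theta), 2) @ theta
knot_rows = np.flatnonzero(np.abs(second_diffs) > 1e-12)  # rows of D where the slope changes
penalty = trend_filter_penalty(theta, k=1, lam=1.0)
```

The single nonzero second difference marks the one kink, so the penalty charges only for changes in slope, which is how the fitted functions acquire data-chosen knots.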
Affiliation(s)
- Jiacheng Wu
- Department of Biostatistics, University of Washington, Seattle, WA
- Daniela Witten
- Department of Biostatistics, University of Washington, Seattle, WA
- Department of Statistics, University of Washington, Seattle, WA
20
Abstract
In this paper we present a new nonparametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered an extension of BBQ (Pakdaman Naeini, Cooper and Hauskrecht, 2015b), a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression (IsoRegC) (Zadrozny and Elkan, 2002). ENIR is designed to address the key limitation of IsoRegC, which is its monotonicity assumption on the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data we evaluated, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in O(N log N) time, where N is the number of samples.
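For reference, here is the IsoRegC baseline that ENIR extends: calibrated probabilities come from an isotonic (nondecreasing) least-squares fit of the 0/1 labels against the classifier scores, computed with the pool-adjacent-violators (PAV) algorithm. Sorting by score costs O(N log N), and PAV itself runs in linear time. The sketch assumes 0/1 labels and does not pool tied scores. ENIR's near-isotonic ensemble is not shown, and the function names are hypothetical.

```python
import numpy as np

def pav(y, w=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit to y (in order)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []  # one entry per pooled block
    for yi, wi in zip(y, w):
        means.append(yi)
        weights.append(wi)
        counts.append(1)
        # Merge backwards while the last two blocks violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            total = w1 + w2
            means.append((w1 * m1 + w2 * m2) / total)
            weights.append(total)
            counts.append(c1 + c2)
    return np.repeat(means, counts)

def isotonic_calibrate(scores, labels):
    """Map classifier scores to calibrated probabilities on the training points."""
    order = np.argsort(scores, kind="mergesort")  # O(N log N) sort by score
    fitted = np.empty(len(scores))
    fitted[order] = pav(np.asarray(labels, dtype=float)[order])
    return fitted

probs = isotonic_calibrate(np.array([0.2, 0.3, 0.6, 0.9]), np.array([0, 1, 0, 1]))
```

The monotonicity that PAV enforces is exactly the constraint ENIR relaxes, so a non-monotone relationship between scores and outcomes gets flattened here into pooled blocks.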
21
Segal BD, Elliott MR, Braun T, Jiang H. P-splines with an ℓ1 penalty for repeated measures. Electron J Stat 2018. [DOI: 10.1214/18-ejs1487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
22
Spealman P, Naik AW, May GE, Kuersten S, Freeberg L, Murphy RF, McManus J. Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data. Genome Res 2017; 28:214-222. [PMID: 29254944 PMCID: PMC5793785 DOI: 10.1101/gr.221507.117] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 02/07/2017] [Accepted: 12/11/2017] [Indexed: 12/14/2022]
Abstract
Upstream open reading frames (uORFs), located in transcript leaders (5' UTRs), are potent cis-acting regulators of translation and mRNA turnover. Recent genome-wide ribosome profiling studies suggest that thousands of uORFs initiate with non-AUG start codons. Although intriguing, these non-AUG uORF predictions have been made without statistical control or validation; thus, the importance of these elements remains to be demonstrated. To address this, we took a comparative genomics approach to study AUG and non-AUG uORFs. We mapped transcript leaders in multiple Saccharomyces yeast species and applied a novel machine learning algorithm (uORF-seqr) to ribosome profiling data to identify statistically significant uORFs. We found that AUG and non-AUG uORFs are both frequently found in Saccharomyces yeasts. Although most non-AUG uORFs are found in only one species, hundreds have either conserved sequence or position within Saccharomyces. uORFs initiating with UUG are particularly common and are shared between species at rates similar to those of AUG uORFs. However, non-AUG uORFs are translated less efficiently than AUG uORFs and are less subject to removal via alternative transcription initiation under normal growth conditions. These results suggest that a subset of non-AUG uORFs may play important roles in regulating gene expression.
Affiliation(s)
- Pieter Spealman
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Armaghan W Naik
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Gemma E May
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Robert F Murphy
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Joel McManus
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
23
Kim DH, Kim J, Marques JC, Grama A, Hildebrand DGC, Gu W, Li JM, Robson DN. Pan-neuronal calcium imaging with cellular resolution in freely swimming zebrafish. Nat Methods 2017; 14:1107-1114. [DOI: 10.1038/nmeth.4429] [Citation(s) in RCA: 139] [Impact Index Per Article: 19.9] [Received: 01/20/2017] [Accepted: 07/29/2017] [Indexed: 12/15/2022]
24
Affiliation(s)
- Yunzhang Zhu
- Department of Statistics, The Ohio State University, Columbus, Ohio