2
Zheng Y, Corley DA, Doubeni C, Halm E, Shortreed SM, Barlow WE, Zauber A, Tosteson TD, Chubak J. Analyses of Preventive Care Measures with Incomplete Historical Data in Electronic Medical Records: An Example from Colorectal Cancer Screening. Ann Appl Stat 2020; 14:1030-1044. [PMID: 34531936 PMCID: PMC8442666 DOI: 10.1214/20-aoas1342]
Abstract
The calculation of quality of care measures based on electronic medical records (EMRs) may be inaccurate because of incomplete capture of past services. We evaluate the influence of different statistical approaches for calculating the proportion of patients who are up-to-date for a preventive service, using the example of colorectal cancer (CRC) screening. We propose an extension of traditional mixture models to account for the uncertainty in compliance, which is further complicated by the choice of various screening modalities with different recommended screening intervals. We conducted simulation studies to compare various statistical approaches and demonstrated that the proposed method can alleviate bias when individuals with complete prior medical history information were not representative of the targeted population. The method is motivated by and applied to data from the National Cancer Institute-funded consortium Population-Based Research Optimizing Screening through Personalized Regimens (PROSPR). Findings from the application are important for the evaluation of appropriate use of preventive care and provide a novel tool for dealing with similar analytical challenges with EMR data in broad settings.
Affiliation(s)
- Yingye Zheng
- Department of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA
- Douglas A. Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA
- Chyke Doubeni
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Ethan Halm
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical School, Dallas, TX
- Ann Zauber
- Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Jessica Chubak
- Health Research Institute, Kaiser Permanente Washington, Seattle, WA
3
Abstract
Bayesian nonparametric (BNP) models are becoming increasingly important in psychology, both as theoretical models of cognition and as analytic tools. However, existing tutorials tend to be at a level of abstraction largely impenetrable by non-technicians. This tutorial aims to help beginners understand key concepts by working through important but often omitted derivations carefully and explicitly, with a focus on linking the mathematics with a practical computation solution for a Dirichlet Process Mixture Model (DPMM), one of the most widely used BNP methods. Abstract concepts are made explicit and concrete to non-technical readers by working through the theory that gives rise to them. A publicly accessible computer program written in the statistical language R is explained line-by-line to help readers understand the computation algorithm. The algorithm is also linked to a construction method called the Chinese Restaurant Process in an accessible tutorial in this journal (Gershman & Blei, 2012). The overall goals are to help readers understand more fully the theory and application so that they may apply BNP methods in their own work and leverage the technical details in this tutorial to develop novel methods.
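As a rough illustration of the Chinese Restaurant Process mentioned in the abstract, the sketch below simulates the standard seating rule: each new customer joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to a concentration parameter. This is not the authors' R program; it is written in Python, and the function name `chinese_restaurant_process`, the parameter `alpha`, and the default seed are hypothetical choices made for this example only.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table assignments under a CRP with concentration alpha > 0.

    Hypothetical helper for illustration; not taken from the cited tutorial.
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer, in arrival order
    table_sizes = []  # current number of customers seated at each table
    for _ in range(n_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[k] += 1  # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(100, alpha=1.0)
print(len(table_sizes), table_sizes)
```

Larger values of `alpha` tend to produce more tables, which is the sense in which a DPMM lets the number of mixture components grow with the data rather than fixing it in advance.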
Affiliation(s)
- Yuelin Li
- Department of Psychiatry & Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10022, USA
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10022, USA
- Elizabeth Schofield
- Department of Psychiatry & Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10022, USA
- Mithat Gönen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10022, USA
4
Wallace ML, Buysse DJ, Germain A, Hall MH, Iyengar S. Variable Selection for Skewed Model-Based Clustering: Application to the Identification of Novel Sleep Phenotypes. J Am Stat Assoc 2018; 113:95-110. [PMID: 31086426 DOI: 10.1080/01621459.2017.1330202]
Abstract
In sleep research, applying finite mixture models to sleep characteristics captured through multiple data types, including self-reported sleep diary, a wrist monitor capturing movement (actigraphy), and brain waves (polysomnography), may suggest new phenotypes that reflect underlying disease mechanisms. However, a direct mixture model application is challenging because there are many sleep variables from which to choose, and sleep variables are often highly skewed even in homogenous samples. Moreover, previous sleep research findings indicate that some of the most clinically interesting solutions will be those that incorporate all three data types. Thus, we present two novel skewed variable selection algorithms based on the multivariate skew normal (MSN) distribution: one that selects the best set of variables ignoring data type and another that embraces the exploratory nature of clustering and suggests multiple statistically plausible sets of variables that each incorporate all data types. Through a simulation study we empirically compare our approach with other asymmetric and normal dimension reduction strategies for clustering. Finally, we demonstrate our methods using a sample of older adults with and without insomnia. The proposed MSN-based variable selection algorithm appears to be suitable for both MSN and multivariate normal cluster distributions, especially with moderate to large sample sizes.
Affiliation(s)
- Meredith L Wallace
- Department of Statistics, University of Pittsburgh
- Department of Psychiatry, University of Pittsburgh
- Anne Germain
- Department of Psychiatry, University of Pittsburgh
- Satish Iyengar
- Department of Statistics, University of Pittsburgh
- Department of Psychiatry, University of Pittsburgh
5
Li Q, Schissler AG, Gardeux V, Achour I, Kenost C, Berghout J, Li H, Zhang HH, Lussier YA. N-of-1-pathways MixEnrich: advancing precision medicine via single-subject analysis in discovering dynamic changes of transcriptomes. BMC Med Genomics 2017; 10:27. [PMID: 28589853 PMCID: PMC5461551 DOI: 10.1186/s12920-017-0263-4]
Abstract
Background: Transcriptome analytic tools are commonly used across patient cohorts to develop drugs and predict clinical outcomes. However, as precision medicine pursues more accurate and individualized treatment decisions, these methods are not designed to address single-patient transcriptome analyses. We previously developed and validated the N-of-1-pathways framework using two methods, Wilcoxon and Mahalanobis Distance (MD), for personal transcriptome analysis derived from a pair of samples of a single patient. Although both methods uncover concordantly dysregulated pathways, they are not designed to detect dysregulated pathways with up- and down-regulated genes (bidirectional dysregulation) that are ubiquitous in biological systems.
Results: We developed N-of-1-pathways MixEnrich, a mixture model followed by a gene set enrichment test, to uncover bidirectional and concordantly dysregulated pathways one patient at a time. We assess its accuracy in a comprehensive simulation study and in an RNA-Seq data analysis of head and neck squamous cell carcinomas (HNSCCs). In the presence of bidirectionally dysregulated genes in the pathway or of high background noise, MixEnrich substantially outperforms previous single-subject transcriptome analysis methods, both in the simulation study and the HNSCCs data analysis (ROC curves; higher true positive rates; lower false positive rates). Bidirectional and concordant dysregulated pathways uncovered by MixEnrich in each patient largely overlapped with the quasi-gold standard compared to other single-subject and cohort-based transcriptome analyses.
Conclusion: The greater performance of MixEnrich presents an advantage over previous methods to meet the promise of providing accurate personal transcriptome analysis to support precision medicine at point of care.
Electronic supplementary material The online version of this article (doi:10.1186/s12920-017-0263-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qike Li
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- A Grant Schissler
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- Vincent Gardeux
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Ikbel Achour
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Colleen Kenost
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Joanne Berghout
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Haiquan Li
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Hao Helen Zhang
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Mathematics, The University of Arizona, Tucson, AZ, 85721, USA
- Yves A Lussier
- Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, AZ, 85721, USA
- Bio5 Institute, The University of Arizona, Tucson, AZ, 85721, USA
- Department of Medicine, The University of Arizona, Tucson, AZ, 85721, USA
- Graduate Interdisciplinary Program in Statistics, The University of Arizona, Tucson, AZ, 85721, USA
- University of Arizona Cancer Center, The University of Arizona, Tucson, AZ, 85721, USA
- Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL, 60637, USA