Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Albers DJ, Hripcsak G. Estimation of time-delayed mutual information and bias for irregularly and sparsely sampled time-series. Chaos Solitons Fractals 2012;45:853-860. [PMID: 22536009 PMCID: PMC3332129 DOI: 10.1016/j.chaos.2012.03.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 05/31/2023]

Cited by Other Article(s)

Assaad CK, Devijver E, Gaussier E. Entropy-Based Discovery of Summary Causal Graphs in Time Series. ENTROPY (BASEL, SWITZERLAND) 2022;24:1156. [PMID: 36010820 PMCID: PMC9407574 DOI: 10.3390/e24081156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 08/05/2022] [Accepted: 08/14/2022] [Indexed: 06/15/2023]

Seri R, Martinoli M. Asymptotic Properties of the Plug-in Estimator of the Discrete Entropy Under Dependence. IEEE TRANSACTIONS ON INFORMATION THEORY 2021;67:7659-7683. [DOI: 10.1109/tit.2021.3109307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.
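The sequence-mining idea in the abstract above can be illustrated with a minimal sketch. It assumes that "transitive" means each record is paired with every strictly later record for the same patient, not only its immediate successor; the abstract does not spell this out. The function names, record codes, and toy data are hypothetical and are not taken from the authors' software.

```python
from collections import Counter
from itertools import combinations


def transitive_pairs(records):
    """Return ordered (earlier_code, later_code) pairs for one patient.

    `records` is a list of (time, code) tuples. Every record is paired with
    every strictly later record, not only the next one (the assumed meaning
    of "transitive" in this sketch).
    """
    ordered = sorted(records)  # sort by time, then by code
    pairs = set()
    for (t1, c1), (t2, c2) in combinations(ordered, 2):
        if t1 < t2 and c1 != c2:
            pairs.add((c1, c2))
    return pairs


def pair_support(patients):
    """Count how many patients exhibit each ordered pair."""
    counts = Counter()
    for records in patients.values():
        counts.update(transitive_pairs(records))
    return counts


# Hypothetical toy records: (day, code)
patients = {
    "p1": [(1, "dx:hypertension"), (2, "rx:lisinopril"), (5, "dx:ckd")],
    "p2": [(1, "dx:hypertension"), (4, "dx:ckd")],
}
support = pair_support(patients)
```

In this toy example the pair (hypertension, ckd) is supported by both patients even though p1 has an intervening medication record, which is the kind of non-adjacent pattern a strictly consecutive sequence miner would miss.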

Between-day repeatability of sensor-based in-home gait assessment among older adults: assessing the effect of frailty. Aging Clin Exp Res 2021;33:1529-1537. [PMID: 32930988 DOI: 10.1007/s40520-020-01686-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/25/2020] [Accepted: 08/14/2020] [Indexed: 01/10/2023]

Abstract

BACKGROUND

While sensor-based daily physical activity (DPA) gait assessment has been demonstrated to be an effective measure of physical frailty and fall-risk, the repeatability of DPA gait parameters between different days of measurement is not clear.

AIMS

To evaluate test-retest reliability (repeatability) of DPA gait performance parameters, representing the quality of walking, and quantitative gait measures (e.g. number of steps) between two separate days of assessment among older adults.

METHODS

DPA was acquired for 48-h from older adults (age ≥ 65 years) using a tri-axial accelerometer. Continuous walking bouts (≥ 60 s) were identified from acceleration data and used to extract gait performance parameters, including time- and frequency-domain gait parameters, representing walking speed, variability, and irregularity. To assess repeatability, intraclass correlation coefficient (ICC) was calculated using two-way mixed effects F-test models for day-1 vs. day-2 as the independent random effect. Repeatability tests were performed for all participants and also within frailty groups (non-frail and pre-frail/frail identified using Fried phenotype).

RESULTS

Data were analyzed from 63 older adults (29 non-frail and 34 pre-frail/frail). Most of the time- and frequency-domain gait performance parameters showed good to excellent repeatability (ICC ≥ 0.70), while quantitative parameters, including number of steps and walking duration, showed poor repeatability (ICC < 0.30). For the majority of the gait performance parameters, we observed higher repeatability among the pre-frail/frail group (ICC > 0.78) compared to non-frail individuals (0.39 < ICC < 0.55).

CONCLUSION

Gait performance parameters showed higher repeatability compared to quantitative measures. Higher repeatability among pre-frail/frail individuals may be attributed to a reduced functional capacity for performing more intense and variable physical tasks.

TRIAL REGISTRATION

The clinical trial was retrospectively registered on June 18th, 2013 with ClinicalTrials.gov, identifier NCT01880229.
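The repeatability analysis described in the Methods above rests on the intraclass correlation coefficient. The abstract does not say which ICC form was used, so the sketch below assumes the single-measurement, consistency form for a two-way mixed model, commonly written ICC(3,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error). The function name and the toy day-1/day-2 values are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `scores` is a list of rows, one per subject, each holding k repeated
    measurements (for example, day-1 and day-2 values of a gait parameter).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of repeated sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Hypothetical data: day 2 equals day 1 plus a constant shift,
# so subject ranking is perfectly preserved.
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]
# Hypothetical data with little agreement between days.
noisy = [[1, 4], [2, 1], [3, 3], [4, 2]]

icc_shifted = icc_3_1(shifted)
icc_noisy = icc_3_1(noisy)
```

Because the consistency form ignores a constant offset between sessions, the shifted data score as fully repeatable, while the noisy data fall well below the 0.30 level the abstract labels as poor.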

Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. PATTERNS (NEW YORK, N.Y.) 2020;1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]

Affiliation(s)

Hossein Estiri: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Zachary H. Strasser: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Jeffery G. Klann: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Thomas H. McCoy: Harvard Medical School, Boston, MA 02115, USA; Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
Kavishwar B. Wagholikar: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA
Sebastien Vasey: Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
Victor M. Castro: Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
MaryKate E. Murphy: Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
Shawn N. Murphy: Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA; Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA; Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA

Pradeep Kumar D, Toosizadeh N, Mohler J, Ehsani H, Mannier C, Laksari K. Sensor-based characterization of daily walking: a new paradigm in pre-frailty/frailty assessment. BMC Geriatr 2020;20:164. [PMID: 32375700 PMCID: PMC7203790 DOI: 10.1186/s12877-020-01572-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/17/2019] [Accepted: 04/28/2020] [Indexed: 12/14/2022]

Abstract

BACKGROUND

Frailty is a highly recognized geriatric syndrome resulting in decline in reserve across multiple physiological systems. Impaired physical function is one of the major indicators of frailty. The goal of this study was to evaluate an algorithm that discriminates between frailty groups (non-frail and pre-frail/frail) based on gait performance parameters derived from unsupervised daily physical activity (DPA).

METHODS

DPA was acquired for 48 h from older adults (≥65 years) using a tri-axial accelerometer motion-sensor. Continuous bouts of walking for 20s, 30s, 40s, 50s and 60s without pauses were identified from acceleration data. These were then used to extract qualitative measures (gait variability, gait asymmetry, and gait irregularity) and quantitative measures (total continuous walking duration and maximum number of continuous steps) to characterize gait performance. Association between frailty and gait performance parameters was assessed using multinomial logistic models with frailty as the dependent variable, and gait performance parameters along with demographic parameters as independent variables.

RESULTS

One hundred twenty-six older adults (44 non-frail, 60 pre-frail, and 22 frail, based on the Fried index) were recruited. Step- and stride-times, frequency domain gait variability, and continuous walking quantitative measures were significantly different between non-frail and pre-frail/frail groups (p < 0.05). Among the five different durations (20s, 30s, 40s, 50s and 60s), gait performance parameters extracted from 60s continuous walks provided the best frailty assessment results. Using the 60s gait performance parameters in the logistic model, pre-frail/frail group (vs. non-frail) was identified with 76.8% sensitivity and 80% specificity.

DISCUSSION

Everyday walking characteristics were found to be associated with frailty. Along with quantitative measures of physical activity, qualitative measures are critical elements representing the early stages of frailty. In-home gait assessment offers an opportunity to screen for and monitor frailty.

TRIAL REGISTRATION

The clinical trial was retrospectively registered on June 18th, 2013 with ClinicalTrials.gov, identifier NCT01880229.

Estiri H, Vasey S, Murphy SN. Transitive Sequential Pattern Mining for Discrete Clinical Data. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_37] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]

Levine ME, Albers DJ, Hripcsak G. Methodological variations in lagged regression for detecting physiologic drug effects in EHR data. J Biomed Inform 2018;86:149-159. [PMID: 30172760 DOI: 10.1016/j.jbi.2018.08.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/03/2018] [Revised: 07/20/2018] [Accepted: 08/29/2018] [Indexed: 12/22/2022]

Abstract

We studied how lagged linear regression can be used to detect the physiologic effects of drugs from data in the electronic health record (EHR). We systematically examined the effect of methodological variations ((i) time series construction, (ii) temporal parameterization, (iii) intra-subject normalization, (iv) differencing (lagged rates of change achieved by taking differences between consecutive measurements), (v) explanatory variables, and (vi) regression models) on performance of lagged linear methods in this context. We generated two gold standards (one knowledge-base derived, one expert-curated) for expected pairwise relationships between 7 drugs and 4 labs, and evaluated how the 64 unique combinations of methodological perturbations reproduce the gold standards. Our 28 cohorts included patients in the Columbia University Medical Center/NewYork-Presbyterian Hospital clinical database, and ranged from 2820 to 79,514 patients with between 8 and 209 average time points per patient. The most accurate methods achieved AUROC of 0.794 for knowledge-base derived gold standard (95%CI [0.741, 0.847]) and 0.705 for expert-curated gold standard (95% CI [0.629, 0.781]). We observed a mean AUROC of 0.633 (95%CI [0.610, 0.657], expert-curated gold standard) across all methods that re-parameterize time according to sequence and use either a joint autoregressive model with time-series differencing or an independent lag model without differencing. The complement of this set of methods achieved a mean AUROC close to 0.5, indicating the importance of these choices. We conclude that time-series analysis of EHR data will likely rely on some of the beneficial pre-processing and modeling methodologies identified, and will certainly benefit from continued careful analysis of methodological perturbations. 
This study found that methodological variations, such as pre-processing and representations, have a large effect on results, exposing the importance of thoroughly evaluating these components when comparing machine-learning methods.
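Two of the methodological choices named in the abstract above, differencing and an independent lag model on sequence-indexed time, can be sketched in a few lines. This is an illustrative single-predictor ordinary least squares fit under simplifying assumptions (one patient, one drug, one lab, evenly indexed measurements); it is not the authors' pipeline, and the function names and toy values are hypothetical.

```python
def difference(series):
    """Differences between consecutive measurements (lagged rates of change)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def lagged_slope(x, y, lag):
    """OLS slope of y[t] on x[t - lag], with t indexing sequence position."""
    xs = x[:len(x) - lag]   # predictor values, shifted back by `lag` steps
    ys = y[lag:]            # responses aligned with the shifted predictor
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy series: the lab value follows the previous dose,
# y[t] = 10 + 2 * x[t - 1].
dose = [0, 1, 0, 2, 1, 3, 0, 1]
lab = [10, 10, 12, 10, 14, 12, 16, 10]

slope_raw = lagged_slope(dose, lab, lag=1)
slope_diff = lagged_slope(difference(dose), difference(lab), lag=1)
```

In this noise-free toy case both the raw and the differenced fits recover the same lag-1 slope; the abstract's point is that on real EHR data such choices can change performance substantially.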

Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018;25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017] [Indexed: 01/14/2023]

Albers DJ, Elhadad N, Claassen J, Perotte R, Goldstein A, Hripcsak G. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 2018;78:87-101. [PMID: 29369797 PMCID: PMC5856130 DOI: 10.1016/j.jbi.2018.01.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/14/2017] [Revised: 12/05/2017] [Accepted: 01/14/2018] [Indexed: 01/12/2023]

Abstract

We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008) and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a gold standard generated by clinical review of patient records. We find that the PopKLD laboratory data summary is substantially better at predicting disease state. The PopKLD or PopKLD-CAT algorithms are not meant to be used as phenotyping algorithms, but we use the phenotyping task to show what information can be gained when using a more informative laboratory data summary. In the process of evaluating our method we show that the different clinical contexts and laboratory measurements necessitate different statistical summaries. Similarly, leveraging the principle of maximum entropy we argue that while some laboratory data only have sufficient information to estimate a mean and standard deviation, other laboratory data captured in an EHR contain substantially more information than can be captured in higher-parameter models.

Levine ME, Albers DJ, Hripcsak G. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017;2016:779-788. [PMID: 28269874 PMCID: PMC5333294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]

Predictability Bounds of Electronic Health Records. Sci Rep 2015;5:11865. [PMID: 26148751 PMCID: PMC4493571 DOI: 10.1038/srep11865] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/11/2013] [Accepted: 06/04/2015] [Indexed: 01/25/2023]

Hripcsak G, Albers DJ, Perotte A. Parameterizing time in electronic health record studies. J Am Med Inform Assoc 2015;22:794-804. [PMID: 25725004 PMCID: PMC6169471 DOI: 10.1093/jamia/ocu051] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/15/2014] [Revised: 11/08/2014] [Accepted: 12/22/2014] [Indexed: 02/07/2023]

Hripcsak G. Physics of the Medical Record: Handling Time in Health Record Studies. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/22/2023]

Hagar Y, Albers D, Pivovarov R, Chase H, Dukic V, Elhadad N. Survival Analysis with Electronic Health Record Data: Experiments with Chronic Kidney Disease. Stat Anal Data Min 2014;7:385-403. [PMID: 33981381 PMCID: PMC8112603 DOI: 10.1002/sam.11236] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/26/2023]

Affiliation(s)

Yolanda Hagar is a postdoctoral researcher in applied mathematics at the University of Colorado at Boulder.
David Albers is an associate research scientist in biomedical informatics at Columbia University.
Rimma Pivovarov is a doctoral candidate in biomedical informatics at Columbia University.
Herbert Chase is a professor of clinical medicine in biomedical informatics at Columbia University.
Vanja Dukic is an associate professor in applied mathematics at the University of Colorado at Boulder.
Noémie Elhadad is an assistant professor in biomedical informatics at Columbia University.

Pivovarov R, Albers DJ, Sepulveda JL, Elhadad N. Identifying and mitigating biases in EHR laboratory tests. J Biomed Inform 2014;51:24-34. [PMID: 24727481 DOI: 10.1016/j.jbi.2014.03.016] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 11/22/2013] [Revised: 03/27/2014] [Accepted: 03/30/2014] [Indexed: 02/08/2023]

Abstract

Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has different interpretation for a chronic kidney disease patient and a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide additional information to the stand-alone numerical value. The second task identifies three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example. We conduct an association study of lipase test values to acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when properly accounting for lipase measurements in different contexts (leveraging the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.

Albers DJ, Hripcsak G, Schmidt M. Population physiology: leveraging electronic health record data to understand human endocrine dynamics. PLoS One 2012;7:e48058. [PMID: 23272040 PMCID: PMC3522687 DOI: 10.1371/journal.pone.0048058] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/11/2011] [Accepted: 09/25/2012] [Indexed: 11/19/2022]

Abstract

Studying physiology and pathophysiology over a broad population for long periods of time is difficult primarily because collecting human physiologic data can be intrusive, dangerous, and expensive. One solution is to use data that have been collected for a different purpose. Electronic health record (EHR) data promise to support the development and testing of mechanistic physiologic models on diverse populations and allow correlation with clinical outcomes, but limitations in the data have thus far thwarted such use. For example, using uncontrolled population-scale EHR data to verify the outcome of time dependent behavior of mechanistic, constructive models can be difficult because: (i) aggregation of the population can obscure or generate a signal, (ii) there is often no control population with a well understood health state, and (iii) diversity in how the population is measured can make the data difficult to fit into conventional analysis techniques. This paper shows that it is possible to use EHR data to test a physiological model for a population and over long time scales. Specifically, a methodology is developed and demonstrated for testing a mechanistic, time-dependent, physiological model of serum glucose dynamics with uncontrolled, population-scale, physiological patient data extracted from an EHR repository. It is shown that there is no observable daily variation in the normalized mean glucose for any EHR subpopulations. In contrast, a derived value, daily variation in nonlinear correlation quantified by the time-delayed mutual information (TDMI), did reveal the intuitively expected diurnal variation in glucose levels amongst a random population of humans. Moreover, in a population of continuously (tube) fed patients, there was no observable TDMI-based diurnal signal. These TDMI-based signals, via a glucose insulin model, were then connected with human feeding patterns. In particular, a constructive physiological model was shown to correctly predict the difference between the general uncontrolled population and a subpopulation whose feeding was controlled.
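The time-delayed mutual information used in the abstract above measures how much a value at time t tells us about the value a fixed delay later. The sketch below is a plain histogram (plug-in) estimate in nats. It assumes a uniformly sampled series and equal-width bins, and it applies no bias correction, so it should not be read as the estimator developed for irregularly sampled EHR data. The function name, bin count, and toy series are hypothetical.

```python
import math
from collections import Counter


def tdmi(series, lag, bins=4):
    """Plug-in estimate of I(x(t); x(t + lag)) in nats, using equal-width bins."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0   # avoid zero width for constant series

    def bin_of(value):
        # Clamp so that the maximum value lands in the last bin.
        return min(int((value - lo) / width), bins - 1)

    # Pair each point with the point `lag` steps later.
    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[lag:])]
    n = len(pairs)
    joint = Counter(pairs)
    p_now = Counter(a for a, _ in pairs)
    p_later = Counter(b for _, b in pairs)

    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum(
        (count / n) * math.log((count * n) / (p_now[a] * p_later[b]))
        for (a, b), count in joint.items()
    )


# Hypothetical toy series with period 4: at a delay of one full period the
# later value is fully determined by the current one.
periodic = [0, 1, 2, 3] * 25
mi_full_period = tdmi(periodic, lag=4)

# A constant series carries no information at any delay.
mi_constant = tdmi([5] * 10, lag=1)
```

For the periodic toy series the estimate at a delay of one period equals the entropy of four equally likely bins, log 4, which is the largest value this binning allows; a diurnal signal in real data would appear as a smaller peak near a 24-hour delay.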
