Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Batal I, Valizadegan H, Cooper GF, Hauskrecht M. A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data. ACM T INTEL SYST TEC 2013;4:10.1145/2508037.2508044. [PMID: 25309815 PMCID: PMC4192602 DOI: 10.1145/2508037.2508044] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 12/01/2011] [Accepted: 08/01/2013] [Indexed: 10/26/2022]

Cited by Other Article(s)

Itzhak N, Jaroszewicz S, Moskovitch R. Event prediction by estimating continuously the completion of a single temporal pattern's instances. J Biomed Inform 2024:104665. [PMID: 38852777 DOI: 10.1016/j.jbi.2024.104665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 05/10/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]

Bennis FC, Aussems C, Korevaar JC, Hoogendoorn M. The added value of temporal data and the best way to handle it: A use-case for atrial fibrillation using general practitioner data. Comput Biol Med 2024;171:108097. [PMID: 38412689 DOI: 10.1016/j.compbiomed.2024.108097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 01/29/2024] [Accepted: 02/01/2024] [Indexed: 02/29/2024]

Yin Y, Chou CA. Multi-event survival analysis through dynamic multi-modal learning for ICU mortality prediction. Computer Methods and Programs in Biomedicine 2023;235:107545. [PMID: 37062155 DOI: 10.1016/j.cmpb.2023.107545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 01/03/2023] [Accepted: 04/08/2023] [Indexed: 05/08/2023]

Abstract

BACKGROUND AND OBJECTIVE

Survival analysis is widely applied in the healthcare domain to assess the expected time until events such as mortality, and is generally treated as a time-to-event problem. Patients with multiple complications have high mortality risks and often require specific intensive care and clinical treatments. The progression of complications varies over time with disease development, and the intrinsic interactions between complications with respect to mortality are uncertain. Classical methods for mortality prediction and survival analysis in critical care, such as risk scoring systems and cause-specific survival models, were not designed for this multi-event survival analysis problem and can measure the competing risks of death only for mutually exclusive events. In addition, multivariate temporal information on complications is not taken into account when estimating differentiated mortality risks at an early stage.

METHODS

In this paper, we propose a novel multi-event survival analysis solution using a tree-based autoregressive survival model of multi-modal electronic health record data. Specifically, we focus on modeling the temporal trajectory of complications and estimating the mortality risk associated with multiple potential complications simultaneously. In dynamic modeling, no assumptions are made for the relationships between time-dependent variables and risk transition over time.

RESULTS

Validated on the eICU database, our model achieves better prediction performance, with a C-index ranging from 74% to 80%, than state-of-the-art machine learning methods in the literature for acute respiratory distress syndrome and cardiovascular disease complications.

CONCLUSIONS

Our model provides distinguishable mortality risk curves over time for specific complications and tracks risk development, which could potentially support ICU resource reallocation.

Prediction of acute hypertensive episodes in critically ill patients. Artif Intell Med 2023;139:102525. [PMID: 37100504 DOI: 10.1016/j.artmed.2023.102525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/16/2023]

Abstract

Prevention and treatment of complications are the backbone of medical care, particularly in critical care settings. Early detection and prompt intervention can potentially prevent complications from occurring and improve outcomes. In this study, we use four longitudinal vital-sign variables of intensive care unit patients, focusing on predicting acute hypertensive episodes (AHEs). These episodes represent elevations in blood pressure and may result in clinical damage or indicate a change in a patient's clinical situation, such as an elevation in intracranial pressure or kidney failure. Prediction of AHEs may allow clinicians to anticipate changes in the patient's condition and respond early to prevent them. Temporal abstraction was employed to transform the multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent time-intervals-related patterns (TIRPs) are mined and used as features for AHE prediction. A novel TIRP metric for classification, called coverage, is introduced that measures the coverage of a TIRP's instances in a time window. For comparison, several baseline models, including logistic regression and sequential deep learning models, were applied to the raw time series data. Our results show that using frequent TIRPs as features outperforms the baseline models, and that the coverage metric outperforms other TIRP metrics. Two approaches to predicting AHEs in real-life application conditions are evaluated. Using a sliding window to continuously predict whether a patient would experience an AHE within a specific prediction period ahead, our models produced an AUC-ROC of 82%, but with low AUPRC. Alternatively, predicting whether an AHE would occur at all during the entire admission resulted in an AUC-ROC of 74%.

Qiu P, Gong Y, Zhao Y, Cao L, Zhang C, Dong X. An Efficient Method for Modeling Nonoccurring Behaviors by Negative Sequential Patterns With Loose Constraints. IEEE Transactions on Neural Networks and Learning Systems 2023;34:1864-1878. [PMID: 33729957 DOI: 10.1109/tnnls.2021.3063162] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/12/2023]

Novitski P, Cohen CM, Karasik A, Hodik G, Moskovitch R. Temporal patterns selection for All-Cause Mortality prediction in T2D with ANNs. J Biomed Inform 2022;134:104198. [PMID: 36100163 DOI: 10.1016/j.jbi.2022.104198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/17/2022] [Revised: 08/10/2022] [Accepted: 09/03/2022] [Indexed: 01/02/2023]

Abstract

Mortality prevention in the elderly T2D population with chronic kidney disease (CKD) may be possible through risk assessment and predictive modeling. In this study we investigate the ability to predict mortality using heterogeneous electronic health record data. Temporal abstraction is employed to transform the heterogeneous multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent Time Intervals Related Patterns (TIRPs) are then discovered. In this study, a novel representation of the TIRPs is introduced that enables them to be incorporated into deep learning networks. We describe the use of iTirps and bTirps, in which the TIRPs are represented over time by an integer vector and a binary vector, respectively. While bTirp represents whether a TIRP's instance was present, iTirp represents whether multiple instances were present. While the framework showed encouraging results, a major challenge is often the large number of TIRPs, which may cause the models to underperform. We introduce a novel TIRP selection method, called TIRP Ranking Criteria (TRC), which is based on the TIRP's metrics, such as the differences between the classes in its recurrences, its frequencies, and its average duration. Additionally, we introduce an advanced version, called TRC Redundant TIRP Removal (TRC-RTR), in which highly correlated TIRPs are candidates for removal. The selected subset of iTirps/bTirps is then fed into a deep learning architecture such as a recurrent neural network or a convolutional neural network. Furthermore, a predictive committee is utilized in which raw data and iTirp data are both used as input. Our results show that iTirp-based models that use a subset of iTirps selected by the TRC-RTR method outperform models that use raw data or the full set of discovered iTirps.

Bennis FC, Hoogendoorn M, Aussems C, Korevaar JC. Prediction of heart failure 1 year before diagnosis in general practitioner patients using machine learning algorithms: a retrospective case-control study. BMJ Open 2022;12:e060458. [PMID: 36041765 PMCID: PMC9438066 DOI: 10.1136/bmjopen-2021-060458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open

Abstract

OBJECTIVES

Heart failure (HF) is a commonly occurring health problem with high mortality and morbidity. If potential cases could be detected earlier, it may be possible to intervene earlier, which may slow progression in some patients. Preferably, it is desired to reuse already measured data for screening of all persons in an age group, such as general practitioner (GP) data. Furthermore, it is essential to evaluate the number of people needed to screen to find one patient using true incidence rates, as this indicates the generalisability in the true population. Therefore, we aim to create a machine learning model for the prediction of HF using GP data and evaluate the number needed to screen with true incidence rates.

DESIGN, SETTINGS AND PARTICIPANTS

GP data from 8543 patients (-2 to -1 year before diagnosis) and controls aged 70+ years were obtained retrospectively from 01 January 2012 to 31 December 2019 from the Nivel Primary Care Database. Codes for chronic illness, complaints, diagnostics and medication were obtained. Data were split into a training set and a test set. Datasets describing demographics, the presence of codes (non-sequential) and codes following one another (sequential) were created. Logistic regression, random forest and XGBoost models were trained. The predicted outcome was the presence of HF after 1 year. The case:control ratio in the test set matched true incidence rates (1:45).

RESULTS

Demographics alone yielded average performance (area under the curve (AUC) 0.692, CI 0.677 to 0.706). Adding non-sequential information, combined with a logistic regression model, performed best and significantly improved performance (AUC 0.772, CI 0.759 to 0.785, p<0.001). Further adding sequential information did not alter performance significantly (AUC 0.767, CI 0.754 to 0.780, p=0.07). The number needed to screen dropped from 14.11 to 5.99 false positives per true positive.

CONCLUSION

This study created a model able to identify patients with pending HF a year before diagnosis.

Shitrit G, Tractinsky N, Moskovitch R. Visualization of Frequent Temporal Patterns in Single or Two Populations. J Biomed Inform 2022;134:104169. [PMID: 36038065 DOI: 10.1016/j.jbi.2022.104169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 08/11/2022] [Accepted: 08/13/2022] [Indexed: 10/15/2022]

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Oei RW, Fang HSA, Tan WY, Hsu W, Lee ML, Tan NC. Using Domain Knowledge and Data-Driven Insights for Patient Similarity Analytics. J Pers Med 2021;11:jpm11080699. [PMID: 34442343 PMCID: PMC8398126 DOI: 10.3390/jpm11080699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/15/2021] [Revised: 07/15/2021] [Accepted: 07/21/2021] [Indexed: 12/23/2022] Open

Schvetz M, Fuchs L, Novack V, Moskovitch R. Outcomes prediction in longitudinal data: Study designs evaluation, use case in ICU acquired sepsis. J Biomed Inform 2021;117:103734. [PMID: 33711544 DOI: 10.1016/j.jbi.2021.103734] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/18/2020] [Revised: 02/27/2021] [Accepted: 03/01/2021] [Indexed: 12/23/2022]

Abstract

Outcome prediction in electronic health records (EHR), and specifically in critical care, is attracting increasing research attention. In this study, we used clinical data from the intensive care unit (ICU), focusing on ICU-acquired sepsis. In the current literature, several evaluation approaches inspired by epidemiological designs are reported, some of which do not reflect real-life application conditions. This problem appears relevant to outcome prediction in longitudinal EHR data, and in longitudinal data generally, although in this study we focused on ICU data. Unlike most previous studies, which investigated all sepsis admissions, we focused specifically on ICU-acquired sepsis. Due to the sparse nature of the longitudinal data, we employed temporal abstraction and time interval-related pattern (TIRP) discovery, and the discovered patterns were used as classification features. Two experiments were designed using three different outcome prediction study designs from the literature, implementing various levels of real-life conditions to evaluate the prediction models. The first experiment focused on predicting whether, and when during the admission, a patient would suffer from ICU-acquired sepsis, given a sliding observation time window, and on comparing the behavior of the three study designs. The second experiment focused only on predicting whether the patient would suffer from ICU-acquired sepsis, based on data taken relative to the admission start time. Our results show that using Temporal Discretization for Classification (TD4C) led to better performance than using Equal-Width Discretization, Knowledge-Based, or SAX. Also, using a two-state abstraction was better than using three or four states. The default Binary TIRP representation performed better than Mean Duration, Horizontal Support, and horizontally normalized horizontal support. Using XGBoost as a classifier performed better than Logistic Regression, Neural Net, or Random Forest. Additionally, it is demonstrated why the case-crossover-control design is most appropriate for evaluation under real-life application conditions, unlike other incomplete designs that may even result in "better performance".

Lee JM, Hauskrecht M. Modeling multivariate clinical event time-series with recurrent temporal mechanisms. Artif Intell Med 2021;112:102021. [PMID: 33581828 PMCID: PMC7943294 DOI: 10.1016/j.artmed.2021.102021] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/22/2020] [Revised: 12/26/2020] [Accepted: 01/10/2021] [Indexed: 12/18/2022]

Jane YN, Nehemiah HK, Kannan A. Classifying unevenly spaced clinical time series data using forecast error approximation based bottom-up (FeAB) segmented time delay neural network. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2021. [DOI: 10.1080/21681163.2020.1817791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]

Morid MA, Sheng ORL, Kawamoto K, Abdelrahman S. Learning hidden patterns from patient multivariate time series data using convolutional neural networks: A case study of healthcare cost prediction. J Biomed Inform 2020;111:103565. [DOI: 10.1016/j.jbi.2020.103565] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/20/2019] [Revised: 08/27/2020] [Accepted: 09/07/2020] [Indexed: 01/20/2023]

Quantitative and temporal approach to utilising electronic medical records from general practices in mental health prediction. Comput Biol Med 2020;125:103973. [DOI: 10.1016/j.compbiomed.2020.103973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2020] [Revised: 08/11/2020] [Accepted: 08/11/2020] [Indexed: 01/06/2023]

Pokharel S, Zuccon G, Li X, Utomo CP, Li Y. Temporal tree representation for similarity computation between medical patients. Artif Intell Med 2020;108:101900. [DOI: 10.1016/j.artmed.2020.101900] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/25/2019] [Revised: 05/15/2020] [Accepted: 06/03/2020] [Indexed: 02/01/2023]

Dagliati A, Geifman N, Peek N, Holmes JH, Sacchi L, Bellazzi R, Sajjadi SE, Tucker A. Using topological data analysis and pseudo time series to infer temporal phenotypes from electronic health records. Artif Intell Med 2020;108:101930. [PMID: 32972659 PMCID: PMC7536308 DOI: 10.1016/j.artmed.2020.101930] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/06/2020] [Revised: 05/21/2020] [Accepted: 07/11/2020] [Indexed: 11/17/2022]

Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. Patterns (New York, N.Y.) 2020;1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]

Affiliation(s)

Hossein Estiri Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Zachary H. Strasser Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Jeffery G. Klann Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Thomas H. McCoy Harvard Medical School, Boston, MA 02115, USA Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
Kavishwar B. Wagholikar Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Sebastien Vasey Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
Victor M. Castro Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
MaryKate E. Murphy Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
Shawn N. Murphy Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA

Morid MA, Sheng ORL, Del Fiol G, Facelli JC, Bray BE, Abdelrahman S. Temporal Pattern Detection to Predict Adverse Events in Critical Care: Case Study With Acute Kidney Injury. JMIR Med Inform 2020;8:e14272. [PMID: 32181753 PMCID: PMC7109618 DOI: 10.2196/14272] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/04/2019] [Revised: 11/23/2019] [Accepted: 01/22/2020] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

More than 20% of patients admitted to the intensive care unit (ICU) develop an adverse event (AE). No previous study has leveraged patients' data to extract the temporal features using their structural temporal patterns, that is, trends.

OBJECTIVE

This study aimed to improve AE prediction methods by using structural temporal pattern detection that captures global and local temporal trends and to demonstrate these improvements in the detection of acute kidney injury (AKI).

METHODS

Using the Medical Information Mart for Intensive Care dataset, containing 22,542 patients, we extracted both global and local trends using structural pattern detection methods to predict AKI (ie, binary prediction). Classifiers were built on 17 input features consisting of vital signs and laboratory test results using state-of-the-art models; the optimal classifier was selected for comparisons with previous approaches. The classifier with structural pattern detection features was compared with two baseline classifiers that used different temporal feature extraction approaches commonly used in the literature: (1) symbolic temporal pattern detection, which is the most common approach for multivariate time series classification; and (2) the last recorded value before the prediction point, which is the most common approach to extract temporal data in the AKI prediction literature. Moreover, we assessed the individual contribution of global and local trends. Classifier performance was measured in terms of accuracy (primary outcome), area under the curve, and F-measure. For all experiments, we employed 20-fold cross-validation.

RESULTS

Random forest was the best classifier using structural temporal pattern detection. The accuracy of the classifier with local and global trend features was significantly higher than that while using symbolic temporal pattern detection and the last recorded value (81.3% vs 70.6% vs 58.1%; P<.001). Excluding local or global features reduced the accuracy to 74.4% or 78.1%, respectively (P<.001).

CONCLUSIONS

Classifiers using features obtained from structural temporal pattern detection significantly improved the prediction of AKI onset in ICU patients over two baselines based on common previous approaches. The proposed method is a generalizable approach to predict AEs in critical care that may be used to help clinicians intervene in a timely manner to prevent or mitigate AEs.

Estiri H, Vasey S, Murphy SN. Transitive Sequential Pattern Mining for Discrete Clinical Data. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_37] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]

Lee EW, Ho JC. FuzzyGap: Sequential Pattern Mining for Predicting Chronic Heart Failure in Clinical Pathways. AMIA Joint Summits on Translational Science Proceedings 2019;2019:222-231. [PMID: 31258974 PMCID: PMC6568087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]

Luo G. A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling. Global Transitions 2019;1:61-82. [PMID: 31032483 PMCID: PMC6482973 DOI: 10.1016/j.glt.2018.11.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 05/27/2023]

Georga EI, Tachos NS, Sakellarios AI, Kigka VI, Exarchos TP, Pelosi G, Parodi O, Michalis LK, Fotiadis DI. Artificial Intelligence and Data Mining Methods for Cardiovascular Risk Prediction. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/978-981-10-5092-3_14] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]

Despins LA, Kim JH, Deroche C, Song X. Factors Influencing How Intensive Care Unit Nurses Allocate Their Time. West J Nurs Res 2019;41:1551-1575. [DOI: 10.1177/0193945918824070] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]

Dagliati A, Geifman N, Peek N, Holmes JH, Sacchi L, Sajjadi SE, Tucker A. Inferring Temporal Phenotypes with Topological Data Analysis and Pseudo Time-Series. Artif Intell Med 2019. [DOI: 10.1007/978-3-030-21642-9_50] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]

Selby PJ, Banks RE, Gregory W, Hewison J, Rosenberg W, Altman DG, Deeks JJ, McCabe C, Parkes J, Sturgeon C, Thompson D, Twiddy M, Bestall J, Bedlington J, Hale T, Dinnes J, Jones M, Lewington A, Messenger MP, Napp V, Sitch A, Tanwar S, Vasudev NS, Baxter P, Bell S, Cairns DA, Calder N, Corrigan N, Del Galdo F, Heudtlass P, Hornigold N, Hulme C, Hutchinson M, Lippiatt C, Livingstone T, Longo R, Potton M, Roberts S, Sim S, Trainor S, Welberry Smith M, Neuberger J, Thorburn D, Richardson P, Christie J, Sheerin N, McKane W, Gibbs P, Edwards A, Soomro N, Adeyoju A, Stewart GD, Hrouda D. Methods for the evaluation of biomarkers in patients with kidney and liver diseases: multicentre research programme including ELUCIDATE RCT. Programme Grants for Applied Research 2018. [DOI: 10.3310/pgfar06030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]

Abstract

BACKGROUND

Protein biomarkers with associations with the activity and outcomes of diseases are being identified by modern proteomic technologies. They may be simple, accessible, cheap and safe tests that can inform diagnosis, prognosis, treatment selection, monitoring of disease activity and therapy and may substitute for complex, invasive and expensive tests. However, their potential is not yet being realised.

DESIGN AND METHODS

The study consisted of three workstreams to create a framework for research: workstream 1, methodology – to define current practice and explore methodology innovations for biomarkers for monitoring disease; workstream 2, clinical translation – to create a framework of research practice, high-quality samples and related clinical data to evaluate the validity and clinical utility of protein biomarkers; and workstream 3, the ELF to Uncover Cirrhosis as an Indication for Diagnosis and Action for Treatable Event (ELUCIDATE) randomised controlled trial (RCT) – an exemplar RCT of an established test, the ADVIA Centaur® Enhanced Liver Fibrosis (ELF) test (Siemens Healthcare Diagnostics Ltd, Camberley, UK) [consisting of a panel of three markers – (1) serum hyaluronic acid, (2) amino-terminal propeptide of type III procollagen and (3) tissue inhibitor of metalloproteinase 1], for liver cirrhosis to determine its impact on diagnostic timing and the management of cirrhosis and the process of care and improving outcomes.

RESULTS

The methodology workstream evaluated the quality of recommendations for using prostate-specific antigen to monitor patients, systematically reviewed RCTs of monitoring strategies and reviewed the monitoring biomarker literature and how monitoring can have an impact on outcomes. Simulation studies were conducted to evaluate monitoring and improve the merits of health care. The monitoring biomarker literature is modest and robust conclusions are infrequent. We recommend improvements in research practice. Patients strongly endorsed the need for robust and conclusive research in this area. The clinical translation workstream focused on analytical and clinical validity. Cohorts were established for renal cell carcinoma (RCC) and renal transplantation (RT), with samples and patient data from multiple centres, as a rapid-access resource to evaluate the validity of biomarkers. Candidate biomarkers for RCC and RT were identified from the literature and their quality was evaluated and selected biomarkers were prioritised. The duration of follow-up was a limitation but biomarkers were identified that may be taken forward for clinical utility. In the third workstream, the ELUCIDATE trial registered 1303 patients and randomised 878 patients out of a target of 1000. The trial started late and recruited slowly initially but ultimately recruited with good statistical power to answer the key questions. ELF monitoring altered the patient process of care and may show benefits from the early introduction of interventions with further follow-up. The ELUCIDATE trial was an ‘exemplar’ trial that has demonstrated the challenges of evaluating biomarker strategies in ‘end-to-end’ RCTs and will inform future study designs.

CONCLUSIONS

The limitations in the programme were principally that, during the collection and curation of the cohorts of patients with RCC and RT, the pace of discovery of new biomarkers in commercial and non-commercial research was slower than anticipated and so conclusive evaluations using the cohorts are few; however, access to the cohorts will be sustained for future new biomarkers. The ELUCIDATE trial was slow to start and recruit to, with a late surge of recruitment, and so final conclusions about the impact of the ELF test on long-term outcomes await further follow-up. The findings from the three workstreams were used to synthesise a strategy and framework for future biomarker evaluations incorporating innovations in study design, health economics and health informatics.

TRIAL REGISTRATION

Current Controlled Trials ISRCTN74815110, UKCRN ID 9954 and UKCRN ID 11930.

FUNDING

This project was funded by the NIHR Programme Grants for Applied Research programme and will be published in full in Programme Grants for Applied Research; Vol. 6, No. 3. See the NIHR Journals Library website for further project information.

Affiliation(s)

Peter J Selby Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK Leeds Teaching Hospitals NHS Trust, Leeds, UK
Rosamonde E Banks Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Walter Gregory Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Jenny Hewison Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
William Rosenberg Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
Douglas G Altman Centre for Statistics in Medicine, University of Oxford, Oxford, UK
Jonathan J Deeks Institute of Applied Health Research, University of Birmingham, Birmingham, UK
Christopher McCabe Department of Emergency Medicine, University of Alberta Hospital, Edmonton, AB, Canada
Julie Parkes Primary Care and Population Sciences Academic Unit, University of Southampton, Southampton, UK
Catharine Sturgeon Royal Infirmary of Edinburgh, Edinburgh, UK
Douglas Thompson Leeds Teaching Hospitals NHS Trust, Leeds, UK
Maureen Twiddy Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
Janine Bestall Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
Joan Bedlington LIVErNORTH Liver Patient Support, Newcastle upon Tyne, UK
Tilly Hale LIVErNORTH Liver Patient Support, Newcastle upon Tyne, UK
Jacqueline Dinnes Institute of Applied Health Research, University of Birmingham, Birmingham, UK
Marc Jones Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Andrew Lewington Leeds Teaching Hospitals NHS Trust, Leeds, UK
Michael P Messenger Leeds Teaching Hospitals NHS Trust, Leeds, UK
Vicky Napp Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Alice Sitch Institute of Applied Health Research, University of Birmingham, Birmingham, UK
Sudeep Tanwar Institute for Liver and Digestive Health, Division of Medicine, University College London, London, UK
Naveen S Vasudev Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK Leeds Teaching Hospitals NHS Trust, Leeds, UK
Paul Baxter Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
Sue Bell Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
David A Cairns Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Nicola Calder Leeds Teaching Hospitals NHS Trust, Leeds, UK
Neil Corrigan Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Francesco Del Galdo Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, UK
Peter Heudtlass Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Nick Hornigold Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Claire Hulme Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
Michelle Hutchinson Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Carys Lippiatt Department of Specialist Laboratory Medicine, Leeds Teaching Hospitals NHS Trust, Leeds, UK
Tobias Livingstone Leeds Teaching Hospitals NHS Trust, Leeds, UK
Roberta Longo Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
Matthew Potton Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK
Stephanie Roberts Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Sheryl Sim Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Sebastian Trainor Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
Matthew Welberry Smith Clinical and Biomedical Proteomics Group, Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK Leeds Teaching Hospitals NHS Trust, Leeds, UK
James Neuberger University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
Douglas Thorburn Royal Free London NHS Foundation Trust, London, UK
Paul Richardson Royal Liverpool and Broadgreen University Hospitals NHS Trust, Liverpool, UK
John Christie Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
Neil Sheerin Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
William McKane Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
Paul Gibbs Portsmouth Hospitals NHS Trust, Portsmouth, UK
Anusha Edwards North Bristol NHS Trust, Bristol, UK
Naeem Soomro Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
Adebanji Adeyoju Stockport NHS Foundation Trust, Stockport, UK
Grant D Stewart NHS Lothian, Edinburgh, UK Academic Urology Group, University of Cambridge, Cambridge, UK
David Hrouda Charing Cross Hospital, Imperial College Healthcare NHS Trust, London, UK

Incorporating repeating temporal association rules in Naïve Bayes classifiers for coronary heart disease diagnosis. J Biomed Inform 2018;81:74-82. [PMID: 29555443 DOI: 10.1016/j.jbi.2018.03.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2017] [Revised: 02/14/2018] [Accepted: 03/07/2018] [Indexed: 01/08/2023]

Abstract

In this paper, we develop a Naïve Bayes classification model integrated with temporal association rules (TARs). A temporal pattern mining algorithm is used to detect TARs by identifying the most frequent temporal relationships among the derived basic temporal abstractions (TA). We develop and compare three classifiers that use as features the most frequent TARs as follows: (i) representing the most frequent TARs detected within the target class ('Disease = Present'), (ii) representing the most frequent TARs from both classes ('Disease = Present', 'Disease = Absent'), (iii) representing the most frequent TARs, after removing the ones that are low-risk predictors for the disease. These classifiers incorporate the horizontal support of TARs, which defines the number of times that a particular temporal pattern is found in some patient's record, as their features. All of the developed classifiers are applied for diagnosis of coronary heart disease (CHD) using a longitudinal dataset. We compare two ways of feature representation, using horizontal support or the mean duration of each TAR, on a single patient. The results obtained from this comparison show that the horizontal support representation outperforms the mean duration. The main effort of our research is to demonstrate that where long time periods are of significance in some medical domain, such as the CHD domain, the detection of the repeated occurrences of the most frequent TARs can yield better performances. We compared the classifier that uses the horizontal support representation and has the best performance with a Baseline Classifier which uses the binary representation of the most frequent TARs. The results obtained illustrate the comparatively high performance of the classifier representing the horizontal support, over the Baseline Classifier.

Liu L, Wang S, Su G, Hu B, Peng Y, Xiong Q, Wen J. A framework of mining semantic-based probabilistic event relations for complex activity recognition. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.07.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Shknevsky A, Shahar Y, Moskovitch R. Consistent discovery of frequent interval-based temporal patterns in chronic patients' data. J Biomed Inform 2017;75:83-95. [PMID: 28987378 DOI: 10.1016/j.jbi.2017.10.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2017] [Revised: 08/23/2017] [Accepted: 10/02/2017] [Indexed: 11/24/2022]

Abstract

Increasingly, frequent temporal patterns discovered in longitudinal patient records are proposed as features for classification and prediction, and as means to cluster patient clinical trajectories. However, to justify that, we must demonstrate that most frequent temporal patterns are indeed consistently discoverable within the records of different patient subsets within similar patient populations. We have developed several measures for the consistency of the discovery of temporal patterns. We focus on time-interval relations patterns (TIRPs) that can be discovered within different subsets of the same patient population. We expect the discovered TIRPs (1) to be frequent in each subset, (2) to preserve their "local" metrics - the absolute frequency of each pattern, measured by a Proportion Test, and (3) to preserve their "global" characteristics - their overall distribution, measured by a Kolmogorov-Smirnov test. We also wanted to examine the effect on consistency, over a variety of settings, of varying the minimal frequency threshold for TIRP discovery, and of using a TIRP-filtering criterion that we previously introduced, the Semantic Adjacency Criterion (SAC). We applied our methodology to three medical domains (oncology, infectious hepatitis, and diabetes). We found that, within the minimal frequency ranges we had examined, 70-95% of the discovered TIRPs were consistently discoverable; 40-48% of them maintained their local frequency. TIRP global distribution similarity varied widely, from 0% to 65%. Increasing the threshold usually increased the percentage of TIRPs that were repeatedly discovered across different patient subsets within the same domain, and the probability of a similar TIRP distribution. Using the SAC principle enhanced, for most minimal support levels, the percentage of repeating TIRPs, their local consistency and their global consistency. The effect of using the SAC was further strengthened as the minimal frequency threshold was raised.

Moskovitch R, Polubriaginof F, Weiss A, Ryan P, Tatonetti N. Procedure prediction from symbolic Electronic Health Records via time intervals analytics. J Biomed Inform 2017;75:70-82. [PMID: 28823923 DOI: 10.1016/j.jbi.2017.07.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2016] [Revised: 06/19/2017] [Accepted: 07/25/2017] [Indexed: 11/18/2022]

Abstract

Prediction of medical events, such as clinical procedures, is essential for preventing disease, understanding disease mechanism, and increasing patient quality of care. Although longitudinal clinical data from Electronic Health Records provides opportunities to develop predictive models, the use of these data faces significant challenges. Primarily, while the data are longitudinal and represent thousands of conceptual events having duration, they are also sparse, complicating the application of traditional analysis approaches. Furthermore, the framework presented here takes advantage of the events' duration and gaps. International standards for electronic healthcare data represent data elements, such as procedures, conditions, and drug exposures, using eras, or time intervals. Such eras contain both an event and a duration and enable the application of time intervals mining - a relatively new subfield of data mining. In this study, we present Maitreya, a framework for time intervals analytics in longitudinal clinical data. Maitreya discovers frequent time intervals related patterns (TIRPs), which we use as prognostic markers for modelling clinical events. We introduce three novel TIRP metrics that are normalized versions of the horizontal support, which represents the number of TIRP instances per patient. We evaluate Maitreya on 28 frequent and clinically important procedures, using the three novel TIRP representation metrics in comparison to no temporal representation and previous TIRP metrics. We also evaluate the epsilon value that makes Allen's relations more flexible, with several settings of 30, 60, 90 and 180 days in comparison to the default of zero. For twenty-two of these procedures, the use of temporal patterns as predictors was superior to non-temporal features, and the use of the vertically normalized horizontal support metric to represent TIRPs as features was most effective. Using an epsilon value of thirty days was slightly better than using zero.

Ghosh S. Predicting short-term ICU outcomes using a sequential contrast motif based classification framework. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2017;2016:5612-5615. [PMID: 28269527 DOI: 10.1109/embc.2016.7591999] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

A temporal model in Electronic Health Record search. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.03.029] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Kostakis O, Papapetrou P. On searching and indexing sequences of temporal intervals. Data Min Knowl Discov 2017. [DOI: 10.1007/s10618-016-0489-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Organizing standardized electronic healthcare records data for mining. HEALTH POLICY AND TECHNOLOGY 2016. [DOI: 10.1016/j.hlpt.2016.03.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]

Sacchi L, Holmes JH. Progress in Biomedical Knowledge Discovery: A 25-year Retrospective. Yearb Med Inform 2016;Suppl 1:S117-29. [PMID: 27488403 PMCID: PMC5171499 DOI: 10.15265/iys-2016-s033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Abstract

OBJECTIVES

We sought to explore, via a systematic review of the literature, the state of the art of knowledge discovery in biomedical databases as it existed in 1992, and then now, 25 years later, mainly focused on supervised learning.

METHODS

We performed a rigorous systematic search of PubMed and latent Dirichlet allocation to identify themes in the literature and trends in the science of knowledge discovery in and between time periods and compare these trends. We restricted the result set using a bracket of five years previous, such that the 1992 result set was restricted to articles published between 1987 and 1992, and the 2015 set between 2011 and 2015. This was to reflect the current literature available at the time to researchers and others at the target dates of 1992 and 2015. The search term was framed as: Knowledge Discovery OR Data Mining OR Pattern Discovery OR Pattern Recognition, Automated.

RESULTS

A total of 538 and 18,172 documents were retrieved for 1992 and 2015, respectively. The number and type of data sources increased dramatically over the observation period, primarily due to the advent of electronic clinical systems. The period 1992-2015 saw the emergence of new areas of research in knowledge discovery, and the refinement and application of machine learning approaches that were nascent or unknown in 1992.

CONCLUSIONS

Over the 25 years of the observation period, we identified numerous developments that impacted the science of knowledge discovery, including the availability of new forms of data, new machine learning algorithms, and new application domains. Through a bibliometric analysis we examine the striking changes in the availability of highly heterogeneous data resources, the evolution of new algorithmic approaches to knowledge discovery, and we consider from legal, social, and political perspectives possible explanations of the growth of the field. Finally, we reflect on the achievements of the past 25 years to consider what the next 25 years will bring with regard to the availability of even more complex data and to the methods that could be, and are being now developed for the discovery of new knowledge in biomedical data.

Li C, Rana S, Phung D, Venkatesh S. Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.02.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Ouyang L, Apley DW, Mehrotra S. A design of experiments approach to validation sampling for logistic regression modeling with error-prone medical records. J Am Med Inform Assoc 2016;23:e71-8. [PMID: 26374705 PMCID: PMC4954627 DOI: 10.1093/jamia/ocv132] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2015] [Revised: 07/16/2015] [Accepted: 07/17/2015] [Indexed: 01/18/2023]

Abstract

BACKGROUND AND OBJECTIVE

Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches.

METHODS

The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed.

RESULTS

The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23 041 patients' records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample.

CONCLUSIONS

The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low.

Hoogendoorn M, Szolovits P, Moons LMG, Numans ME. Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif Intell Med 2016;69:53-61. [PMID: 27085847 DOI: 10.1016/j.artmed.2016.03.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2015] [Accepted: 03/23/2016] [Indexed: 12/15/2022]

Abstract

OBJECTIVE

Machine learning techniques can be used to extract predictive models for diseases from electronic medical records (EMRs). However, the nature of EMRs makes it difficult to apply off-the-shelf machine learning techniques while still exploiting the rich content of the EMRs. In this paper, we explore the usage of a range of natural language processing (NLP) techniques to extract valuable predictors from uncoded consultation notes and study whether they can help to improve predictive performance.

METHODS

We study a number of existing techniques for the extraction of predictors from the consultation notes, namely a bag of words based approach and topic modeling. In addition, we develop a dedicated technique to match the uncoded consultation notes with a medical ontology. We apply these techniques as an extension to an existing pipeline to extract predictors from EMRs. We evaluate them in the context of predictive modeling for colorectal cancer (CRC), a disease known to be difficult to diagnose before performing an endoscopy.

RESULTS

Our results show that we are able to extract useful information from the consultation notes. The predictive performance of the ontology-based extraction method moves significantly beyond the benchmark of age and gender alone (area under the receiver operating characteristic curve (AUC) of 0.870 versus 0.831). We also observe more accurate predictive models by adding features derived from processing the consultation notes compared to solely using coded data (AUC of 0.896 versus 0.882), although the difference is not significant. The extracted features from the notes are shown to be equally predictive (i.e. there is no significant difference in performance) compared to the coded data of the consultations.

CONCLUSION

It is possible to extract useful predictors from uncoded consultation notes that improve predictive performance. Techniques linking text to concepts in medical ontologies to derive these predictors are shown to perform best for predicting CRC in our EMR dataset.

Analyzing health insurance claims on different timescales to predict days in hospital. J Biomed Inform 2016;60:187-96. [PMID: 26827621 DOI: 10.1016/j.jbi.2016.01.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2015] [Revised: 01/05/2016] [Accepted: 01/05/2016] [Indexed: 11/21/2022]

Jane NY, Nehemiah KH, Arputharaj K. A Temporal Mining Framework for Classifying Un-Evenly Spaced Clinical Data: An Approach for Building Effective Clinical Decision-Making System. Appl Clin Inform 2016;7:1-21. [PMID: 27081403 DOI: 10.4338/aci-2015-08-ra-0102] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Accepted: 11/08/2015] [Indexed: 11/23/2022]

Batal I, Cooper G, Fradkin D, Harrison J, Moerchen F, Hauskrecht M. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data. Knowl Inf Syst 2016;46:115-150. [PMID: 26752800 PMCID: PMC4704806 DOI: 10.1007/s10115-015-0819-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2013] [Revised: 08/31/2014] [Accepted: 12/06/2014] [Indexed: 11/27/2022]

Digging deep into weighted patient data through multiple-level patterns. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.06.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Ullah MZ, Aono M, Seddiqui MH. Estimating a Ranked List of Human Genetic Diseases by Associating Phenotype-Gene with Gene-Disease Bipartite Graphs. ACM T INTEL SYST TEC 2015. [DOI: 10.1145/2700487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Abstract

With vast amounts of medical knowledge available on the Internet, it is becoming increasingly practical to help doctors in clinical diagnostics by suggesting plausible diseases predicted by applying data and text mining technologies. Recently, Genome-Wide Association Studies (GWAS) have proved useful as a method for exploring phenotypic associations with diseases. However, since genetic diseases are difficult to diagnose because of their low prevalence, large number, and broad diversity of symptoms, genetic disease patients are often misdiagnosed or experience long diagnostic delays. In this article, we propose a method for ranking genetic diseases for a set of clinical phenotypes. In this regard, we associate a phenotype-gene bipartite graph (PGBG) with a gene-disease bipartite graph (GDBG) by producing a phenotype-disease bipartite graph (PDBG), and we estimate the candidate weights of diseases. In our approach, all paths from a phenotype to a disease are explored by considering causative genes to assign a weight based on path frequency, and the phenotype is linked to the disease in a new PDBG. We introduce the Bidirectionally induced Importance Weight (BIW) prediction method to PDBG for approximating the weights of the edges of diseases with phenotypes by considering link information from both sides of the bipartite graph. The performance of our system is compared to that of other known related systems by estimating Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP), and Kendall’s tau metrics. Further experiments are conducted with well-known TF·IDF, BM25, and Jensen-Shannon divergence as baselines. The result shows that our proposed method outperforms the known related tool Phenomizer in terms of NDCG@10, NDCG@20, MAP@10, and MAP@20; however, it performs worse than Phenomizer in terms of Kendall’s tau-b metric at the top-10 ranks. It also turns out that our proposed method has overall better performance than the baseline methods.

Antonelli D, Baralis E, Bruno G, Cagliero L, Cerquitelli T, Chiusano S, Garza P, Mahoto NA. MeTA. ACM T INTEL SYST TEC 2015. [DOI: 10.1145/2700479] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

Liu Z, Hauskrecht M. A Regularized Linear Dynamical System Framework for Multivariate Time Series Analysis. PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE 2015;2015:1798-1804. [PMID: 25905027 PMCID: PMC4402162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]

Classification-driven temporal discretization of multivariate time series. Data Min Knowl Discov 2014. [DOI: 10.1007/s10618-014-0380-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Classification of multivariate time series via temporal abstraction and time intervals mining. Knowl Inf Syst 2014. [DOI: 10.1007/s10115-014-0784-5] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]