1
An individualized Bayesian method for estimating genomic variants of hypertension. BMC Genomics 2023; 23:863. [PMID: 37936055 PMCID: PMC10631115 DOI: 10.1186/s12864-023-09757-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 10/19/2023] [Indexed: 11/09/2023] Open
Abstract
BACKGROUND Genomic variants of the disease are often discovered nowadays through population-based genome-wide association studies (GWAS). Identifying genomic variations potentially underlying a phenotype, such as hypertension, in an individual is important for designing personalized treatment; however, population-level models, such as GWAS, may not capture all the important, individualized factors well. In addition, GWAS typically requires a large sample size to detect the association of low-frequency genomic variants with sufficient power. Here, we report an individualized Bayesian inference (IBI) algorithm for estimating the genomic variants that influence complex traits, such as hypertension, at the level of an individual (e.g., a patient). By modeling at the level of the individual, IBI seeks to find genomic variants observed in the individual's genome that provide a strong explanation of the phenotype observed in this individual. RESULTS We applied the IBI algorithm to the data from the Framingham Heart Study to explore the genomic influences of hypertension. Among the top-ranking variants identified by IBI and GWAS, there is a significant number of shared variants (intersection); the unique variants identified only by IBI tend to have relatively lower minor allele frequency than those identified by GWAS. In addition, IBI discovered more individualized and diverse variants that explain hypertension patients better than GWAS. Furthermore, IBI found several well-known low-frequency variants as well as genes related to blood pressure that GWAS missed in the same cohort. Finally, IBI identified top-ranked variants that predicted hypertension better than GWAS, according to the area under the ROC curve. 
CONCLUSIONS The results support IBI as a promising approach for complementing GWAS, especially in detecting low-frequency genomic variants as well as learning personalized genomic variants of clinical traits and disease, such as the complex trait of hypertension, to help advance precision medicine.
2
A voice-based digital assistant for intelligent prompting of evidence-based practices during ICU rounds. J Biomed Inform 2023; 146:104483. [PMID: 37657712 PMCID: PMC10591951 DOI: 10.1016/j.jbi.2023.104483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/21/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
OBJECTIVE To evaluate the technical feasibility and potential value of a digital assistant that prompts intensive care unit (ICU) rounding teams to use evidence-based practices based on analysis of their real-time discussions. METHODS We evaluated a novel voice-based digital assistant which audio records and processes the ICU care team's rounding discussions to determine which evidence-based practices are applicable to the patient but have yet to be addressed by the team. The system would then prompt the team to consider indicated but not yet delivered practices, thereby reducing cognitive burden compared to traditional rigid rounding checklists. In a retrospective analysis, we applied automatic transcription, natural language processing, and a rule-based expert system to generate personalized prompts for each patient in 106 audio-recorded ICU rounding discussions. To assess technical feasibility, we compared the system's prompts to those created by experienced critical care nurses who directly observed rounds. To assess potential value, we also compared the system's prompts to a hypothetical paper checklist containing all evidence-based practices. RESULTS The positive predictive value, negative predictive value, true positive rate, and true negative rate of the system's prompts were 0.45 ± 0.06, 0.83 ± 0.04, 0.68 ± 0.07, and 0.66 ± 0.04, respectively. If implemented in lieu of a paper checklist, the system would generate 56% fewer prompts per patient, with 50%±17% greater precision. CONCLUSION A voice-based digital assistant can reduce prompts per patient compared to traditional approaches for improving evidence uptake on ICU rounds. Additional work is needed to evaluate field performance and team acceptance.
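The four metrics reported above all derive from a 2x2 confusion matrix that compares the system's prompts with the reference prompts. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def safe_div(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def prompt_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, TPR, and TNR from confusion-matrix counts.

    tp: prompts issued by the system that the reference also issued
    fp: prompts issued by the system that the reference did not issue
    tn: prompts withheld by both the system and the reference
    fn: prompts the reference issued but the system withheld
    """
    return {
        "ppv": safe_div(tp, tp + fp),  # positive predictive value (precision)
        "npv": safe_div(tn, tn + fn),  # negative predictive value
        "tpr": safe_div(tp, tp + fn),  # true positive rate (sensitivity)
        "tnr": safe_div(tn, tn + fp),  # true negative rate (specificity)
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = prompt_metrics(tp=30, fp=20, tn=40, fn=10)
```

The counts here are illustrative only; the study's own values would come from comparing system prompts against the nurse observers' prompts for each discussion.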
3
An interpretable deep learning framework for genome-informed precision oncology. bioRxiv 2023:2023.07.11.548534. [PMID: 37503199 PMCID: PMC10369905 DOI: 10.1101/2023.07.11.548534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Cancers result from aberrations in cellular signaling systems, typically resulting from driver somatic genome alterations (SGAs) in individual tumors. Precision oncology requires understanding the cellular state and selecting medications that induce vulnerability in cancer cells under such conditions. To this end, we developed a computational framework consisting of two components: 1) A representation-learning component, which learns a representation of the cellular signaling systems when perturbed by SGAs, using a biologically-motivated and interpretable deep learning model. 2) A drug-response-prediction component, which predicts the response to drugs by leveraging the information of the cellular state of the cancer cells derived by the first component. Our cell-state-oriented framework significantly enhances the accuracy of genome-informed prediction of drug responses in comparison to models that directly use SGAs as inputs. Importantly, our framework enables the prediction of response to chemotherapy agents based on SGAs, thus expanding genome-informed precision oncology beyond molecularly targeted drugs.
4
A Bayesian System to Track Outbreaks of Influenza-Like Illnesses Including Novel Diseases. medRxiv 2023:2023.05.10.23289799. [PMID: 37293033 PMCID: PMC10246032 DOI: 10.1101/2023.05.10.23289799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
It would be highly desirable to have a tool that detects the outbreak of a new influenza-like illness, such as COVID-19, accurately and early. This paper describes the ILI Tracker algorithm that first models the daily occurrence of a set of known influenza-like illnesses in a hospital emergency department using findings extracted from patient-care reports using natural language processing. We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County Pennsylvania from June 1, 2010 through May 31, 2015. We then show how the algorithm can be extended to detect the presence of an unmodeled disease which may represent a novel disease outbreak. We also include results for detecting an outbreak of an unmodeled disease during the mentioned time period, which in retrospect was very likely an outbreak of Enterovirus D68.
5
A new method for estimating the probability of causal relationships from observational data: Application to the study of the short-term effects of air pollution on cardiovascular and respiratory disease. Artif Intell Med 2023; 139:102546. [PMID: 37100513 PMCID: PMC10171833 DOI: 10.1016/j.artmed.2023.102546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 04/04/2023] [Accepted: 04/04/2023] [Indexed: 04/28/2023]
Abstract
In this paper, we investigate which airborne pollutants have a short-term causal effect on cardiovascular and respiratory disease using the Ancestral Probabilities (AP) procedure, a novel Bayesian approach for deriving the probabilities of causal relationships from observational data. The results are largely consistent with EPA assessments of causality; however, in a few cases AP suggests that some pollutants thought to cause cardiovascular or respiratory disease are associated with it purely because of confounding. The AP procedure utilizes maximal ancestral graph (MAG) models to represent and assign probabilities to causal relationships while accounting for latent confounding. The algorithm does so locally by marginalizing over models with and without causal features of interest. Before applying AP to real data, we evaluate it in a simulation study and investigate the benefits of providing background knowledge. Overall, the results suggest that AP is an effective tool for causal discovery.
6
Measuring Performance on the ABCDEF Bundle During Interprofessional Rounds via a Nurse-Based Assessment Tool. Am J Crit Care 2023; 32:92-99. [PMID: 36854912 DOI: 10.4037/ajcc2023755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
BACKGROUND Nurse-led rounding checklists are a common strategy for facilitating evidence-based practice in the intensive care unit (ICU). To streamline checklist workflow, some ICUs have the nurse or another individual listen to the conversation and customize the checklist for each patient. Such customizations assume that individuals can reliably assess whether checklist items have been addressed. OBJECTIVE To evaluate whether 1 critical care nurse can reliably assess checklist items on rounds. METHODS Two nurses performed in-person observation of multidisciplinary ICU rounds. Using a standardized paper-based assessment tool, each nurse indicated whether 17 items related to the ABCDEF bundle were discussed during rounds. For each item, generalizability coefficients were used as a measure of reliability, with a single-rater value of 0.70 or greater considered sufficient to support its assessment by 1 nurse. RESULTS The nurse observers assessed 118 patient discussions across 15 observation days. For 11 of 17 items (65%), the generalizability coefficient for a single rater met or exceeded the 0.70 threshold. The generalizability coefficients (95% CIs) of a single rater for key items were as follows: pain, 0.86 (0.74-0.97); delirium score, 0.74 (0.64-0.83); agitation score, 0.72 (0.33-1.00); spontaneous awakening trial, 0.67 (0.49-0.83); spontaneous breathing trial, 0.80 (0.70-0.89); mobility, 0.79 (0.69-0.87); and family (future/past) engagement, 0.82 (0.73-0.90). CONCLUSION Using a paper-based assessment tool, a single trained critical care nurse can reliably assess the discussion of elements of the ABCDEF bundle during multidisciplinary rounds.
7
A Novel Bayesian Framework Infers Driver Activation States and Reveals Pathway-Oriented Molecular Subtypes in Head and Neck Cancer. Cancers (Basel) 2022; 14:cancers14194825. [PMID: 36230748 PMCID: PMC9563147 DOI: 10.3390/cancers14194825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 09/28/2022] [Accepted: 09/30/2022] [Indexed: 02/08/2023] Open
Abstract
Head and neck squamous cell cancer (HNSCC) is an aggressive cancer resulting from heterogeneous causes. To reveal the underlying drivers and signaling mechanisms of different HNSCC tumors, we developed a novel Bayesian framework to identify drivers of individual tumors and infer the states of driver proteins in the cellular signaling systems of HNSCC tumors. First, we systematically identify causal relationships between somatic genome alterations (SGAs) and differentially expressed genes (DEGs) for each TCGA HNSCC tumor using the tumor-specific causal inference (TCI) model. Then, we generalize the most statistically significant driver SGAs and their regulated DEGs in the TCGA HNSCC cohort. Finally, we develop machine learning models that combine genomic and transcriptomic data to infer the protein functional activation states of driver SGAs in tumors, which enables us to represent a tumor in the space of cellular signaling systems. We discovered four mechanism-oriented subtypes of HNSCC, which show distinct patterns of activation states of HNSCC driver proteins; importantly, this subtyping is orthogonal to previously reported transcriptomic-based molecular subtyping of HNSCC. Further, our analysis revealed driver proteins that are likely involved in oncogenic processes induced by HPV infection, even though they are not perturbed by genomic alterations in HPV+ tumors.
8
A Novel Personalized Random Forest Algorithm for Clinical Outcome Prediction. Stud Health Technol Inform 2022; 290:248-252. [PMID: 35673011 DOI: 10.3233/shti220072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Machine learning algorithms that derive predictive models are useful in predicting patient outcomes under uncertainty. These are often "population" algorithms which optimize a static model to predict well on average for individuals in the population; however, population models may predict poorly for individuals that differ from the average. Personalized machine learning algorithms seek to optimize predictive performance for every patient by tailoring a patient-specific model to each individual. Ensembles of decision trees often outperform single decision tree models, but ensembles of personalized models like decision paths have received little investigation. We present a novel personalized ensemble, called Lazy Random Forest (LazyRF), which consists of bagged randomized decision paths optimized for the individual for whom a prediction will be made. LazyRF outperformed single and bagged decision paths and demonstrated comparable predictive performance to a population random forest method in terms of discrimination on clinical and genomic data while also producing simpler models than the population random forest.
9
Evaluation of eye tracking for a decision support application. JAMIA Open 2021; 4:ooab059. [PMID: 34350394 PMCID: PMC8327376 DOI: 10.1093/jamiaopen/ooab059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Revised: 05/08/2021] [Accepted: 07/01/2021] [Indexed: 11/12/2022] Open
Abstract
Eye tracking is widely used to investigate attention and cognitive processes while performing tasks in electronic medical record (EMR) systems. We explored a novel application of eye tracking to collect training data for a machine learning-based clinical decision support tool that predicts which patient data are likely to be relevant for a clinical task. Specifically, we investigated in a laboratory setting the accuracy of eye tracking compared to manual annotation for inferring which patient data in the EMR are judged to be relevant by physicians. We evaluated several methods for processing gaze points that were recorded using a low-cost eye-tracking device. Our results show that eye tracking achieves accuracy and precision of 69% and 53%, respectively, compared to manual annotation, which is promising for machine learning. The methods for processing gaze points and the scripts that we developed offer a first step in developing novel uses for eye tracking for clinical decision support.
10
A simple electronic medical record system designed for research. JAMIA Open 2021; 4:ooab040. [PMID: 34345801 PMCID: PMC8325484 DOI: 10.1093/jamiaopen/ooab040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2020] [Revised: 03/23/2021] [Accepted: 05/05/2021] [Indexed: 11/14/2022] Open
Abstract
With the extensive deployment of electronic medical record (EMR) systems, EMR usability remains a significant source of frustration to clinicians. There is a significant research need for software that emulates EMR systems and enables investigators to conduct laboratory-based human–computer interaction studies. We developed an open-source software package that implements the display functions of an EMR system. The user interface emphasizes the temporal display of vital signs, medication administrations, and laboratory test results. It is well suited to support research about clinician information-seeking behaviors and adaptive user interfaces in terms of measures that include task accuracy, time to completion, and cognitive load. The Simple EMR System is freely available to the research community and is on GitHub.
11
Bayesian network models with decision tree analysis for management of childhood malaria in Malawi. BMC Med Inform Decis Mak 2021; 21:158. [PMID: 34001100 PMCID: PMC8130361 DOI: 10.1186/s12911-021-01514-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 05/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Malaria is a major cause of death in children under five years old in low- and middle-income countries such as Malawi. Accurate diagnosis and management of malaria can help reduce the global burden of childhood morbidity and mortality. Trained healthcare workers in rural health centers manage malaria with limited supplies of malarial diagnostic tests and drugs for treatment. A clinical decision support system that integrates predictive models to provide an accurate prediction of malaria based on clinical features could aid healthcare workers in the judicious use of testing and treatment. We developed Bayesian network (BN) models to predict the probability of malaria from clinical features and an illustrative decision tree to model the decision to use or not use a malaria rapid diagnostic test (mRDT). METHODS We developed two BN models to predict malaria from a dataset of outpatient encounters of children in Malawi. The first BN model was created manually with expert knowledge, and the second model was derived using an automated method. The performance of the BN models was compared to other statistical models on a range of performance metrics at multiple thresholds. We developed a decision tree that integrates predictions with the costs of mRDT and a course of recommended treatment. RESULTS The manually created BN model achieved an area under the ROC curve (AUC) equal to 0.60 which was statistically significantly higher than the other models. At the optimal threshold for classification, the manual BN model had sensitivity and specificity of 0.74 and 0.42 respectively, and the automated BN model had sensitivity and specificity of 0.45 and 0.68 respectively. The balanced accuracy values were similar across all the models. Sensitivity analysis of the decision tree showed that for values of probability of malaria below 0.04 and above 0.40, the preferred decision that minimizes expected costs is not to perform mRDT. 
CONCLUSION In resource-constrained settings, judicious use of mRDT is important. Predictive models in combination with decision analysis can provide personalized guidance on when to use mRDT in the management of childhood malaria. BN models can be efficiently derived from data to support clinical decision making.
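The decision analysis described above weighs a predicted probability of malaria against the costs of testing and treatment. The sketch below illustrates the general expected-cost comparison only. The three actions, all cost values, and the test sensitivity and specificity are hypothetical assumptions, not the study's parameters, so the thresholds it produces will not match the 0.04 and 0.40 values reported in the abstract.

```python
def expected_costs(p_malaria, cost_test=1.0, cost_treatment=3.0,
                   cost_untreated=50.0, sensitivity=0.95, specificity=0.95):
    """Return the expected cost of each action for one patient.

    Every default value is a hypothetical placeholder in arbitrary cost units.
    """
    p_positive = (p_malaria * sensitivity
                  + (1.0 - p_malaria) * (1.0 - specificity))
    p_missed = p_malaria * (1.0 - sensitivity)  # malaria with a negative test
    return {
        # Neither test nor treat: pay only if malaria is actually present.
        "no_test_no_treat": p_malaria * cost_untreated,
        # Treat presumptively without testing.
        "treat_without_test": cost_treatment,
        # Test, treat the positives, and bear the cost of missed cases.
        "test": (cost_test + p_positive * cost_treatment
                 + p_missed * cost_untreated),
    }


def best_action(p_malaria, **kwargs):
    """Pick the action with the lowest expected cost."""
    costs = expected_costs(p_malaria, **kwargs)
    return min(costs, key=costs.get)


# With these placeholder costs, testing is preferred only at intermediate
# probabilities, which mirrors the qualitative pattern in the abstract.
for p in (0.01, 0.20, 0.80):
    print(p, best_action(p))
```

Under these assumptions, very low probabilities favor withholding both test and treatment, and very high probabilities favor presumptive treatment, so the test adds value only in between.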
12
Patient-Specific Modeling with Personalized Decision Paths. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:602-611. [PMID: 33936434 PMCID: PMC8075540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Predictive models can be useful in predicting patient outcomes under uncertainty. Many algorithms employ "population" methods, which optimize a single model to perform well on average over an entire population, but the model may perform poorly on some patients. Personalized methods optimize predictive performance for each patient by tailoring the model to the individual. We present a new personalized method based on decision trees: the Personalized Decision Path using a Bayesian score (PDP-Bay). Performance on eight synthetic, genomic, and clinical datasets was compared to that of decision trees and a previously described personalized decision path method in terms of area under the ROC curve (AUC) and expected calibration error (ECE). Model complexity was measured by average path length. The PDP-Bay model outperformed the decision tree in terms of both AUC and ECE. The results support the conclusion that personalization may achieve better predictive performance and produce simpler models than population approaches.
13
Modeling physician variability to prioritize relevant medical record information. JAMIA Open 2020; 3:602-610. [PMID: 33623894 PMCID: PMC7886572 DOI: 10.1093/jamiaopen/ooaa058] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/14/2020] [Revised: 10/05/2020] [Accepted: 11/02/2020] [Indexed: 02/05/2023] Open
Abstract
Objective Patient information can be retrieved more efficiently in electronic medical record (EMR) systems by using machine learning models that predict which information a physician will seek in a clinical context. However, information-seeking behavior varies across EMR users. To explicitly account for this variability, we derived hierarchical models and compared their performance to nonhierarchical models in identifying relevant patient information in intensive care unit (ICU) cases. Materials and methods Critical care physicians reviewed ICU patient cases and selected data items relevant for presenting at morning rounds. Using patient EMR data as predictors, we derived hierarchical logistic regression (HLR) and standard logistic regression (LR) models to predict their relevance. Results In 73 pairs of HLR and LR models, the HLR models achieved an area under the receiver operating characteristic curve of 0.81, 95% confidence interval (CI) [0.80-0.82], which was statistically significantly higher than that of LR models (0.75, 95% CI [0.74-0.76]). Further, the HLR models achieved statistically significantly lower expected calibration error (0.07, 95% CI [0.06-0.08]) than LR models (0.16, 95% CI [0.14-0.17]). Discussion The physician reviewers demonstrated variability in selecting relevant data. Our results show that HLR models perform significantly better than LR models with respect to both discrimination and calibration. This is likely due to explicitly modeling physician-related variability. Conclusion Hierarchical models can yield better performance when there is physician-related variability as in the case of identifying relevant information in the EMR.
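Expected calibration error (ECE), the calibration measure reported above, summarizes how far predicted probabilities deviate from observed outcome frequencies. The sketch below uses the common equal-width binning formulation; the abstract does not state the binning scheme or the number of bins, so both are assumptions here, as are the function name and the example values.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE for binary outcomes.

    probs:  predicted probabilities in [0, 1]
    labels: observed outcomes, 0 or 1
    n_bins: number of equal-width probability bins (an assumed default)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / total) * abs(mean_prob - observed)
    return ece


# Hypothetical predictions: each bin is off by 0.2, so the ECE is about 0.2.
ece = expected_calibration_error([0.2, 0.2, 0.8, 0.8], [0, 0, 1, 1])
```

Lower values indicate better calibration, which is the sense in which the HLR models' 0.07 compares favorably with the LR models' 0.16.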
14
Explicit representation of protein activity states significantly improves causal discovery of protein phosphorylation networks. BMC Bioinformatics 2020; 21:379. [PMID: 32938361 DOI: 10.1186/s12859-020-03676-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein phosphorylation networks play an important role in cell signaling. In these networks, phosphorylation of a protein kinase usually leads to its activation, which in turn will phosphorylate its downstream target proteins. A phosphorylation network is essentially a causal network, which can be learned by causal inference algorithms. Prior efforts have applied such algorithms to data measuring protein phosphorylation levels, assuming that the phosphorylation levels represent protein activity states. However, the phosphorylation status of a kinase does not always reflect its activity state, because interventions such as inhibitors or mutations can directly affect its activity state without changing its phosphorylation status. Thus, when cellular systems are subjected to extensive perturbations, the statistical relationships between phosphorylation states of proteins may be disrupted, making it difficult to reconstruct the true protein phosphorylation network. Here, we describe a novel framework to address this challenge. RESULTS We have developed a causal discovery framework that explicitly represents the activity state of each protein kinase as an unmeasured variable and developed a novel algorithm called "InferA" to infer the protein activity states, which allows us to incorporate the protein phosphorylation level, pharmacological interventions and prior knowledge. We applied our framework to simulated datasets and to a real-world dataset. The simulation experiments demonstrated that explicit representation of activity states of protein kinases allows one to effectively represent the impact of interventions and thus enabled our framework to accurately recover the ground-truth causal network. 
Results from the real-world dataset showed that the explicit representation of protein activity states allowed an effective and data-driven integration of the prior knowledge by InferA, which further leads to the recovery of a phosphorylation network that is more consistent with experiment results. CONCLUSIONS Explicit representation of the protein activity states by our novel framework significantly enhances causal discovery of protein phosphorylation networks.
15
Abstract
BACKGROUND Complex electronic medical records (EMRs) presenting large amounts of data create risks of cognitive overload. We are designing a Learning EMR (LEMR) system that utilizes models of intensive care unit (ICU) physicians' data access patterns to identify and then highlight the most relevant data for each patient. OBJECTIVES We used insights from literature and feedback from potential users to inform the design of an EMR display capable of highlighting relevant information. METHODS We used a review of relevant literature to guide the design of preliminary paper prototypes of the LEMR user interface. We observed five ICU physicians using their current EMR systems in preparation for morning rounds. Participants were interviewed and asked to explain their interactions and challenges with the EMR systems. Findings informed the revision of our prototypes. Finally, we conducted a focus group with five ICU physicians to elicit feedback on our designs and to generate ideas for our final prototypes using participatory design methods. RESULTS Participating physicians expressed support for the LEMR system. Identified design requirements included the display of data essential for every patient together with diagnosis-specific data and new or significantly changed information. Respondents expressed preferences for fishbones to organize labs, mouseovers to access additional details, and unobtrusive alerts minimizing color-coding. To address the concern about possible physician overreliance on highlighting, participants suggested that non-highlighted data should remain accessible. Study findings led to revised prototypes, which will inform the development of a functional user interface. CONCLUSION In the feedback we received, physicians supported pursuing the concept of a LEMR system. By introducing novel ways to support physicians' cognitive abilities, such a system has the potential to enhance physician EMR use and lead to better patient outcomes. 
Future plans include laboratory studies of both the utility of the proposed designs on decision-making, and the possible impact of any automation bias.
16
Leveraging Eye Tracking to Prioritize Relevant Medical Record Data: Comparative Machine Learning Study. J Med Internet Res 2020; 22:e15876. [PMID: 32238342 PMCID: PMC7163414 DOI: 10.2196/15876] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/14/2019] [Revised: 12/04/2019] [Accepted: 01/23/2020] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Electronic medical record (EMR) systems capture large amounts of data per patient and present that data to physicians with little prioritization. Without prioritization, physicians must mentally identify and collate relevant data, an activity that can lead to cognitive overload. To mitigate cognitive overload, a Learning EMR (LEMR) system prioritizes the display of relevant medical record data. Relevant data are those that are pertinent to a context, defined as the combination of the user, clinical task, and patient case. To determine which data are relevant in a specific context, a LEMR system uses supervised machine learning models of physician information-seeking behavior. Since obtaining information-seeking behavior data via manual annotation is slow and expensive, automatic methods for capturing such data are needed. OBJECTIVE The goal of the research was to propose and evaluate eye tracking as a high-throughput method to automatically acquire physician information-seeking behavior useful for training models for a LEMR system. METHODS Critical care medicine physicians reviewed intensive care unit patient cases in an EMR interface developed for the study. Participants manually identified patient data that were relevant in the context of a clinical task: preparing a patient summary to present at morning rounds. We used eye tracking to capture each physician's gaze dwell time on each data item (eg, blood glucose measurements). Manual annotations and gaze dwell times were used to define target variables for developing supervised machine learning models of physician information-seeking behavior. We compared the performance of manual selection and gaze-derived models on an independent set of patient cases. RESULTS A total of 68 pairs of manual selection and gaze-derived machine learning models were developed from training data and evaluated on an independent evaluation data set.
A paired Wilcoxon signed-rank test showed similar performance of manual selection and gaze-derived models on area under the receiver operating characteristic curve (P=.40). CONCLUSIONS We used eye tracking to automatically capture physician information-seeking behavior and used it to train models for a LEMR system. The models that were trained using eye tracking performed like models that were trained using manual annotations. These results support further development of eye tracking as a high-throughput method for training clinical decision support systems that prioritize the display of relevant medical record data.
17
A Bayesian approach for detecting a disease that is not being modeled. PLoS One 2020; 15:e0229658. [PMID: 32109254 PMCID: PMC7048291 DOI: 10.1371/journal.pone.0229658] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/04/2019] [Accepted: 02/12/2020] [Indexed: 11/19/2022] Open
Abstract
Over the past decade, outbreaks of new or reemergent viruses such as severe acute respiratory syndrome (SARS) virus, Middle East respiratory syndrome (MERS) virus, and Zika have claimed thousands of lives and cost governments and healthcare systems billions of dollars. Because the appearance of new or transformed diseases is likely to continue, the detection and characterization of emergent diseases is an important problem. We describe a Bayesian statistical model that can detect and characterize previously unknown and unmodeled diseases from patient-care reports and evaluate its performance on historical data.
|
18
|
Lung Cancer Survival Prediction Using Instance-Specific Bayesian Networks. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
|
19
|
Automated influenza case detection for public health surveillance and clinical diagnosis using dynamic influenza prevalence method. J Public Health (Oxf) 2019; 40:878-885. [PMID: 29059331 DOI: 10.1093/pubmed/fdx141] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Indexed: 11/13/2022] Open
Abstract
Objectives To assess the performance of a Bayesian case detector (BCD) for influenza surveillance and clinical diagnosis. Methods BCD uses a Bayesian network classifier to compute the posterior probability of a patient having influenza based on 31 findings from narrative clinical notes. To assess the potential for disease surveillance, we calculated area under the receiver operating characteristic curve (AUC) to indicate BCD's ability to differentiate between influenza and non-influenza encounters in emergency department settings. To assess the potential for clinical diagnosis, we measured AUC for diagnosing influenza cases among encounters having influenza-like illnesses. We also evaluated the performance of BCD using dynamically estimated influenza prevalence, and measured sensitivity, specificity and positive predictive value. Results For influenza surveillance, BCD differentiated between influenza and non-influenza encounters well with an AUC of 0.90 and 0.97 with dynamic influenza prevalence (P < 0.0001). For clinical diagnosis, the addition of dynamic influenza prevalence to BCD significantly improved AUC from 0.63 to 0.85 to distinguish influenza from other causes of influenza-like illness. Conclusions and policy implications BCD can serve as an influenza surveillance and a differential diagnosis tool via our dynamic prevalence approach. It enhances the communication between public health and clinical practice.
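The abstract does not spell out how the dynamically estimated prevalence enters the computation. One plausible reading is that the prevalence acts as the prior in Bayes' rule, so the same clinical evidence yields a higher posterior during an outbreak. The sketch below illustrates that idea in odds form; it is not the BCD implementation, and the function name and the example likelihood ratio are hypothetical.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior            -- assumed disease prevalence, strictly between 0 and 1
    likelihood_ratio -- P(findings | disease) / P(findings | no disease)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: identical findings (likelihood ratio 10) evaluated
# under a low off-season prevalence and a higher in-season prevalence.
off_season = posterior_probability(prior=0.01, likelihood_ratio=10.0)
in_season = posterior_probability(prior=0.20, likelihood_ratio=10.0)
```

With these illustrative numbers the posterior rises from about 0.09 to about 0.71, which shows why a prior that tracks current prevalence can change how well a fixed classifier separates influenza from other influenza-like illness.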
|
20
|
Using machine learning to selectively highlight patient information. J Biomed Inform 2019; 100:103327. [PMID: 31676461 DOI: 10.1016/j.jbi.2019.103327] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2019] [Revised: 08/20/2019] [Accepted: 10/28/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND Electronic medical record (EMR) systems need functionality that decreases cognitive overload by drawing the clinician's attention to the right data, at the right time. We developed a Learning EMR (LEMR) system that learns statistical models of clinician information-seeking behavior and applies those models to direct the display of data in future patients. We evaluated the performance of the system in identifying relevant patient data in intensive care unit (ICU) patient cases. METHODS To capture information-seeking behavior, we enlisted critical care medicine physicians who reviewed a set of patient cases and selected data items relevant to the task of presenting at morning rounds. Using patient EMR data as predictors, we built machine learning models to predict their relevancy. We prospectively evaluated the predictions of a set of high performing models. RESULTS On an independent evaluation data set, 25 models achieved precision of 0.52, 95% CI [0.49, 0.54] and recall of 0.77, 95% CI [0.75, 0.80] in identifying relevant patient data items. For data items missed by the system, the reviewers rated the effect of not seeing those data from no impact to minor impact on patient care in about 82% of the cases. CONCLUSION Data-driven approaches for adaptively displaying data in EMR systems, like the LEMR system, show promise in using information-seeking behavior of clinicians to identify and highlight relevant patient data.
|
21
|
Learning High-dimensional Directed Acyclic Graphs with Mixed Data-types. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2019; 104:4-21. [PMID: 31453569 PMCID: PMC6709674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In recent years, great strides have been made for causal structure learning in the high-dimensional setting and in the mixed data-type setting when there are both discrete and continuous variables. However, due to the complications involved with modeling continuous-discrete variable interactions, the intersection of these two settings has been relatively understudied. The current paper explores the problem of efficiently extending causal structure learning algorithms to high-dimensional data with mixed data-types. First, we characterize a model over continuous and discrete variables. Second, we derive a degenerate Gaussian (DG) score for mixed data-types and discuss its asymptotic properties. Lastly, we demonstrate the practicality of the DG score on learning causal structures from simulated data sets.
|
22
|
Systematic discovery of the functional impact of somatic genome alterations in individual tumors through tumor-specific causal inference. PLoS Comput Biol 2019; 15:e1007088. [PMID: 31276486 PMCID: PMC6650088 DOI: 10.1371/journal.pcbi.1007088] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2019] [Revised: 07/23/2019] [Accepted: 05/09/2019] [Indexed: 02/07/2023] Open
Abstract
Cancer is mainly caused by somatic genome alterations (SGAs). Precision oncology involves identifying and targeting tumor-specific aberrations resulting from causative SGAs. We developed a novel tumor-specific computational framework that finds the likely causative SGAs in an individual tumor and estimates their impact on oncogenic processes, which suggests the disease mechanisms that are acting in that tumor. This information can be used to guide precision oncology. We report a tumor-specific causal inference (TCI) framework, which estimates causative SGAs by modeling causal relationships between SGAs and molecular phenotypes (e.g., transcriptomic, proteomic, or metabolomic changes) within an individual tumor. We applied the TCI algorithm to tumors from The Cancer Genome Atlas (TCGA) and estimated for each tumor the SGAs that causally regulate the differentially expressed genes (DEGs) in that tumor. Overall, TCI identified 634 SGAs that are predicted to cause cancer-related DEGs in a significant number of tumors, including most of the previously known drivers and many novel candidate cancer drivers. The inferred causal relationships are statistically robust and biologically sensible, and multiple lines of experimental evidence support the predicted functional impact of both the well-known and the novel candidate drivers that are predicted by TCI. TCI provides a unified framework that integrates multiple types of SGAs and molecular phenotypes to estimate which genome perturbations are causally influencing one or more molecular/cellular phenotypes in an individual tumor. By identifying major candidate drivers and revealing their functional impact in an individual tumor, TCI sheds light on the disease mechanisms of that tumor, which can serve to advance our basic knowledge of cancer biology and to support precision oncology that provides tailored treatment of individual tumors.
|
23
|
Using Machine Learning to Predict the Information Seeking Behavior of Clinicians Using an Electronic Medical Record System. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:673-682. [PMID: 30815109 PMCID: PMC6371238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Poor electronic medical record (EMR) usability is detrimental to both clinicians and patients. A better EMR would provide concise, context sensitive patient data, but doing so entails the difficult task of knowing which data are relevant. To determine the relevance of patient data in different contexts, we collect and model the information seeking behavior of clinicians using a learning EMR (LEMR) system. Sufficient data were collected to train predictive models for 80 different targets (e.g., glucose level, heparin administration) and 27 of them had AUROC values of greater than 0.7. These results are encouraging considering the high variation in information seeking behavior (intraclass correlation 0.40). We plan to apply these models to a new set of patient cases and adapt the LEMR interface to highlight relevant patient data, and thus provide concise, context sensitive data.
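For readers unfamiliar with the AUROC figures quoted in this and neighboring abstracts, the metric equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; it is a generic illustration with made-up scores, not code from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores -- predicted scores, higher meaning more likely positive
    labels -- 1 for positive cases, 0 for negative cases
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
example_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of cases; rank-based formulations give the same value in O(N log N) time and are preferable for large data sets.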
|
24
|
Instance-Specific Bayesian Network Structure Learning. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2018; 72:169-180. [PMID: 30775723 PMCID: PMC6376975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Bayesian network (BN) structure learning algorithms are almost always designed to recover the structure that models the relationships that are shared by the instances in a population. While accurately learning such population-wide Bayesian networks is useful, learning Bayesian networks that are specific to each instance is often important as well. For example, to understand and treat a patient (instance), it is critical to understand the specific causal mechanisms that are operating in that particular patient. We introduce an instance-specific BN structure learning method that searches the space of Bayesian networks to build a model that is specific to an instance by guiding the search based on attributes of the given instance (e.g., patient symptoms, signs, lab results, and genotype). The structure discovery performance of the proposed method is compared to an existing state-of-the-art BN structure learning method, namely an implementation of the Greedy Equivalence Search algorithm called FGES, using both simulated and real data. The results show that the proposed method improves the precision of the model structure that is output, when compared to GES, especially for those variables that exhibit context-specific independence.
|
25
|
Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics. Mol Cancer Res 2017; 16:269-278. [PMID: 29133589 DOI: 10.1158/1541-7786.mcr-17-0378] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2017] [Revised: 10/02/2017] [Accepted: 11/02/2017] [Indexed: 02/06/2023]
Abstract
Precision oncology involves identifying drugs that will effectively treat a tumor and then prescribing an optimal clinical treatment regimen. However, most first-line chemotherapy drugs do not have biomarkers to guide their application. For molecularly targeted drugs, using the genomic status of a drug target as a therapeutic indicator has limitations. In this study, machine learning methods (e.g., deep learning) were used to identify informative features from genome-scale omics data and to train classifiers for predicting the effectiveness of drugs in cancer cell lines. The methodology introduced here can accurately predict the efficacy of drugs, regardless of whether they are molecularly targeted or nonspecific chemotherapy drugs. This approach, on a per-drug basis, can identify sensitive cancer cells with an average sensitivity of 0.82 and specificity of 0.82; on a per-cell line basis, it can identify effective drugs with an average sensitivity of 0.80 and specificity of 0.82. This report describes a data-driven precision medicine approach that is not only generalizable but also optimizes therapeutic efficacy. The framework detailed herein, when successfully translated to clinical environments, could significantly broaden the scope of precision oncology beyond targeted therapies, benefiting an expanded proportion of cancer patients. Mol Cancer Res; 16(2); 269-78. ©2017 AACR.
|
26
|
A Bayesian system to detect and characterize overlapping outbreaks. J Biomed Inform 2017; 73:171-181. [PMID: 28797710 DOI: 10.1016/j.jbi.2017.08.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2016] [Revised: 07/04/2017] [Accepted: 08/04/2017] [Indexed: 10/19/2022]
Abstract
Outbreaks of infectious diseases such as influenza are a significant threat to human health. Because there are different strains of influenza which can cause independent outbreaks, and influenza can affect demographic groups at different rates and times, there is a need to recognize and characterize multiple outbreaks of influenza. This paper describes a Bayesian system that uses data from emergency department patient care reports to create epidemiological models of overlapping outbreaks of influenza. Clinical findings are extracted from patient care reports using natural language processing. These findings are analyzed by a case detection system to create disease likelihoods that are passed to a multiple outbreak detection system. We evaluated the system using real and simulated outbreaks. The results show that this approach can recognize and characterize overlapping outbreaks of influenza. We describe several extensions that appear promising.
|
27
|
Eye-tracking for clinical decision support: A method to capture automatically what physicians are viewing in the EMR. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2017; 2017:512-521. [PMID: 28815151 PMCID: PMC5543363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Eye-tracking is a valuable research tool that is used in laboratory and limited field environments. We take steps toward developing methods that enable widespread adoption of eye-tracking and its real-time application in clinical decision support. Eye-tracking will enhance awareness and enable intelligent views, more precise alerts, and other forms of decision support in the Electronic Medical Record (EMR). We evaluated a low-cost eye-tracking device and found the device's accuracy to be non-inferior to a more expensive device. We also developed and evaluated an automatic method for mapping eye-tracking data to interface elements in the EMR (e.g., a displayed laboratory test value). Mapping was 88% accurate across the six participants in our experiment. Finally, we piloted the use of the low-cost device and the automatic mapping method to label training data for a Learning EMR (LEMR) which is a system that highlights the EMR elements a physician is predicted to use.
|
28
|
The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance. Appl Clin Inform 2017; 8:560-580. [PMID: 28561130 PMCID: PMC6241736 DOI: 10.4338/aci-2016-12-ra-0211] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 03/11/2017] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVES This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance. METHODS We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC) using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis. RESULTS On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases performance was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser). On University of Pittsburgh Medical Center cases, 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser). CONCLUSION In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection although performances of non-local parsers were reasonable.
|
29
|
A study of the transferability of influenza case detection systems between two large healthcare systems. PLoS One 2017; 12:e0174970. [PMID: 28380048 PMCID: PMC5381795 DOI: 10.1371/journal.pone.0174970] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 03/17/2017] [Indexed: 01/16/2023] Open
Abstract
Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. 
The transferability could be improved by training Bayesian network classifier locally and increasing the accuracy of the NLP parser.
|
30
|
Binary Classifier Calibration Using an Ensemble of Linear Trend Estimation. PROCEEDINGS OF THE ... SIAM INTERNATIONAL CONFERENCE ON DATA MINING. SIAM INTERNATIONAL CONFERENCE ON DATA MINING 2017; 2016:261-269. [PMID: 28357158 DOI: 10.1137/1.9781611974348.30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of linear trend estimation (ELiTE). ELiTE utilizes the recently proposed ℓ1 trend filtering signal approximation method [22] to find the mapping from uncalibrated classification scores to the calibrated probability estimates. ELiTE is designed to address the key limitations of the histogram binning-based calibration methods which are (1) the use of a piecewise constant form of the calibration mapping using bins, and (2) the assumption of independence of predicted probabilities for the instances that are located in different bins. The method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus, it can be applied with many existing classification models. We demonstrate the performance of ELiTE on real datasets for commonly used binary classification models. Experimental results show that the method outperforms several common binary-classifier calibration methods. In particular, ELiTE commonly performs statistically significantly better than the other methods, and never worse. Moreover, it is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is practically O(N log N) time, where N is the number of samples.
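To make the two limitations named in the abstract concrete, the sketch below shows plain equal-width histogram binning, the baseline family that ELiTE is designed to improve on. It is not ELiTE itself. The function names, the number of bins, and the fallback for empty bins (the bin midpoint) are assumptions made for illustration.

```python
def fit_histogram_binning(scores, labels, n_bins=10):
    """Return one calibrated probability per equal-width bin over [0, 1].

    Each bin's value is the fraction of positive labels among the training
    scores that fall in it, so the resulting mapping is piecewise constant.
    """
    counts = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(scores, labels):
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[b] += 1
        positives[b] += label
    # Assumption: an empty bin falls back to its midpoint.
    return [
        positives[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_values):
    """Map an uncalibrated score in [0, 1] to its bin's calibrated probability."""
    n_bins = len(bin_values)
    return bin_values[min(int(score * n_bins), n_bins - 1)]


# Tiny made-up training set with two bins.
bins = fit_histogram_binning([0.1, 0.2, 0.8, 0.9], [0, 1, 1, 1], n_bins=2)
low = calibrate(0.3, bins)
high = calibrate(0.95, bins)
```

Every score inside a bin receives the same output, and each bin is estimated without reference to its neighbors. These are the piecewise-constant form and the independence assumption that the abstract identifies as the limitations of binning.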
|
31
|
Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON DATA MINING 2017; 2016:360-369. [PMID: 28316511 PMCID: PMC5351887 DOI: 10.1109/icdm.2016.0047] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered as an extension of BBQ [20], a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is O(N log N) time, where N is the number of samples.
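For context, the isotonic-regression baseline mentioned in the abstract (IsoRegC) can be sketched with the pool-adjacent-violators algorithm. The code below is a minimal, unweighted illustration of that baseline only. It is not ENIR, which relaxes the monotonicity constraint, and the function names are hypothetical. Tied scores are not pooled here, which is a simplification.

```python
def pool_adjacent_violators(values):
    """Least-squares non-decreasing fit to a sequence (unweighted PAV)."""
    blocks = []  # each block is [mean, size]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, size_b = blocks.pop()
            mean_a, size_a = blocks.pop()
            size = size_a + size_b
            blocks.append([(mean_a * size_a + mean_b * size_b) / size, size])
    fitted = []
    for mean, size in blocks:
        fitted.extend([mean] * size)
    return fitted


def isotonic_calibration(scores, labels):
    """Return (score, calibrated probability) pairs sorted by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = pool_adjacent_violators([labels[i] for i in order])
    return [(scores[i], p) for i, p in zip(order, fitted)]


# Made-up example: labels ordered by score are 0, 1, 0, 1.
pairs = isotonic_calibration([0.2, 0.4, 0.6, 0.8], [0, 1, 0, 1])
```

Sorting dominates the cost, so this baseline also runs in O(N log N) time. The fitted mapping is forced to be non-decreasing in the score, which is the monotonicity assumption the abstract says ENIR is designed to relax.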
|
32
|
Signal-Oriented Pathway Analyses Reveal a Signaling Complex as a Synthetic Lethal Target for p53 Mutations. Cancer Res 2016; 76:6785-6794. [PMID: 27758891 DOI: 10.1158/0008-5472.can-16-1740] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2016] [Revised: 08/31/2016] [Accepted: 09/18/2016] [Indexed: 11/16/2022]
Abstract
Defining processes that are synthetic lethal with p53 mutations in cancer cells may reveal possible therapeutic strategies. In this study, we report the development of a signal-oriented computational framework for cancer pathway discovery in this context. We applied our bipartite graph-based functional module discovery algorithm to identify transcriptomic modules abnormally expressed in multiple tumors, such that the genes in a module were likely regulated by a common, perturbed signal. For each transcriptomic module, we applied our weighted k-path merge algorithm to search for a set of somatic genome alterations (SGA) that likely perturbed the signal, that is, the candidate members of the pathway that regulate the transcriptomic module. Computational evaluations indicated that our methods-identified pathways were perturbed by SGA. In particular, our analyses revealed that SGA affecting TP53, PTK2, YWHAZ, and MED1 perturbed a set of signals that promote cell proliferation, anchor-free colony formation, and epithelial-mesenchymal transition (EMT). These proteins formed a signaling complex that mediates these oncogenic processes in a coordinated fashion. Disruption of this signaling complex by knocking down PTK2, YWHAZ, or MED1 attenuated and reversed oncogenic phenotypes caused by mutant p53 in a synthetic lethal manner. This signal-oriented framework for searching pathways and therapeutic targets is applicable to all cancer types, thus potentially impacting precision medicine in cancer. Cancer Res; 76(23); 6785-94. ©2016 AACR.
|
33
|
Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. J Biomed Inform 2015; 58:60-69. [PMID: 26385375 PMCID: PMC4684714 DOI: 10.1016/j.jbi.2015.08.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2015] [Revised: 05/28/2015] [Accepted: 08/21/2015] [Indexed: 12/31/2022]
Abstract
Influenza is a yearly recurrent disease that has the potential to become a pandemic. An effective biosurveillance system is required for early detection of the disease. In our previous studies, we have shown that electronic Emergency Department (ED) free-text reports can be of value to improve influenza detection in real time. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic capabilities against an expert-built influenza Bayesian classifier, and evaluates different ways of handling missing clinical information from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 to form two different datasets: training (468 cases, 29,004 controls), and test (176 cases and 1620 controls). We employed Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and to encode them into one of three values: Acute, Non-acute, and Missing. Results show that all ML classifiers had areas under ROCs (AUC) ranging from 0.88 to 0.93, and performed significantly better than the expert-built Bayesian model. Missing clinical information marked as a value of missing (not missing at random) had a consistently improved performance among 3 (out of 4) ML classifiers when it was compared with the configuration of not assigning a value of missing (missing completely at random). The case/control ratios did not affect the classification performance given the large number of training cases. Our study demonstrates ED reports in conjunction with the use of ML and NLP with the handling of missing value information have a great potential for the detection of infectious diseases.
|
34
|
Development and Preliminary Evaluation of a Prototype of a Learning Electronic Medical Record System. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1967-1975. [PMID: 26958296 PMCID: PMC4765593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Electronic medical records (EMRs) are capturing increasing amounts of data per patient. For clinicians to efficiently and accurately understand a patient's clinical state, better ways are needed to determine when and how to display EMR data. We built a prototype system that records how physicians view EMR data, which we used to train models that predict which EMR data will be relevant in a given patient. We call this approach a Learning EMR (LEMR). A physician used the prototype to review 59 intensive care unit (ICU) patient cases. We used the data-access patterns from these cases to train logistic regression models that, when evaluated, had AUROC values as high as 0.92 and that averaged 0.73, supporting that the approach is promising. A preliminary usability study identified advantages of the system and a few concerns about implementation. Overall, 3 of 4 ICU physicians were enthusiastic about features of the prototype.
|
35
|
The center for causal discovery of biomedical knowledge from big data. J Am Med Inform Assoc 2015; 22:1132-6. [PMID: 26138794 PMCID: PMC5009908 DOI: 10.1093/jamia/ocv059] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2015] [Revised: 04/27/2015] [Accepted: 05/02/2015] [Indexed: 01/12/2023] Open
Abstract
The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.
|
36
|
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach. PROCEEDINGS OF THE ... SIAM INTERNATIONAL CONFERENCE ON DATA MINING. SIAM INTERNATIONAL CONFERENCE ON DATA MINING 2015; 2015:208-216. [PMID: 26613068 DOI: 10.1137/1.9781611974010.24] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/20/2022]
Abstract
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating the outputs of binary classification models: a method based on Bayes optimal selection and a method based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show that the methods either outperform or are comparable in performance to state-of-the-art calibration methods.
|
37
|
Personalized Modeling for Prediction with Decision-Path Models. PLoS One 2015; 10:e0131022. [PMID: 26098570 PMCID: PMC4476684 DOI: 10.1371/journal.pone.0131022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/04/2015] [Accepted: 05/26/2015] [Indexed: 11/25/2022] Open
Abstract
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods with that of Classification And Regression Tree (CART), a population decision tree, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better than CART on area under the ROC curve (AUC) and Brier skill score. The personalized approach of learning decision-path models is a new approach for predictive modeling that can perform better than a population approach.
|
38
|
Obtaining Well Calibrated Probabilities Using Bayesian Binning. PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE. AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE 2015; 2015:2901-2907. [PMID: 25927013 PMCID: PMC4410090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in artificial intelligence. In this paper we present a new non-parametric calibration method called Bayesian Binning into Quantiles (BBQ), which addresses key limitations of existing calibration methods. The method post-processes the output of a binary classification algorithm; thus, it can be readily combined with many existing classification algorithms. The method is computationally tractable and empirically accurate, as evidenced by the set of experiments reported here on both real and simulated datasets.
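To make the idea of post-processing calibration concrete, the following is a minimal sketch of a single equal-frequency binning calibrator with Beta-style smoothing toward the base rate. It is not the BBQ method itself, which considers many binning models rather than one; the function name `fit_binning_calibrator` and the parameters `n_bins` and `prior_strength` are hypothetical names chosen for illustration.

```python
from bisect import bisect_left


def fit_binning_calibrator(scores, labels, n_bins=5, prior_strength=2.0):
    """Fit one equal-frequency binning model on classifier scores.

    Assumption (not from the abstract): each bin's probability is the
    smoothed positive rate (positives + a) / (count + a + b), where the
    pseudo-counts a + b = prior_strength are split by the base rate.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    base_rate = sum(labels) / n
    edges, probs = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer training points than bins
        positives = sum(y for _, y in chunk)
        probs.append((positives + prior_strength * base_rate)
                     / (len(chunk) + prior_strength))
        edges.append(chunk[-1][0])  # largest score in this bin
    upper_edges = edges[:-1]  # the last bin is open-ended

    def calibrate(score):
        # Map a raw score to the smoothed positive rate of its bin.
        return probs[bisect_left(upper_edges, score)]

    return calibrate


calibrate = fit_binning_calibrator(
    [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    [0, 0, 0, 1, 0, 1, 1, 1],
    n_bins=2,
)
print(calibrate(0.25), calibrate(0.95))
```

Because the calibrator only sees scores and labels, it can be fitted on held-out predictions from any binary classifier, which is the property the abstract emphasizes.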
|
39
|
Application of Bayesian logistic regression to mining biomedical data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:266-273. [PMID: 25954328 PMCID: PMC4419893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Mining high-dimensional biomedical data with existing classifiers is challenging, and the predictions are often inaccurate. We investigated the use of Bayesian Logistic Regression (B-LR) for mining such data to predict and classify various disease conditions. The analysis was done on twelve biomedical datasets with binary class variables, and the performance of B-LR was compared with that of other popular classifiers on these datasets with 10-fold cross-validation using the WEKA data mining toolkit. The statistical significance of the results was analyzed by paired two-tailed t-tests and non-parametric Wilcoxon signed-rank tests. We observed overall that B-LR with non-informative Gaussian priors performed on par with other classifiers in terms of accuracy, balanced accuracy and AUC. These results suggest that it is worthwhile to explore the application of B-LR to predictive modeling tasks in bioinformatics using informative biological prior probabilities. With informative prior probabilities, we conjecture that the performance of B-LR will improve.
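As an illustration of how a Gaussian prior enters logistic regression, the sketch below computes a maximum a posteriori (MAP) point estimate by gradient descent. It is a simplification under stated assumptions, not the B-LR implementation evaluated in the paper: a zero-mean Gaussian prior with variance `prior_variance` on each weight, an unpenalized bias, and the hypothetical function name `fit_map_logistic`. A large variance approximates a non-informative prior; a small variance pulls the weights toward zero.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map_logistic(X, y, prior_variance=1.0, lr=0.1, n_iter=2000):
    """MAP estimate for logistic regression with N(0, prior_variance) priors.

    Minimizes the negative log-likelihood plus sum(w_j^2) / (2 * prior_variance)
    by batch gradient descent. The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        # Gradient of the negative log prior: w_j / prior_variance.
        grad_w = [wj / prior_variance for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w_wide, _ = fit_map_logistic(X, y, prior_variance=100.0)  # weak prior
w_tight, _ = fit_map_logistic(X, y, prior_variance=0.1)   # strong prior
print(w_wide, w_tight)
```

An informative prior of the kind the abstract conjectures about could be expressed in this sketch by giving each weight its own prior mean and variance instead of a shared zero-mean prior.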
|
40
|
A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. J Am Med Inform Assoc 2014; 21:e312-9. [PMID: 24737607 PMCID: PMC4173174 DOI: 10.1136/amiajnl-2013-002358] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/14/2013] [Revised: 02/20/2014] [Accepted: 03/14/2014] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE The objective of this investigation is to evaluate binary prediction methods for predicting disease status using high-dimensional genomic data. The central hypothesis is that the Bayesian network (BN)-based method called efficient Bayesian multivariate classifier (EBMC) will do well at this task because EBMC builds on BN-based methods that have performed well at learning epistatic interactions. METHOD We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions. The methods are as follows: naive Bayes (NB), model averaging NB (MANB), feature selection NB (FSNB), EBMC, logistic regression (LR), support vector machines (SVM), Lasso, and extreme learning machines (ELM). We use a hundred 1000-single nucleotide polymorphism (SNP) simulated datasets, ten 10,000-SNP datasets, six semi-synthetic sets, and two real genome-wide association studies (GWAS) datasets in our evaluation. RESULTS In fivefold cross-validation studies, the SVM performed best on the 1000-SNP dataset, while the BN-based methods performed best on the other datasets, with EBMC exhibiting the best overall performance. In-sample testing indicates that LR, SVM, Lasso, ELM, and NB tend to overfit the data. DISCUSSION EBMC performed better than NB when there are several strong predictors, whereas NB performed better when there are many weak predictors. Furthermore, for all BN-based methods, prediction capability did not degrade as the dimension increased. CONCLUSIONS Our results support the hypothesis that EBMC performs well at binary outcome prediction using high-dimensional discrete datasets containing epistatic-like interactions. Future research using more GWAS datasets is needed to further investigate the potential of EBMC.
|
41
|
A method for detecting and characterizing outbreaks of infectious disease from clinical reports. J Biomed Inform 2014; 53:15-26. [PMID: 25181466 DOI: 10.1016/j.jbi.2014.08.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/05/2014] [Revised: 08/04/2014] [Accepted: 08/22/2014] [Indexed: 11/30/2022]
Abstract
Outbreaks of infectious disease can pose a significant threat to human health. Thus, detecting and characterizing outbreaks quickly and accurately remains an important problem. This paper describes a Bayesian framework that links clinical diagnosis of individuals in a population to epidemiological modeling of disease outbreaks in the population. Computer-based diagnosis of individuals who seek healthcare is used to guide the search for epidemiological models of population disease that explain the pattern of diagnoses well. We applied this framework to develop a system that detects influenza outbreaks from emergency department (ED) reports. The system diagnoses influenza in individuals probabilistically from evidence in ED reports that is extracted using natural language processing. These diagnoses guide the search for epidemiological models of influenza that explain the pattern of diagnoses well. Those epidemiological models with a high posterior probability determine the most likely outbreaks of specific diseases; the models are also used to characterize properties of an outbreak, such as its expected peak day and estimated size. We evaluated the method using both simulated data and data from a real influenza outbreak. The results provide support that the approach can detect and characterize outbreaks early and well enough to be valuable. We describe several extensions to the approach that appear promising.
|
42
|
Selective model averaging with bayesian rule learning for predictive biomedicine. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:17-22. [PMID: 25717394 PMCID: PMC4333697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Accurate disease classification and biomarker discovery remain challenging tasks in biomedicine. In this paper, we develop and test a practical approach to combining evidence from multiple models when making predictions using selective Bayesian model averaging of probabilistic rules. This method is implemented within a Bayesian Rule Learning system and compared to model selection when applied to twelve biomedical datasets using the area under the ROC curve measure of performance. Cross-validation results indicate that selective Bayesian model averaging statistically significantly outperforms model selection on average in these experiments, suggesting that combining predictions from multiple models may lead to more accurate quantification of classifier uncertainty. This approach would directly impact the generation of robust predictions on unseen test data, while also increasing knowledge for biomarker discovery and mechanisms that underlie disease.
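The contrast between model selection and model averaging can be sketched in a few lines. The example below is a generic illustration, not the Bayesian Rule Learning implementation: it assumes each candidate model supplies a log marginal likelihood and a predicted probability for a test case, uses a uniform prior over models, and treats "selective" as keeping only the `top_k` highest-scoring models, which is an assumption made here for illustration. All function names are hypothetical.

```python
import math


def posterior_model_weights(log_marginal_likelihoods):
    """Posterior model probabilities under a uniform prior over models."""
    m = max(log_marginal_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    unnorm = [math.exp(v - m) for v in log_marginal_likelihoods]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def selective_model_average(model_probs, log_marginal_likelihoods, top_k):
    """Average predictions over the top_k models, weighted by posterior."""
    order = sorted(
        range(len(model_probs)),
        key=lambda i: log_marginal_likelihoods[i],
        reverse=True,
    )[:top_k]
    weights = posterior_model_weights(
        [log_marginal_likelihoods[i] for i in order]
    )
    return sum(w * model_probs[i] for w, i in zip(weights, order))


# Three hypothetical models predicting P(disease) for one test case.
probs = [0.8, 0.4, 0.1]
log_mls = [math.log(3.0), math.log(1.0), math.log(1e-6)]

print(selective_model_average(probs, log_mls, top_k=1))  # model selection
print(selective_model_average(probs, log_mls, top_k=2))  # selective averaging
```

With `top_k=1` the sketch reduces to model selection, so the two settings compared in the abstract correspond to different values of one parameter here.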
|
43
|
Decision path models for patient-specific modeling of patient outcomes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:413-421. [PMID: 24551347 PMCID: PMC3900188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Patient-specific models are constructed to take advantage of the particular features of the patient case of interest, in contrast to commonly used population-wide models, which are constructed to perform well on average on all cases. We introduce two patient-specific algorithms that are based on the decision tree paradigm. These algorithms construct a decision path specific to each patient of interest, whereas standard algorithms construct a single population-wide decision tree with many paths that is applied to all patients of interest. We applied the patient-specific algorithms to predict five different outcomes in clinical datasets. Compared to the population-wide CART decision tree, the patient-specific decision path models had superior performance on area under the ROC curve (AUC) and comparable performance on balanced accuracy. Our results provide support for patient-specific algorithms being a promising approach for predicting clinical outcomes.
|
44
|
Distinct signaling roles of ceramide species in yeast revealed through systematic perturbation and systems biology analyses. Sci Signal 2013; 6:rs14. [PMID: 24170935 PMCID: PMC3974757 DOI: 10.1126/scisignal.2004515] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/14/2022]
Abstract
Ceramide, the central molecule of sphingolipid metabolism, is an important bioactive molecule that participates in various cellular regulatory events and that has been implicated in disease. Deciphering ceramide signaling is challenging because multiple ceramide species exist, and many of them may have distinct functions. We applied systems biology and molecular approaches to perturb ceramide metabolism in the yeast Saccharomyces cerevisiae and inferred causal relationships between ceramide species and their potential targets by combining lipidomic, genomic, and transcriptomic analyses. We found that during heat stress, distinct metabolic mechanisms controlled the abundance of different groups of ceramide species and provided experimental support for the importance of the dihydroceramidase Ydc1 in mediating the decrease in dihydroceramides during heat stress. Additionally, distinct groups of ceramide species, with different N-acyl chains and hydroxylations, regulated different sets of functionally related genes, indicating that the structural complexity of these lipids produces functional diversity. The transcriptional modules that we identified provide a resource to begin to dissect the specific functions of ceramides.
|
45
|
A Temporal Pattern Mining Approach for Classifying Electronic Health Record Data. ACM T INTEL SYST TEC 2013; 4:10.1145/2508037.2508044. [PMID: 25309815 PMCID: PMC4192602 DOI: 10.1145/2508037.2508044] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 12/01/2011] [Accepted: 08/01/2013] [Indexed: 10/26/2022]
Abstract
We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that represent the temporal aspects of the data well. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the Minimal Predictive Temporal Patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin-induced thrombocytopenia. The results demonstrate the benefit of our approach in efficiently learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.
|
46
|
Detection of Patients with Influenza Syndrome Using Machine-Learning Models Learned from Emergency Department Reports. Online J Public Health Inform 2013. [PMCID: PMC3692886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
Abstract
Objective Compare 7 machine learning algorithms with an expert-constructed Bayesian network on detection of patients with influenza syndrome. Introduction Early detection of influenza outbreaks is critical to public health officials. Case detection is the foundation for outbreak detection. A previous study by Elkin et al. demonstrated that using individual emergency department (ED) reports can better detect influenza cases than using chief complaints [1]. Our recent study using ED reports processed by Bayesian networks (using an expert-constructed network structure) showed high accuracy in detecting influenza cases [2]. Methods The dataset used in this study includes 182 ED reports with confirmed PCR influenza tests (Jan 1, 2007–Dec 31, 2009) and 40853 ED reports as control cases from 8 EDs in UPMC (Jul 1, 2010–Aug 31, 2010). All ED reports were deidentified by De-ID software with IRB approval. An NLP system, Topaz, was used to extract relevant findings and symptoms from the reports and encode them with UMLS concept unique identifier codes [2]. Two subsets were created: DS1-train (67% of cases) and DS1-test (remaining 33%). The algorithms used for training the models are: Naïve Bayes Classifier, Efficient Bayesian Multivariate Classification (EBMC) [3], Bayesian Network with K2 algorithm, Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Networks (ANN) and Random Forest (RF). The predictive performance of each method was evaluated using the area under the receiver operating characteristic curve (AUROC) and Hosmer-Lemeshow (HL) statistical significance testing, which describes the lack of fit of the model to the dataset. Results The evaluation results of all the models using DS1-test, including the AUROC, its confidence interval, p-value (between each algorithm and the expert) and the calibration with HL are shown in Table 1. Conclusions All models achieved high AUROC values. The pairwise comparison of p-values in Table 1 demonstrates that the AUROCs of all the machine-learning models and the expert model were not significantly different. Nevertheless, EBMC is the best fitted. The model created by EBMC is shown in Figure 1. One limitation of the study is that the test dataset has low influenza prevalence, which may bias the detection algorithm performance. We are in the process of testing the algorithms using a higher prevalence rate. The same process could also be applied to other diseases to further research the generalizability of our method.
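For readers unfamiliar with the AUROC used as the discrimination measure above, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name `auroc` and the toy scores are illustrative and are not data from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Runs in O(P * N) time for P positives and N negatives, which is
    fine for a small illustration but slow for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: two positives (scores 0.9, 0.3), two negatives (0.8, 0.1).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

AUROC measures ranking only; it says nothing about whether the predicted probabilities match observed frequencies, which is why the study also reports a calibration test.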
|
47
|
Outlier detection for patient monitoring and alerting. J Biomed Inform 2013; 46:47-55. [PMID: 22944172 PMCID: PMC3567774 DOI: 10.1016/j.jbi.2012.08.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 05/07/2012] [Revised: 08/14/2012] [Accepted: 08/14/2012] [Indexed: 01/31/2023]
Abstract
We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management decisions using past patient cases stored in electronic health records (EHRs). Our hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error and that it is worthwhile to generate an alert if such a decision is encountered. We evaluate this hypothesis using data obtained from EHRs of 4486 post-cardiac surgical patients and a subset of 222 alerts generated from the data. We base the evaluation on the opinions of a panel of experts. The results of the study support our hypothesis that outlier-based alerting can lead to promising true alert rates. We observed true alert rates that ranged from 25% to 66% for a variety of patient-management actions, with 66% corresponding to the strongest outliers.
|
48
|
Improving the prediction of clinical outcomes from genomic data using multiresolution analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1442-1450. [PMID: 22641708 DOI: 10.1109/tcbb.2012.80] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/01/2023]
Abstract
The prediction of a patient's future clinical outcome, such as Alzheimer's and cardiac disease, using only genomic information is an open problem. In cases when genome-wide association studies (GWASs) are able to find strong associations between genomic predictors (e.g., SNPs) and disease, pattern recognition methods may be able to predict the disease well. Furthermore, by using signal processing methods, we can capitalize on latent multivariate interactions of genomic predictors. Such an approach to genomic pattern recognition for prediction of clinical outcomes is investigated in this work. In particular, we show how multiresolution transforms can be applied to genomic data to extract cues of multivariate interactions and, in some cases, improve on the performance of standard classification methods in predicting clinical outcomes. Our results show, for example, that an improvement of about 6 percent in the area under the ROC curve can be achieved using multiresolution spaces to train logistic regression to predict late-onset Alzheimer's disease (LOAD), compared to logistic regression applied directly to SNP data.
|
49
|
Multivariate Bayesian modeling of known and unknown causes of events--an application to biosurveillance. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2012; 107:436-446. [PMID: 21195503 DOI: 10.1016/j.cmpb.2010.11.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/24/2010] [Revised: 11/29/2010] [Accepted: 11/30/2010] [Indexed: 05/30/2023]
Abstract
This paper investigates Bayesian modeling of known and unknown causes of events in the context of disease-outbreak detection. We introduce a multivariate Bayesian approach that models multiple evidential features of every person in the population. This approach models and detects (1) known diseases (e.g., influenza and anthrax) by using informative prior probabilities and (2) unknown diseases (e.g., a new, highly contagious respiratory virus that has never been seen before) by using relatively non-informative prior probabilities. We report the results of simulation experiments which support that this modeling method can improve the detection of new disease outbreaks in a population. A contribution of this paper is that it introduces a multivariate Bayesian approach for jointly modeling both known and unknown causes of events. Such modeling has general applicability in domains where the space of known causes is incomplete.
|
50
|
Spatial cluster detection using dynamic programming. BMC Med Inform Decis Mak 2012; 12:22. [PMID: 22443103 PMCID: PMC3403878 DOI: 10.1186/1472-6947-12-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/15/2011] [Accepted: 03/25/2012] [Indexed: 01/04/2023] Open
Abstract
Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. 
Conclusions We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection and point to its low computational cost and extensibility as advantages in favor of further research and use of the algorithm.
|