1
Synthetic data in health care: A narrative review. PLOS Digit Health 2023; 2:e0000082. [PMID: 36812604 PMCID: PMC9931305 DOI: 10.1371/journal.pdig.0000082] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 06/29/2022] [Accepted: 12/06/2022] [Indexed: 01/09/2023]
Abstract
Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most data in health care is tightly controlled, which may limit innovation, development, and efficient implementation of new research, products, services, or systems. Using synthetic data is one of many innovative ways that can allow organizations to share datasets with a broader range of users. However, only limited literature explores its potential and applications in health care. In this review paper, we examined existing literature to bridge this gap and highlight the utility of synthetic data in health care. We searched PubMed, Scopus, and Google Scholar to identify peer-reviewed articles, conference papers, reports, and theses/dissertations related to the generation and use of synthetic datasets in health care. The review identified seven use cases of synthetic data in health care: a) simulation and prediction research, b) hypothesis, methods, and algorithm testing, c) epidemiology/public health research, d) health IT development, e) education and training, f) public release of datasets, and g) linking data. The review also identified readily and publicly accessible health care datasets, databases, and sandboxes containing synthetic data with varying degrees of utility for research, education, and software development. The review provided evidence that synthetic data are helpful in different aspects of health care and research. While original real data remain the preferred choice, synthetic data hold promise for bridging data access gaps in research and evidence-based policymaking.

2
Cavallaro M, Coelho J, Ready D, Decraene V, Lamagni T, McCarthy ND, Todkill D, Keeling MJ. Cluster detection with random neighbourhood covering: Application to invasive Group A Streptococcal disease. PLoS Comput Biol 2022; 18:e1010726. [PMID: 36449515 PMCID: PMC9744322 DOI: 10.1371/journal.pcbi.1010726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Revised: 12/12/2022] [Accepted: 11/10/2022] [Indexed: 12/02/2022]
Abstract
The rapid detection of outbreaks is a key step in the effective control and containment of infectious diseases. In particular, identifying cases that might be epidemiologically linked is crucial for directing outbreak-containment efforts and shaping the interventions of public health authorities. Often this requires detecting clusters of cases whose numbers exceed those expected from a background of sporadic cases. Quantifying such exceedances rapidly is particularly challenging when only a few cases are typically reported at a precise location and time. To address these public health concerns, we present a general method that can detect spatio-temporal deviations from a Poisson point process and estimate the odds of an isolate being part of a cluster. This method can be applied to diseases for which detailed geographical information is available. In addition, we propose an approach that explicitly accounts for delays in microbial typing. As a case study, we considered invasive group A Streptococcus infection events as recorded and typed by Public Health England from 2015 to 2020.
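The basic question behind this kind of cluster detection is how surprising an observed case count is when only a Poisson background of sporadic cases is expected. The minimal sketch below computes the Poisson upper-tail probability for a single space-time window and flags windows that fall below a significance level. It is not the random neighbourhood covering method of Cavallaro et al.; the function names, the per-window expected counts and the `alpha` threshold are illustrative assumptions, and no correction for testing many windows is applied.

```python
import math


def poisson_tail(k: int, mu: float) -> float:
    """P(N >= k) for N ~ Poisson(mu): the chance of seeing at least k cases
    in a window whose background (sporadic) expectation is mu."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(N = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i  # recurrence: P(N = i) = P(N = i - 1) * mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding error


def flag_windows(windows: list[tuple[int, float]], alpha: float = 0.01) -> list[int]:
    """Hypothetical helper: return the indices of (observed, expected) windows
    whose observed count is improbably high under the Poisson background
    (tail probability < alpha). No multiple-testing correction is applied."""
    return [
        i
        for i, (observed, expected) in enumerate(windows)
        if poisson_tail(observed, expected) < alpha
    ]


# Illustrative (made-up) windows: (observed cases, expected background cases)
windows = [(2, 1.5), (7, 1.2), (0, 0.8)]
print(flag_windows(windows))  # -> [1]
```

Only the second window is flagged: seven cases where about 1.2 are expected has a tail probability of roughly 0.0003, whereas two cases against 1.5 expected is unremarkable.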
Affiliation(s)
- Massimo Cavallaro
- The Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
- School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, United Kingdom
- UK Health Security Agency, United Kingdom
- Derren Ready
- UK Health Security Agency, United Kingdom
- Health Protection Research Unit in Behavioural Science and Evaluation at the University of Bristol, Bristol, United Kingdom
- Noel D. McCarthy
- The Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
- Warwick Medical School, University of Warwick, Coventry, United Kingdom
- Institute of Population Health, School of Medicine, Trinity College Dublin, University of Dublin, Dublin 2, Ireland
- Dan Todkill
- UK Health Security Agency, United Kingdom
- Warwick Medical School, University of Warwick, Coventry, United Kingdom
- Matt J. Keeling
- The Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
- School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, United Kingdom

3
Fioriti V, Chinnici M, Arbore A, Sigismondi N, Roselli I. Estimating the epidemic growth dynamics within the first week. Heliyon 2021; 7:e08422. [PMID: 34816052 PMCID: PMC8600919 DOI: 10.1016/j.heliyon.2021.e08422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 10/22/2021] [Accepted: 11/15/2021] [Indexed: 11/20/2022]
Abstract
Information about the early growth of infectious outbreaks is indispensable for estimating epidemic spread. A large number of mathematical tools have been developed to this end, to cope with an equally large number of dynamic behaviours, ranging from sub-linear to super-exponential growth. The crucial difficulty is that too few data are available during the initial outbreak phase to make reliable inferences. Here we propose a straightforward methodology to estimate the epidemic growth dynamics from just one week of cumulative infection data, provided a surveillance system covers the whole territory. The methodology, based on the Newcomb-Benford law, is applied to the Italian COVID-19 case study. Results show that it is possible to discriminate between epidemic dynamics using the first seven data points collected in fifty Italian cities. Moreover, the most probable approximating function of the growth within a six-week epidemic scenario is identified.
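The Newcomb-Benford law gives the expected frequency of leading digit d as log10(1 + 1/d). The sketch below computes these reference frequencies and measures how far the leading digits of a series of counts depart from them. It illustrates only the reference distribution; how Fioriti et al. map digit statistics to specific growth functions is not reproduced here, and the function names and the mean-absolute-deviation distance are assumptions chosen for illustration.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Newcomb-Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n: int) -> int:
    """Most significant decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_frequencies(counts: list[int]) -> dict[int, float]:
    """Observed share of each leading digit among the positive counts."""
    digits = [leading_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    return {d: tally[d] / len(digits) for d in range(1, 10)}


def mean_abs_deviation(counts: list[int]) -> float:
    """Mean absolute gap between observed and Benford leading-digit shares
    (0 means a perfect Benford fit). The distance choice is an assumption."""
    observed = first_digit_frequencies(counts)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


exponential = [2 ** k for k in range(1000)]  # exponential growth
linear = list(range(1, 10))                  # one of each leading digit
print(mean_abs_deviation(exponential) < mean_abs_deviation(linear))  # -> True
```

An exponentially growing series follows the Benford frequencies closely, while a series with uniformly spread leading digits does not, which is the kind of contrast a Benford-based classifier of growth dynamics can exploit.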
Affiliation(s)
- Marta Chinnici
- ENEA C.R. Casaccia, Via Anguillarese 301, Rome, 00123, Italy
- Ivan Roselli
- ENEA C.R. Casaccia, Via Anguillarese 301, Rome, 00123, Italy

4
Noufaily A, Morbey RA, Colón-González FJ, Elliot AJ, Smith GE, Lake IR, McCarthy N. Comparison of statistical algorithms for daily syndromic surveillance aberration detection. Bioinformatics 2020; 35:3110-3118. [PMID: 30689731 PMCID: PMC6736430 DOI: 10.1093/bioinformatics/bty997] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/12/2018] [Revised: 11/16/2018] [Accepted: 01/22/2019] [Indexed: 11/29/2022]
Abstract
Motivation: Public health authorities can provide more effective and timely interventions to protect populations during health events if they have effective multi-purpose surveillance systems. These systems rely on aberration detection algorithms to identify potential threats within large datasets. Ensuring that the algorithms are sensitive, specific and timely is crucial for protecting public health. Here, we evaluate the performance of three detection algorithms extensively used for syndromic surveillance: the ‘rising activity, multilevel mixed effects, indicator emphasis’ (RAMMIE) method and the improved quasi-Poisson regression-based method known as ‘Farrington Flexible’, both currently used at Public Health England (PHE), and the ‘Early Aberration Reporting System’ (EARS) method used at the US Centers for Disease Control and Prevention. We model the wide range of data structures encountered within the daily syndromic surveillance systems used by PHE. We undertake extensive simulations to identify which algorithms work best across different types of syndromes and different outbreak sizes. We evaluate RAMMIE for the first time since its introduction. Performance metrics were computed and compared in the presence of a range of simulated outbreak types that were added to baseline data.
Results: We conclude that, amongst the algorithm variants that have a high specificity (i.e. >90%), Farrington Flexible has the highest sensitivity and specificity, whereas RAMMIE has the highest probability of outbreak detection and is the most timely, typically detecting outbreaks 2–3 days earlier.
Availability and implementation: R code developed for this project is available through https://github.com/FelipeJColon/AlgorithmComparison
Supplementary information: Supplementary data are available at Bioinformatics online.
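Of the three algorithms compared, EARS is the simplest to illustrate. The sketch below follows the commonly described C1 variant: each day's count is compared with the mean and standard deviation of the seven preceding days, and an alarm is raised when it exceeds that mean by more than three standard deviations. It is a hedged illustration, not the code in the authors' repository; the `min_sd` floor that guards against flat baselines is an assumption added here, and RAMMIE and Farrington Flexible are not shown.

```python
import statistics


def ears_c1(counts: list[int], threshold: float = 3.0, baseline_days: int = 7,
            min_sd: float = 0.2) -> list[bool]:
    """EARS C1-style alarms: flag day t when
    (counts[t] - mean(baseline)) / sd(baseline) > threshold,
    where the baseline is the `baseline_days` days immediately before t.
    Days without a full baseline are never flagged."""
    alarms = []
    for t, x in enumerate(counts):
        if t < baseline_days:
            alarms.append(False)
            continue
        baseline = counts[t - baseline_days:t]
        mean = statistics.mean(baseline)
        # Assumed floor on the SD so a perfectly flat baseline cannot divide by zero.
        sd = max(statistics.stdev(baseline), min_sd)
        alarms.append((x - mean) / sd > threshold)
    return alarms


daily = [10, 12, 11, 9, 10, 11, 12, 30]  # made-up counts with a spike on the last day
print(ears_c1(daily))  # -> [False, False, False, False, False, False, False, True]
```

Because the baseline moves with the current day, C1 adapts quickly to recent levels but has no notion of seasonality, which is one reason regression-based methods such as Farrington Flexible are also used.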
Affiliation(s)
- Angela Noufaily
- Statistics and Epidemiology, Warwick Medical School, University of Warwick, Coventry, UK
- Roger A Morbey
- Real-time Syndromic Surveillance Team, National Infection Service, Public Health England, Birmingham, UK
- Alex J Elliot
- Real-time Syndromic Surveillance Team, National Infection Service, Public Health England, Birmingham, UK
- Gillian E Smith
- Real-time Syndromic Surveillance Team, National Infection Service, Public Health England, Birmingham, UK
- Iain R Lake
- School of Environmental Sciences, University of East Anglia, Norwich, UK
- Noel McCarthy
- Population Evidence and Technologies, Warwick Medical School, University of Warwick, Coventry, UK

5
Yuan M, Boston-Fisher N, Luo Y, Verma A, Buckeridge DL. A systematic review of aberration detection algorithms used in public health surveillance. J Biomed Inform 2019; 94:103181. [PMID: 31014979 DOI: 10.1016/j.jbi.2019.103181] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/30/2018] [Revised: 04/16/2019] [Accepted: 04/17/2019] [Indexed: 12/21/2022]
Abstract
The algorithms used for detecting anomalies have evolved substantially over the last decade to take advantage of advances in informatics and to accommodate changes in surveillance data. We identified 145 studies published since 2007 that evaluated statistical methods used to detect aberrations in public health surveillance data. For each study, we classified the analytic methods and reviewed the evaluation metrics. We also summarized the practical usage of the detection algorithms in public health surveillance systems worldwide. Traditional methods (e.g., control charts, linear regressions) were the focus of most evaluation studies and continue to be used commonly in practice. There was, however, an increase in the number of studies using forecasting methods and of studies applying machine learning methods, hidden Markov models, and Bayesian frameworks to multivariate datasets. Evaluation studies demonstrated improved accuracy with more sophisticated methods, but these methods do not appear to be widely used in public health practice.
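Control charts, the traditional methods most often evaluated in this review, can be illustrated with a one-sided cumulative sum (CUSUM) chart. The sketch assumes the in-control mean `mu0` and standard deviation `sigma` are known from historical data; the reference value `k = 0.5` and decision interval `h = 4` are conventional textbook choices, not values taken from the review, and restarting the statistic after an alarm is one convention among several.

```python
def cusum_alarms(counts: list[float], mu0: float, sigma: float,
                 k: float = 0.5, h: float = 4.0) -> list[bool]:
    """One-sided (upper) CUSUM on standardized counts:
    S_t = max(0, S_{t-1} + (x_t - mu0) / sigma - k); alarm when S_t > h.
    The statistic restarts at 0 after each alarm (one common convention)."""
    s = 0.0
    alarms = []
    for x in counts:
        s = max(0.0, s + (x - mu0) / sigma - k)
        if s > h:
            alarms.append(True)
            s = 0.0
        else:
            alarms.append(False)
    return alarms


# Made-up series drifting upward from an in-control mean of 10 (sigma = 2)
print(cusum_alarms([10, 11, 9, 10, 14, 15, 16], mu0=10, sigma=2))
# -> [False, False, False, False, False, False, True]
```

Unlike a single-day threshold, the CUSUM accumulates moderate excesses over several days, so a sustained rise (here days 5 to 7) triggers an alarm even though no single day is extreme.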
Affiliation(s)
- Mengru Yuan
- Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Nikita Boston-Fisher
- Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Yu Luo
- Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- Aman Verma
- Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada
- David L Buckeridge
- Clinical and Health Informatics Research Group, McGill University, 1140 Pine Avenue West, Montreal, QC H3A 1A3, Canada

6
Texier G, Allodji RS, Diop L, Meynard JB, Pellegrin L, Chaudet H. Using decision fusion methods to improve outbreak detection in disease surveillance. BMC Med Inform Decis Mak 2019; 19:38. [PMID: 30837003 PMCID: PMC6402142 DOI: 10.1186/s12911-019-0774-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/07/2018] [Accepted: 02/18/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: When outbreak detection algorithms (ODAs) are considered individually, the task of outbreak detection can be seen as a classification problem and the ODA as a sensor providing a binary decision (outbreak yes or no) for each day of surveillance. When they are considered jointly (in cases where several ODAs analyze the same surveillance signal), the outbreak detection problem should be treated as a decision fusion (DF) problem of multiple sensors.
METHODS: This study evaluated the benefit, for a decision support system, of using DF methods (fusing multiple ODA decisions) compared with using a single method of outbreak detection. For each day, we merged the decisions of six ODAs using five DF methods (two voting methods, logistic regression, CART and a Bayesian network, BN). Classical metrics of accuracy, prediction and timeliness were used during the evaluation steps.
RESULTS: We observed the greatest gain (77%) in positive predictive value, compared with the best ODA, when using DF methods with a learning step (BN, logistic regression, and CART).
CONCLUSIONS: To identify disease outbreaks in systems using several ODAs to analyze surveillance data, we recommend using a DF method based on a Bayesian network. This method is at least equivalent to the best of the algorithms considered, regardless of the situation faced by the system. For those less familiar with this kind of technique, we propose using logistic regression when a training dataset is available.
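The two voting methods are the simplest form of decision fusion. The sketch below combines the per-day binary decisions of several ODAs with either a strict-majority rule or an m-out-of-n rule. The function name and data layout are assumptions, the exact voting rules used in the study are not specified here, and the learning-based fusers the authors recommend (logistic regression, CART, Bayesian network) need a labelled training set and are not shown.

```python
from typing import Optional


def fuse_votes(decisions: list[list[bool]], min_votes: Optional[int] = None) -> list[bool]:
    """Fuse per-day binary decisions from several outbreak detection algorithms.

    decisions[a][t] is algorithm a's decision on day t (True = outbreak).
    Default is a strict-majority vote; pass min_votes for an m-out-of-n rule."""
    n_algorithms = len(decisions)
    if min_votes is None:
        min_votes = n_algorithms // 2 + 1  # strict majority
    n_days = len(decisions[0])
    return [sum(alg[t] for alg in decisions) >= min_votes for t in range(n_days)]


# Three hypothetical ODAs over four days
odas = [
    [True, False, True, False],
    [True, True, False, False],
    [False, False, True, False],
]
print(fuse_votes(odas))               # -> [True, False, True, False]
print(fuse_votes(odas, min_votes=1))  # -> [True, True, True, False]
```

With the six ODAs used in the study, a strict majority would require four agreeing alarms; lowering `min_votes` typically raises sensitivity at the cost of more false alarms.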
Affiliation(s)
- Gaëtan Texier
- French Armed Forces Center for Epidemiology and Public Health (CESPA), SSA, Camp de Sainte Marthe, 13568, Marseille, France
- UMR VITROME, IRD, AP-HM, SSA, IHU-Méditerranée Infection, Aix Marseille Univ, 13005, Marseille, France
- Rodrigue S Allodji
- French Armed Forces Center for Epidemiology and Public Health (CESPA), SSA, Camp de Sainte Marthe, 13568, Marseille, France
- CESP, Univ. Paris-Sud, UVSQ, INSERM, Université Paris-Saclay, Villejuif, France
- Cancer and Radiation Team, Gustave Roussy Cancer Center, F-94805, Villejuif, France
- Loty Diop
- International Food Policy Research Institute (IFPRI), Regional Office for West and Central Africa, 24063, Dakar, Sénégal
- Jean-Baptiste Meynard
- French Armed Forces Center for Epidemiology and Public Health (CESPA), SSA, Camp de Sainte Marthe, 13568, Marseille, France
- UMR 912 - SESSTIM - INSERM/IRD/Aix-Marseille Université, 13385, Marseille, France
- Liliane Pellegrin
- French Armed Forces Center for Epidemiology and Public Health (CESPA), SSA, Camp de Sainte Marthe, 13568, Marseille, France
- UMR VITROME, IRD, AP-HM, SSA, IHU-Méditerranée Infection, Aix Marseille Univ, 13005, Marseille, France
- Hervé Chaudet
- French Armed Forces Center for Epidemiology and Public Health (CESPA), SSA, Camp de Sainte Marthe, 13568, Marseille, France
- UMR VITROME, IRD, AP-HM, SSA, IHU-Méditerranée Infection, Aix Marseille Univ, 13005, Marseille, France

7
Guan J, Chen K, Jee AY, Granick S. DNA molecules deviate from shortest trajectory when driven through hydrogel. J Chem Phys 2018; 149:163331. [DOI: 10.1063/1.5033990] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Affiliation(s)
- Juan Guan
- Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, South Korea
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143, USA
- Kejia Chen
- Google, Inc., Mountain View, California 94043, USA
- Ah-Young Jee
- Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, South Korea
- Steve Granick
- Center for Soft and Living Matter, Institute for Basic Science (IBS), Ulsan 44919, South Korea
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, South Korea