1
Domegan L, Garvey P, McKeown P, Johnson H, Hynds P, O'Dwyer J, ÓhAiseadha C. Geocoding cryptosporidiosis cases in Ireland (2008-2017)-development of a reliable, reproducible, multiphase geocoding methodology. Ir J Med Sci 2021; 190:1497-1507. [PMID: 33464478 PMCID: PMC7813664 DOI: 10.1007/s11845-020-02468-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2020] [Accepted: 12/03/2020] [Indexed: 02/07/2023]
Abstract
Background Geocoding (the process of converting a text address into spatial data) quality may affect geospatial epidemiological study findings. No national standards for best geocoding practice exist in Ireland. Irish postcodes (Eircodes) are not routinely recorded for infectious disease notifications and >35% of dwellings have non-unique addresses. This may result in incomplete geocoding and introduce systematic errors into studies. Aims This study aimed to develop a reliable and reproducible methodology to geocode cryptosporidiosis notifications to fine-resolution spatial units (Census 2016 Small Areas), to enhance data validity and completeness, thus improving geospatial epidemiological studies. Methods A protocol was devised to utilise geocoding tools developed by the Health Service Executive’s Health Intelligence Unit. Geocoding employed finite-string automated and manual matching, undertaken sequentially in three additive phases. The protocol was applied to a cryptosporidiosis notification dataset (2008–2017) from Ireland’s Computerised Infectious Disease Reporting System. Outputs were validated against devised criteria. Results Overall, 92.1% (4266/4633) of cases were successfully geocoded to one Small Area, and 95.5% (n = 4425) to larger spatial units. The proportion of records geocoded increased by 14% using the multiphase approach, with 5% of records re-assigned to a different spatial unit. Conclusions The developed multiphase protocol improved the completeness and validity of geocoding, thus increasing the power of subsequent studies. The authors recommend capturing Eircodes, ideally using an application programming interface, for infectious disease or other health-related datasets, for more efficient and reliable geocoding. Where Eircodes are not recorded or available, we recommend this (or a similar) quality-driven protocol as best geocoding practice.
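As a quick arithmetic check on the proportions reported in the abstract above, the following Python sketch recomputes them from the stated counts. The helper name `pct` and the variable names are illustrative assumptions; they are not part of the authors' protocol.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Counts taken from the abstract; variable names are hypothetical.
TOTAL_CASES = 4633          # notifications in the 2008-2017 dataset
SMALL_AREA_MATCHES = 4266   # cases geocoded to exactly one Small Area
LARGER_UNIT_MATCHES = 4425  # cases geocoded to larger spatial units

small_area_pct = pct(SMALL_AREA_MATCHES, TOTAL_CASES)
larger_unit_pct = pct(LARGER_UNIT_MATCHES, TOTAL_CASES)
print(small_area_pct, larger_unit_pct)
```

Both values agree with the figures quoted in the Results (92.1% and 95.5%).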
Affiliation(s)
- Lisa Domegan
- European Programme for Intervention Epidemiology Training (EPIET), European Centre for Disease Prevention and Control (ECDC), Stockholm, Sweden; Health Service Executive-Health Protection Surveillance Centre, Dublin, Ireland.
- Patricia Garvey
- Health Service Executive-Health Protection Surveillance Centre, Dublin, Ireland
- Paul McKeown
- Health Service Executive-Health Protection Surveillance Centre, Dublin, Ireland
- Howard Johnson
- Health Service Executive-Health Intelligence Unit, Dublin, Ireland
- Paul Hynds
- Environmental Sustainability & Health Institute, Technological University Dublin, Dublin, Ireland; Irish Centre for Research in Applied Geosciences, University College Dublin, Dublin, Ireland
- Jean O'Dwyer
- Irish Centre for Research in Applied Geosciences, University College Dublin, Dublin, Ireland; School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland; Water and Environment Research Group, Environmental Research Institute, University College Cork, Cork, Ireland
- Coilín ÓhAiseadha
- Health Service Executive-Department of Public Health-East, Dublin, Ireland.
2
Abstract
Surveillance is critical for improving population health. Public health surveillance systems generate information that drives action, and the data must be of sufficient quality and with a resolution and timeliness that matches objectives. In the context of scientific advances in public health surveillance, changing health care and public health environments, and rapidly evolving technologies, the aim of this article is to review public health surveillance systems. We consider their current use to increase the efficiency and effectiveness of the public health system, the role of system stakeholders, the analysis and interpretation of surveillance data, approaches to system monitoring and evaluation, and opportunities for future advances in terms of increased scientific rigor, outcomes-focused research, and health informatics.
Affiliation(s)
- Samuel L. Groseclose
- Office of Public Health Preparedness and Response, Centers for Disease Control and Prevention, Atlanta, Georgia 30329
- David L. Buckeridge
- Surveillance Lab, McGill Clinical and Health Informatics, Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada H3A 1A3
3
Klaus CA, Carrasco LE, Goldberg DW, Henry KA, Sherman RL. Use of attribute association error probability estimates to evaluate quality of medical record geocodes. Int J Health Geogr 2015; 14:26. [PMID: 26370237 PMCID: PMC4570180 DOI: 10.1186/s12942-015-0019-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/11/2015] [Accepted: 08/26/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The utility of patient attributes associated with the spatiotemporal analysis of medical records lies not just in their values but also the strength of association between them. Estimating the extent to which a hierarchy of conditional probability exists between patient attribute associations such as patient identifying fields, patient and date of diagnosis, and patient and address at diagnosis is fundamental to estimating the strength of association between patient and geocode, and patient and enumeration area. We propose a hierarchy for the attribute associations within medical records that enable spatiotemporal relationships. We also present a set of metrics that store attribute association error probability (AAEP), to estimate error probability for all attribute associations upon which certainty in a patient geocode depends. METHODS A series of experiments were undertaken to understand how error estimation could be operationalized within health data and what levels of AAEP in real data reveal themselves using these methods. Specifically, the goals of this evaluation were to (1) assess if the concept of our error assessment techniques could be implemented by a population-based cancer registry; (2) apply the techniques to real data from a large health data agency and characterize the observed levels of AAEP; and (3) demonstrate how detected AAEP might impact spatiotemporal health research. RESULTS We present an evaluation of AAEP metrics generated for cancer cases in a North Carolina county. We show examples of how we estimated AAEP for selected attribute associations and circumstances. We demonstrate the distribution of AAEP in our case sample across attribute associations, and demonstrate ways in which disease registry specific operations influence the prevalence of AAEP estimates for specific attribute associations. 
CONCLUSIONS The effort to detect and store estimates of AAEP is worthwhile because of the increase in confidence fostered by the attribute association level approach to the assessment of uncertainty in patient geocodes, relative to existing geocoding related uncertainty metrics.
Affiliation(s)
- Luis E Carrasco
- North Carolina Center for Geographic Information and Analysis, Raleigh, NC, USA.
- Daniel W Goldberg
- Department of Geography, Texas A&M University, College Station, TX, USA.
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX, USA.
- Kevin A Henry
- Department of Geography and Urban Studies, Temple University, Philadelphia, PA, USA.
- Recinda L Sherman
- North American Association of Central Cancer Registries, Springfield, IL, USA.
4
Warren JL, Perez-Heydrich C, Burgert CR, Emch ME. Influence of Demographic and Health Survey Point Displacements on Distance-Based Analyses. Spat Demogr 2016; 4:155-73. [PMID: 27453935 DOI: 10.1007/s40980-015-0014-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
We evaluate the impacts of random spatial displacements on analyses that involve distance measures from displaced Demographic and Health Survey (DHS) clusters to nearest ancillary point or line features, such as health resources or roads. We use simulation and case studies to address the effects of this introduced error, and propose use of regression calibration (RC) to reduce its impact. Results suggest that RC outperforms analyses involving naive distance-based covariate assignments by reducing the bias and MSE of the main estimator in most settings. Proposed guidelines also address the effect of the spatial density of destination features on observed bias.
5
Abstract
Datasets of gigabyte size are common in medical sciences. There is increasing consensus that significant untapped knowledge lies hidden in these large datasets. This review article aims to discuss Electronic Health-Related Datasets (EHRDs) in terms of types, features, advantages, limitations, and possible use in nursing and health-related research. Major scientific databases (MEDLINE, ScienceDirect, and Scopus) were searched for studies or review articles on the use of EHRDs in research. A total of 442 articles were located. After application of the study inclusion criteria, 113 articles were included in the final review. EHRDs were categorized into Electronic Administrative Health-Related Datasets and Electronic Clinical Health-Related Datasets. Subcategories of each major category were identified. EHRDs are invaluable assets for nursing and health-related research. Advanced research skills, such as using analytical software, applying advanced statistical procedures, and dealing with missing data and missing variables, will maximize the efficient utilization of EHRDs in research.
6
Carroll LN, Au AP, Detwiler LT, Fu TC, Painter IS, Abernethy NF. Visualization and analytics tools for infectious disease epidemiology: a systematic review. J Biomed Inform 2014; 51:287-98. [PMID: 24747356 PMCID: PMC5734643 DOI: 10.1016/j.jbi.2014.04.006] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 09/13/2013] [Revised: 03/13/2014] [Accepted: 04/03/2014] [Indexed: 12/31/2022]
Abstract
BACKGROUND A myriad of new tools and algorithms have been developed to help public health professionals analyze and visualize the complex data used in infectious disease control. To better understand approaches to meet these users' information needs, we conducted a systematic literature review focused on the landscape of infectious disease visualization tools for public health professionals, with a special emphasis on geographic information systems (GIS), molecular epidemiology, and social network analysis. The objectives of this review are to: (1) identify public health user needs and preferences for infectious disease information visualization tools; (2) identify existing infectious disease information visualization tools and characterize their architecture and features; (3) identify commonalities among approaches applied to different data types; and (4) describe tool usability evaluation efforts and barriers to the adoption of such tools. METHODS We identified articles published in English from January 1, 1980 to June 30, 2013 from five bibliographic databases. Articles with a primary focus on infectious disease visualization tools, needs of public health users, or usability of information visualizations were included in the review. RESULTS A total of 88 articles met our inclusion criteria. Users were found to have diverse needs, preferences and uses for infectious disease visualization tools, and the existing tools are correspondingly diverse. The architecture of the tools was inconsistently described, and few tools in the review discussed the incorporation of usability studies or plans for dissemination. Many studies identified concerns regarding data sharing, confidentiality and quality. Existing tools offer a range of features and functions that allow users to explore, analyze, and visualize their data, but the tools are often for siloed applications. 
Commonly cited barriers to widespread adoption included lack of organizational support, access issues, and misconceptions about tool use. DISCUSSION AND CONCLUSION As the volume and complexity of infectious disease data increases, public health professionals must synthesize highly disparate data to facilitate communication with the public and inform decisions regarding measures to protect the public's health. Our review identified several themes: consideration of users' needs, preferences, and computer literacy; integration of tools into routine workflow; complications associated with understanding and use of visualizations; and the role of user trust and organizational support in the adoption of these tools. Interoperability also emerged as a prominent theme, highlighting challenges associated with the increasingly collaborative and interdisciplinary nature of infectious disease control and prevention. Future work should address methods for representing uncertainty and missing data to avoid misleading users as well as strategies to minimize cognitive overload.
Affiliation(s)
- Lauren N Carroll
- Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican St., Box 358047, Seattle, WA 98109, United States.
- Alan P Au
- Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican St., Box 358047, Seattle, WA 98109, United States.
- Landon Todd Detwiler
- Department of Biological Structure, University of Washington, 1959 NE Pacific St., Box 357420, United States.
- Tsung-Chieh Fu
- Department of Epidemiology, University of Washington, 1959 NE Pacific St., Box 357236, Seattle, WA 98195, United States.
- Ian S Painter
- Department of Health Services, University of Washington, 1959 NE Pacific St., Box 359442, Seattle, WA 98195, United States.
- Neil F Abernethy
- Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican St., Box 358047, Seattle, WA 98109, United States; Department of Health Services, University of Washington, 1959 NE Pacific St., Box 359442, Seattle, WA 98195, United States.
7
Lyseen AK, Nøhr C, Sørensen EM, Gudes O, Geraghty EM, Shaw NT, Bivona-Tellez C. A Review and Framework for Categorizing Current Research and Development in Health Related Geographical Information Systems (GIS) Studies. Yearb Med Inform 2014; 9:110-24. [PMID: 25123730 DOI: 10.15265/iy-2014-0008] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 10/24/2022]
Abstract
OBJECTIVES The application of GIS in health science has increased over the last decade and new innovative application areas have emerged. This study reviews the literature and builds a framework to provide a conceptual overview of the domain, and to promote strategic planning for further research of GIS in health. METHOD The framework is based on literature from the library databases Scopus and Web of Science. The articles were identified based on keywords and initially selected for further study based on titles and abstracts. A grounded theory-inspired method was applied to categorize the selected articles into main focus areas. Subsequent frequency analysis was performed on the identified articles by infectious and non-infectious disease area and by continent of origin. RESULTS A total of 865 articles were included. Four conceptual domains within GIS in health sciences comprise the framework: spatial analysis of disease, spatial analysis of health service planning, public health, and health technologies and tools. Frequency analysis by disease status and location shows that malaria and schistosomiasis are the most commonly analyzed infectious diseases, whereas cancer and asthma are the most frequently analyzed non-infectious diseases. Across categories, articles from North America predominate, and in the category of spatial analysis of diseases an equal number of studies concern Asia. CONCLUSION Spatial analysis of diseases and health service planning are well-established research areas. The development of future technologies and new application areas for GIS and data-gathering technologies such as GPS, smartphones, and remote sensing will nudge research in GIS and health.
Affiliation(s)
- A K Lyseen
- Anders Knørr Lyseen, Department of Development and Planning, Aalborg University, Aalborg, Denmark
8
Zandbergen PA. Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data. Adv Med 2014; 2014:567049. [PMID: 26556417 DOI: 10.1155/2014/567049] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/12/2013] [Revised: 10/25/2013] [Accepted: 10/27/2013] [Indexed: 11/18/2022]
Abstract
Public health datasets increasingly use geographic identifiers such as an individual's address. Geocoding these addresses often provides new insights since it becomes possible to examine spatial patterns and associations. Address information is typically considered confidential and is therefore not released or shared with others. Publishing maps with the locations of individuals, however, may also breach confidentiality since addresses and associated identities can be discovered through reverse geocoding. One commonly used technique to protect confidentiality when releasing individual-level geocoded data is geographic masking. This typically consists of applying a certain amount of random perturbation in a systematic manner to reduce the risk of reidentification. A number of geographic masking techniques have been developed, as well as methods to quantify the risk of reidentification associated with a particular masking method. This paper presents a review of the current state of the art in geographic masking, summarizing the various methods and their strengths and weaknesses. Despite recent progress, no universally accepted or endorsed geographic masking technique has emerged. Researchers, on the other hand, are publishing maps using geographic masking of confidential locations. Any researcher publishing such maps is advised to become familiar with the different masking techniques available and their associated reidentification risks.
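To make the idea of random perturbation concrete, here is a minimal Python sketch of one well-known masking variant, often called donut masking, in which each point is moved a random distance between a minimum and a maximum radius in a random direction. The abstract does not single out this method; the function name, the radii and the coordinates are assumptions chosen for illustration, and the points are assumed to be in a projected coordinate system measured in metres.

```python
import math
import random

def donut_mask(x, y, r_min, r_max, rng):
    """Displace a projected point by a random distance in [r_min, r_max]
    in a uniformly random direction (illustrative sketch only)."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(r_min, r_max)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

rng = random.Random(42)            # fixed seed so the sketch is repeatable
original = (500000.0, 200000.0)    # hypothetical projected coordinates (m)
masked = donut_mask(*original, r_min=50.0, r_max=250.0, rng=rng)

# Distance by which the point was actually moved.
shift = math.hypot(masked[0] - original[0], masked[1] - original[1])
print(masked, shift)
```

Drawing the distance uniformly does not spread masked points uniformly over the ring's area, and real applications usually tie the radii to local population density; both are simplifications here.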
9
Arsenault J, Michel P, Berke O, Ravel A, Gosselin P. How to choose geographical units in ecological studies: proposal and application to campylobacteriosis. Spat Spatiotemporal Epidemiol 2013; 7:11-24. [PMID: 24238078 DOI: 10.1016/j.sste.2013.04.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 11/11/2010] [Revised: 02/20/2013] [Accepted: 04/17/2013] [Indexed: 11/19/2022]
Abstract
In spatial epidemiology, the choice of an appropriate geographical unit of analysis is a key decision that will influence most aspects of the study. In this study, we proposed and applied a set of measurable criteria applicable for orienting the choice of geographical unit. Nine criteria were selected, covering many aspects such as biological relevance, communicability of results, ease of data access, distribution of exposure variables, cases and population, and shape of unit. These criteria were then applied to compare various geographical units derived from administrative, health services, and natural frameworks that could be used for the study of the spatial distribution of campylobacteriosis in the province of Quebec, Canada. In this study, municipality was the geographical unit that performed the best according to our assessment and given the specific objectives and time period of the study. Future research areas for optimizing the choice of geographical unit are discussed.
Affiliation(s)
- Julie Arsenault
- Faculté de médecine vétérinaire, Université de Montréal, 3200 rue Sicotte, Saint-Hyacinthe, Québec, Canada J2S 7C6; Groupe de recherche en épidémiologie des zoonoses et santé publique, Université de Montréal, 3200 Sicotte, Saint-Hyacinthe, Québec, Canada J2S 7C6.
10
Abstract
The space-time permutation scan statistic (STPSS) is designed to identify hot (and cool) spots of space-time interaction within patterns of spatio-temporal events. While the method has been adopted widely in practice, there has been little consideration of the effect inaccurate and/or incomplete input data may have on its results. Given the pervasiveness of inaccuracy, uncertainty and incompleteness within spatio-temporal datasets and the popularity of the method, this issue warrants further investigation. Here, a series of simulation experiments using both synthetic and real-world data are carried out to better understand how deficiencies in the spatial and temporal accuracy as well as the completeness of the input data may affect results of the STPSS. The findings, while specific to the parameters employed here, reveal a surprising robustness of the method's results in the face of these deficiencies. As expected, the experiments illustrate that greater degradation of input data quality leads to greater variability in the results. Additionally, they show that weaker signals of space-time interaction are those most affected by the introduced deficiencies. However, in stark contrast to previous investigations into the impact of these input data problems on global tests of space-time interaction, this local metric is revealed to be only minimally affected by the degree of inaccuracy and incompleteness introduced in these experiments.
Affiliation(s)
- Nicholas Malizia
- GeoDa Center for Geospatial Analysis and Computation, School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, Arizona, USA.
11
McLafferty S, Freeman VL, Barrett RE, Luo L, Shockley A. Spatial error in geocoding physician location data from the AMA Physician Masterfile: implications for spatial accessibility analysis. Spat Spatiotemporal Epidemiol 2012; 3:31-8. [PMID: 22469489 DOI: 10.1016/j.sste.2012.02.004] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
The accuracy of geocoding hinges on the quality of address information that serves as input to the geocoding process; however errors associated with poor address quality are rarely studied. This paper examines spatial errors that arise due to incorrect address information with respect to physician location data in the United States. Studies of spatial accessibility to physicians in the U.S. typically rely on data from the American Medical Association's Physician Masterfile. These data are problematic because a substantial proportion of physicians only report a mailing address, which is often the physician's home (residential) location, rather than the address for the location where health care is provided. The incorrect geocoding of physicians' practice locations based on inappropriate address information results in a form of geocoding error that has not been widely analyzed. Using data for the Chicago metropolitan region, we analyze the extent and implications of geocoding error for measurement of spatial accessibility to primary care physicians. We geocode the locations of primary care physicians based on mailing addresses and office addresses. The spatial mismatch between the two is computed at the county, zip code and point location scales. Although mailing and office address locations are quite close for many physicians, they are far apart (>20 km) for a substantial minority. Kernel density estimation is used to characterize the spatial distribution of physicians based on office and mailing addresses and to identify areas of high spatial mismatch between the two. Errors are socially and geographically uneven, resulting in overestimation of physician supply in some high-income suburban communities, and underestimation in certain central city locations where health facilities are concentrated. 
The resulting errors affect local measures of spatial accessibility to primary care, biasing statistical analyses of the associations between spatial access to care and health outcomes.
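The point-level mismatch described above can be illustrated with a short Python sketch that measures the great-circle distance between a mailing-address geocode and an office-address geocode and flags pairs more than 20 km apart. The abstract does not state how distance was computed, so the haversine formula, the function names and the example coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_far_apart(mailing, office, threshold_km=20.0):
    """True when the two geocodes are more than threshold_km apart."""
    return haversine_km(*mailing, *office) > threshold_km

# Hypothetical geocodes for one physician (not taken from the study data).
mailing = (42.30, -87.85)  # suburban mailing address
office = (41.88, -87.63)   # central-city office address
print(haversine_km(*mailing, *office), is_far_apart(mailing, office))
```

The 20 km threshold comes from the abstract; everything else is a stand-in for whatever GIS tooling the authors actually used.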
Affiliation(s)
- Sara McLafferty
- Department of Geography, University of Illinois at Urbana-Champaign, USA.
12
Ortega-García JA, López-Hernández FA, Sobrino-Najul E, Febo I, Fuster-Soler JL. [Environment and paediatric cancer in the Region of Murcia (Spain): integrating clinical and environmental history in a geographic information system]. An Pediatr (Barc) 2011; 74:255-60. [PMID: 21315667 DOI: 10.1016/j.anpedi.2010.11.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/23/2010] [Revised: 11/07/2010] [Accepted: 11/09/2010] [Indexed: 11/12/2022]
Abstract
INTRODUCTION Environment and Paediatric Cancer (PC) in the Region of Murcia (RM) is an ongoing research project with the following aims: to collect a careful paediatric environmental history (PEH) and to use geographical information systems (GIS) to map the incidence and analyze the geographic distribution of PC in the RM. The objectives are to present the methodology used for the collection and processing of data and to disseminate initial results on the spatial and temporal incidence of PC in the RM (Spain). MATERIAL AND METHODS A descriptive, georeferenced study of all PC cases in children under 15 years, diagnosed from 1 January 1998 to 31 December 2009. Three postal addresses were assigned to each case: residence during pregnancy, postnatal residence, and residence at the time of diagnosis. Other variables such as sex, date of birth, date of diagnosis, and histopathology classification were collected. RESULTS No increase was observed in the trend of incidence of PC. The crude annual incidence rate was 14.3 cases per 100,000 children under 15 years. The standardised incidence ratio was higher in the north-west of the RM. Before diagnosis, 30% of cases had a different postal address than during the pregnancy. CONCLUSIONS Integrating the spatial and temporal information through the PEH in a GIS should allow the identification and study of space-time clusters through an environmental monitoring system, in order to assess the importance of associated risk factors.
Affiliation(s)
- J A Ortega-García
- Unidad de Salud Medioambiental Pediátrica, Servicio de Pediatría, Unidad de Investigación Traslacional en Cáncer, Hospital Universitario Virgen de la Arrixaca, Murcia, España.