1
Abstract
BACKGROUND There is increasing interest in the spatial analysis of suicide data to identify high-risk (often public) locations likely to benefit from access restriction measures. The identification of such locations, however, relies on accurately geocoded data. This study aims to examine the extent to which common completeness and positional spatial errors are present in suicide data due to the underlying geocoding process. METHODS Using Australian suicide mortality data from the National Coronial Information System for the period 2008–2017, we compared the custodian automated geocoding process to an alternate multiphase process. Descriptive and kernel density cluster analyses were conducted to ascertain differences between the two datasets in data completeness (address matching rates) and positional accuracy (distance revised). RESULTS The alternate geocoding process initially improved address matching from 67.8% in the custodian dataset to 78.4%. Additional manual identification of nonaddress features (such as cliffs or bridges) improved overall match rates to 94.6%. Nearly half (49.2%) of nonresidential suicide locations were revised more than 1,000 m from data custodian coordinates. Spatial misattribution rates were greatest at the smallest levels of geography. Kernel density maps showed that relying solely on autogeocoded data clearly misidentified hotspots. CONCLUSION Suicide incidents that occur at nonresidential addresses are being erroneously geocoded to centralized fall-back locations in autogeocoding processes, which can lead to misidentification of suicide clusters. Our findings provide insights toward defining the nature of the problem and refining geocoding processes, so that suicide data can be used reliably for the detection of suicide hotspots. See the video abstract at http://links.lww.com/EDE/B862.
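The "distance revised" measure above can be illustrated with a short Python sketch. It assumes that each record carries latitude/longitude pairs for both the custodian geocode and the revised geocode. The function names (`haversine_m`, `share_revised_beyond`) and the sample coordinates are hypothetical. The abstract does not say how distances were computed, and the authors may have used a projected coordinate system rather than great-circle distance.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def share_revised_beyond(pairs, threshold_m=1000.0):
    """Fraction of records whose revised point lies more than threshold_m
    from the original (custodian) point.

    pairs: iterable of ((orig_lat, orig_lon), (new_lat, new_lon)) tuples.
    """
    distances = [haversine_m(*orig, *new) for orig, new in pairs]
    if not distances:
        return 0.0
    return sum(d > threshold_m for d in distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical records: one unchanged, one moved several kilometres
    example = [
        ((-33.8688, 151.2093), (-33.8688, 151.2093)),
        ((-33.8688, 151.2093), (-33.8400, 151.2800)),
    ]
    print(share_revised_beyond(example))  # prints 0.5
```

A point pulled to a central fall-back location and later revised to its true site shows up here as a large distance. This is the pattern the abstract reports for nonresidential locations.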
2
Geocoding cryptosporidiosis cases in Ireland (2008–2017): development of a reliable, reproducible, multiphase geocoding methodology. Ir J Med Sci 2021; 190:1497-1507. [PMID: 33464478 PMCID: PMC7813664 DOI: 10.1007/s11845-020-02468-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2020] [Accepted: 12/03/2020] [Indexed: 02/07/2023]
Abstract
Background The quality of geocoding (the process of converting a text address into spatial data) may affect the findings of geospatial epidemiological studies. No national standards for best geocoding practice exist in Ireland. Irish postcodes (Eircodes) are not routinely recorded for infectious disease notifications, and > 35% of dwellings have non-unique addresses. This may result in incomplete geocoding and introduce systematic errors into studies. Aims This study aimed to develop a reliable and reproducible methodology to geocode cryptosporidiosis notifications to fine-resolution spatial units (Census 2016 Small Areas), to enhance data validity and completeness, thus improving geospatial epidemiological studies. Methods A protocol was devised to utilise geocoding tools developed by the Health Service Executive’s Health Intelligence Unit. Geocoding employed finite-string automated and manual matching, undertaken sequentially in three additive phases. The protocol was applied to a cryptosporidiosis notification dataset (2008–2017) from Ireland’s Computerised Infectious Disease Reporting System. Outputs were validated against devised criteria. Results Overall, 92.1% (4266/4633) of cases were successfully geocoded to one Small Area, and 95.5% (n = 4425) to larger spatial units. The proportion of records geocoded increased by 14% using the multiphase approach, with 5% of records re-assigned to a different spatial unit. Conclusions The developed multiphase protocol improved the completeness and validity of geocoding, thus increasing the power of subsequent studies. The authors recommend capturing Eircodes, ideally via an application programming interface (API), for infectious disease or other health-related datasets, to enable more efficient and reliable geocoding. Where Eircodes are not recorded or available, we recommend this (or a similar) quality-driven protocol as best geocoding practice.
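The sequential, additive phases described above could look roughly like the following Python sketch. It assumes a hypothetical gazetteer that maps normalised address strings to Small Area codes. The HSE Health Intelligence Unit tools are not described in detail, so `difflib` string similarity stands in here for approximate matching. The 0.85 cutoff and all names are illustrative assumptions.

```python
import difflib
import re


def normalise(address: str) -> str:
    """Upper-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(cleaned.split())


def multiphase_geocode(records, gazetteer, cutoff=0.85):
    """Assign records to spatial units in additive phases (hypothetical sketch).

    records: dict of record_id -> raw address string
    gazetteer: dict of normalised address -> spatial unit code
    Returns (matches, manual_queue); matches maps record_id -> (unit, phase).
    """
    matches, pending = {}, {}
    # Phase 1: exact match on the normalised string
    for rid, raw in records.items():
        key = normalise(raw)
        if key in gazetteer:
            matches[rid] = (gazetteer[key], 1)
        else:
            pending[rid] = key
    # Phase 2: approximate string match against the gazetteer keys
    keys = list(gazetteer)
    manual_queue = []
    for rid, key in pending.items():
        best = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        if best:
            matches[rid] = (gazetteer[best[0]], 2)
        else:
            # Phase 3: leave for manual (interactive) matching and validation
            manual_queue.append(rid)
    return matches, manual_queue


if __name__ == "__main__":
    gaz = {"1 MAIN STREET ATHLONE": "SA001", "22 BRIDGE ROAD GALWAY": "SA002"}
    recs = {"a": "1 Main Street, Athlone", "b": "22 Bridge Rd Galway", "c": "Unknown townland"}
    print(multiphase_geocode(recs, gaz))
```

Recording the phase with each match makes it straightforward to report how much each additional phase contributes to the overall match rate.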
3
GIScience and cancer: State of the art and trends for cancer surveillance and epidemiology. Cancer 2019; 125:2544-2560. [PMID: 31145834 PMCID: PMC6625915 DOI: 10.1002/cncr.32052] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/30/2018] [Revised: 06/05/2018] [Accepted: 06/25/2018] [Indexed: 12/18/2022]
Abstract
Maps are well recognized as an effective means of presenting and communicating health data, such as cancer incidence and mortality rates. These data can be linked to geographic features like counties or census tracts and their associated attributes for mapping and analysis. Such visualization and analysis provide insights regarding the geographic distribution of cancer and can be important for advancing effective cancer prevention and control programs. Applying a spatial approach allows users to identify location-based patterns and trends related to risk factors, health outcomes, and population health. Geographic information science (GIScience) is the discipline that applies Geographic Information Systems (GIS) and other spatial concepts and methods in research. This review explores the current state and evolution of GIScience in cancer research by addressing fundamental topics and issues regarding spatial data and analysis that need to be considered. GIScience, along with its health-specific application in the spatial epidemiology of cancer, incorporates multiple geographic perspectives pertaining to the individual, the health care infrastructure, and the environment. Challenges addressing these perspectives and the synergies among them can be explored through GIScience methods and associated technologies as integral parts of epidemiologic research, analysis efforts, and solutions. The authors suggest GIScience is a powerful tool for cancer research, bringing additional context to cancer data analysis and potentially informing decision-making and policy, ultimately aimed at reducing the burden of cancer.
4
Smartphone-assisted spatial data collection improves geographic information quality: pilot study using a birth records dataset. Geospat Health 2016; 11:482. [PMID: 27903063 PMCID: PMC5800510 DOI: 10.4081/gh.2016.482] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/12/2016] [Revised: 08/15/2016] [Accepted: 09/01/2016] [Indexed: 05/21/2023]
Abstract
It is well known that the conventional, automated geocoding method based on self-reported residential addresses has many issues. We developed a smartphone-assisted, aerial image-based method that uses the Google Maps application programming interface as a spatial data collection tool during the birth registration process. In this pilot study, we tested whether the smartphone-assisted method provides more accurate geographic information than the automated geocoding method when both methods can geocode the address. We randomly selected 100 well-geocoded addresses among women who gave birth in Alachua County, Florida, in 2012. We compared geocodes generated by three methods: i) the smartphone-assisted aerial image-based method; ii) the conventional, automated geocoding method; and iii) the global positioning system (GPS), which served as the reference method. The automated geocoding method yielded positional errors larger than 100 m for 29.3% of addresses, while all addresses geocoded by the smartphone-assisted method had errors of less than 100 m. The positional errors of the automated geocoding method were greater for apartments/condominiums than for other dwellings, and for rural addresses than for urban ones. We conclude that the smartphone-assisted method is a promising approach for prospective spatial data collection because it improves positional accuracy.
5
Using an Optimized Chinese Address Matching Method to Develop a Geocoding Service: A Case Study of Shenzhen, China. ISPRS Int J Geo-Inf 2016. [DOI: 10.3390/ijgi5050065] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/28/2023]
6
Measuring health-relevant businesses over 21 years: refining the National Establishment Time-Series (NETS), a dynamic longitudinal data set. BMC Res Notes 2015; 8:507. [PMID: 26420471 PMCID: PMC4588464 DOI: 10.1186/s13104-015-1482-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 12/24/2014] [Accepted: 09/21/2015] [Indexed: 11/30/2022]
Abstract
Background The densities of food retailers, alcohol outlets, physical activity facilities, and medical facilities have been associated with diet, physical activity, and management of medical conditions. Most of the research, however, has relied on cross-sectional studies. In this paper, we assess methodological issues raised by a data source that is increasingly used to characterize change in the local business environment: the National Establishment Time Series (NETS) dataset. Discussion Longitudinal data, such as NETS, offer opportunities to assess how differential access to resources impacts population health, to consider correlations among multiple environmental influences across the life course, and to gain a better understanding of their interactions and cumulative health effects. Longitudinal data also introduce new data management, geoprocessing, and business categorization challenges. Examining geocoding accuracy and categorization over 21 years of data in 23 counties surrounding New York City (NY, USA), we find that health-related business environments change considerably over time. We note that re-geocoding data may improve spatial precision, particularly in early years. Our intent with this paper is to make future public health applications of NETS data more efficient, since the size and complexity of the data can be difficult to exploit fully within its 2-year data-licensing period. Further, standardized approaches to NETS and other “big data” will facilitate the veracity and comparability of results across studies.
7
Abstract
BACKGROUND Geocoding, the process of converting textual information describing a location into one or more digital geographic representations, is a routine task performed at large organizations and government agencies across the globe. In a health context, this task is often a fundamental first step performed prior to all operations that take place in a spatially based health study. As such, the quality of the geocoding system used within these agencies is of paramount concern to the agency (the producer) and to researchers or policy-makers who wish to use these data (consumers). However, geocoding systems are continually evolving, with new products constantly coming on the market. Agencies must develop and use criteria across a number of axes when faced with decisions about building, buying, or maintaining any particular geocoding system. To date, published criteria have focused on one or more aspects of geocode quality without taking a holistic view of a geocoding system's role within a large organization. The primary purpose of this study is to develop and test an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs. METHODS A geocoding platform evaluation framework is derived through an examination of prior literature on geocoding accuracy. The framework extends commonly used geocoding metrics to take into account the specific concerns of large organizations for which geocoding is a fundamental operational capability tightly knit into their core mission of processing health data records. A case study is performed to evaluate the strengths and weaknesses of five geocoding platforms currently available in the Australian geospatial marketplace.
RESULTS The evaluation framework developed in this research proved successful in differentiating between key capabilities of geocoding systems that are important in the context of a large organization with significant investments in geocoding resources. Results from the proposed methodology highlight important differences across all axes of geocoding system comparisons, including spatial data output accuracy, reference data coverage, system flexibility, the potential for tight integration, and the need for specialized staff and/or development time and funding. Such results can empower decision-makers within large organizations as they make decisions about, and investments in, geocoding systems.
8
Effects of georeferencing effort on mapping monkeypox case distributions and transmission risk. Int J Health Geogr 2012; 11:23. [PMID: 22738820 PMCID: PMC3724478 DOI: 10.1186/1476-072x-11-23] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 04/02/2012] [Accepted: 06/14/2012] [Indexed: 11/17/2022]
Abstract
BACKGROUND Maps of disease occurrences and GIS-based models of disease transmission risk are increasingly common, and both rely on georeferenced disease data. Automated methods for georeferencing disease data have been widely studied for developed countries with rich sources of geographically referenced data. However, the transferability of these methods to countries without comparable geographic reference data, particularly when working with historical disease data, has not been as widely studied. Historically, precise geographic information about where individual cases occur has been collected and stored in words, identifying specific locations by place names. Georeferencing historic data is challenging, however, because it is difficult to find appropriate geographic reference data against which to match the place names. Here, we assess the effects of the degree of care and research invested in converting textual descriptions of disease occurrence locations to numerical grid coordinates (latitude and longitude). Specifically, we develop three datasets from the same original monkeypox disease occurrence data, with varying levels of care and effort: the first based on an automated web service, the second improving on the first by reference to additional maps and digital gazetteers, and the third improving still more based on extensive consultation of legacy surveillance records that provided considerable additional information about each case. To illustrate the implications of these seemingly subtle improvements in data quality, we develop ecological niche models and predictive maps of monkeypox transmission risk based on each of the three occurrence datasets. RESULTS We found macrogeographic variations in ecological niche models depending on the type of georeferencing method used. Less-careful georeferencing identified much smaller areas as having potential for monkeypox transmission in the Sahel region, as well as around the rim of the Congo Basin.
These results have implications for mapping efforts, as each higher level of georeferencing precision required considerably more time. CONCLUSIONS The importance of careful georeferencing cannot be overlooked, even though it is a time- and labor-intensive process. Investment in archival storage of primary disease-occurrence data is merited, and improved digital gazetteers are needed to support public health mapping activities, particularly in developing countries, where maps and geographic information may be sparse.
9
A multi-stage approach to maximizing geocoding success in a large population-based cohort study through automated and interactive processes. Geospat Health 2012; 6:273-284. [PMID: 22639129 PMCID: PMC3683076 DOI: 10.4081/gh.2012.145] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 05/30/2023]
Abstract
To enable spatial analyses within a large, prospective cohort study of nearly 86,000 adults enrolled in a 12-state area in the southeastern United States of America from 2002 to 2009, a multi-stage geocoding protocol was developed to efficiently maximize the proportion of participants assigned an address-level geographic coordinate. Addresses were parsed, cleaned and standardized before applying a combination of automated and interactive geocoding tools. Our full protocol increased the non-Post Office (PO) Box match rate from 74.5% to 97.6%. Overall, we geocoded 99.96% of participant addresses, with only 5.2% at the ZIP code centroid level (2.8% PO Box and 2.3% non-PO Box addresses). One key to reducing the need for interactive geocoding was the use of multiple base maps. Still, addresses in areas with population density <44 persons/km² were much more likely to require resource-intensive interactive geocoding than those in areas with >920 persons/km² (odds ratio (OR) = 5.24; 95% confidence interval (CI) = 4.23, 6.49), as were addresses collected from participants during in-person interviews compared with mailed questionnaires (OR = 1.83; 95% CI = 1.59, 2.11). This study demonstrates that population density and address ascertainment method can influence automated geocoding results and that high success in address-level geocoding is achievable for large-scale studies covering wide geographical areas.
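The odds ratios above compare the odds of needing interactive geocoding between groups of addresses. As an illustration only, the sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2×2 table. The function name and any counts passed to it are hypothetical. The study may have estimated its ORs with a regression model rather than this closed-form approach.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: group 1 with outcome      b: group 1 without outcome
    c: group 2 with outcome      d: group 2 without outcome
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

For example, group 1 might be addresses in low-density areas, group 2 addresses in high-density areas, and the outcome "required interactive geocoding". Because the interval is built on the log scale, it is symmetric around log(OR) but not around the OR itself. This matches the asymmetric intervals reported above.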
10
Moving Neighborhoods and Health Research Forward: Using Geographic Methods to Examine the Role of Spatial Scale in Neighborhood Effects on Health. Ann Assoc Am Geogr 2012; 102:986-995. [PMID: 23264694 DOI: 10.1080/00045608.2012.659621] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 10/28/2022]
Abstract
A rich history of research documents the effects of neighborhood-level socioeconomic status (SES) conditions on health outcomes. Recent criticism of the neighborhoods and health literature, however, has stressed several conceptual and methodological challenges not adequately addressed in previous research. Critics suggest that early work on neighborhoods and health gave little thought to the spatial scale at which SES factors influence a specific health outcome. This article discusses the concept of neighborhoods and health, reviews recent criticisms of existing work, and provides a case study that exemplifies how geographic methods can address one such criticism. Using data on birth defects in North Carolina, the case study examines the relation of SES to orofacial clefts (cleft lip and cleft palate) at different spatial scales. The Brown-Forsythe test is used to select optimal neighborhood size. Results are evaluated using logistic regression models to examine the relationship between SES measures and orofacial clefts, controlling for individual-level risk factors. Results indicate modest associations between neighborhood-level measures of poverty and cleft palate but no associations with cleft lip with or without cleft palate.
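The Brown-Forsythe test named above checks whether groups have equal variances. It does this by running a one-way ANOVA on the absolute deviations of each value from its group median. The pure-Python sketch below computes only the F statistic and its degrees of freedom. The abstract does not detail how the test was used to choose a neighborhood size, and the function name and inputs here are hypothetical. `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean, median


def brown_forsythe(*groups):
    """Brown-Forsythe statistic for equality of variances across groups.

    Returns (F, df_between, df_within). Compare F with an F distribution
    on those degrees of freedom to obtain a p-value.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    # Absolute deviations of each value from its own group's median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(zg) for zg in z)
    group_means = [fmean(zg) for zg in z]
    grand_mean = fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (m - grand_mean) ** 2 for zg, m in zip(z, group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, group_means) for v in zg)
    df1, df2 = k - 1, n_total - k
    f_stat = (df2 / df1) * between / within
    return f_stat, df1, df2
```

Using the median instead of the mean, as in Levene's original test, makes the statistic less sensitive to skewed or heavy-tailed values such as area-level poverty rates.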
12
Arsenic in North Carolina: public health implications. Environ Int 2012; 38:10-6. [PMID: 21982028 PMCID: PMC3539775 DOI: 10.1016/j.envint.2011.08.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 06/20/2011] [Revised: 08/04/2011] [Accepted: 08/07/2011] [Indexed: 05/18/2023]
Abstract
Arsenic is a known human carcinogen and relevant environmental contaminant in drinking water systems. We set out to comprehensively examine statewide arsenic trends and identify areas of public health concern. Specifically, arsenic trends in North Carolina private wells were evaluated over an eleven-year period using the North Carolina Department of Health and Human Services database for private domestic well waters. We geocoded over 63,000 domestic well measurements by applying a novel geocoding algorithm and error validation scheme. Arsenic measurements and geographical coordinates for database entries were mapped using Geographic Information System techniques. Furthermore, we employed a Bayesian Maximum Entropy (BME) geostatistical framework, which accounts for geocoding error, to better estimate arsenic values across the state and identify trends for unmonitored locations. Of the approximately 63,000 monitored wells, 7712 showed detectable arsenic concentrations that ranged between 1 and 806 μg/L. Additionally, 1436 well samples exceeded the EPA drinking water standard. We reveal counties of concern and demonstrate a historical pattern of elevated arsenic in some counties, particularly those located along the Carolina terrane (Carolina slate belt). We analyzed these data in the context of populations using private well water and identified counties for targeted monitoring, such as Stanly and Union Counties. By spatiotemporally mapping these data, our BME estimate revealed arsenic trends at unmonitored locations within counties and better predicted well concentrations when compared to the classical kriging method. This study reveals relevant information on the location of arsenic-contaminated private domestic wells in North Carolina and indicates potential areas at increased risk for adverse health outcomes.
13
Socioeconomic context and gastroschisis: Exploring associations at various geographic scales. Soc Sci Med 2011; 72:625-33. [DOI: 10.1016/j.socscimed.2010.11.025] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 10/30/2009] [Revised: 08/16/2010] [Accepted: 11/13/2010] [Indexed: 11/17/2022]
14
Tracing drinking water to its source: An ecological study of the relationship between textile mills and gastroschisis in North Carolina. Health Place 2010; 16:794-802. [PMID: 20452267 DOI: 10.1016/j.healthplace.2010.04.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/08/2009] [Revised: 03/17/2010] [Accepted: 04/02/2010] [Indexed: 11/22/2022]
Abstract
Gastroschisis is a rare birth defect that has increased in prevalence over the past several decades, but the etiology of the disease is largely unknown. Using data from the North Carolina Birth Defects Monitoring Program, we estimated multilevel logistic regression models to evaluate the association between drinking water source and upstream textile mills and the risk of a gastroschisis birth. Results indicate that while prenatal exposure to upstream textile mill effluent does not have an impact on the risk of a gastroschisis birth, women relying on public water systems that draw from a surface water source have an elevated risk. These findings suggest the possibility of a contaminant found in higher levels in surface water than in groundwater.
15
Using imputation to provide location information for nongeocoded addresses. PLoS One 2010; 5:e8998. [PMID: 20161766 PMCID: PMC2818716 DOI: 10.1371/journal.pone.0008998] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 09/22/2009] [Accepted: 01/07/2010] [Indexed: 01/12/2023]
Abstract
Background The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map, which is predicated on some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses that cannot be converted successfully (nongeocodable). This raises concerns about bias, since traditionally the practice has been to exclude nongeocoded data records from analysis. Methodology/Principal Findings In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known ZIP code, with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group level as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on available population-based age, gender, and race information performed best overall at the county, tract, and block group levels.
Conclusions/Significance The procedure allows the potentially biased and likely underreported outcome (case enumerations based only on the geocoded records) to be presented as a statistically adjusted count (imputed count) with a measure of uncertainty, based on all of the case data: the geocodes and the imputed nongeocodes. Similar strategies can be applied in other analysis settings.
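A simplified version of the ZIP-code-based imputation idea can be sketched in Python. The sketch assumes that each nongeocoded case has a known ZIP code. It also assumes that the population at risk is available for every tract within that ZIP code. Each case is allocated to a tract with probability proportional to that population, the allocation is repeated `m` times, and per-tract totals are summarised as a mean plus a between-imputation variance term. All names are hypothetical. Treating the within-imputation variance as zero is a simplification and not necessarily the authors' method. The stratified age/gender/race weighting described above is also omitted.

```python
import random
from collections import Counter
from statistics import fmean, variance


def impute_tract_counts(geocoded, nongeocoded_zips, zip_to_tracts, m=20, seed=1):
    """Multiply impute tract locations for cases that only have a ZIP code.

    geocoded: list of tract IDs for successfully geocoded cases
    nongeocoded_zips: list of ZIP codes for cases that failed geocoding
    zip_to_tracts: dict ZIP -> dict tract -> population at risk (allocation weights)
    Returns dict tract -> (mean imputed total, between-imputation variance term).
    """
    rng = random.Random(seed)
    base = Counter(geocoded)
    draws = []
    for _ in range(m):
        counts = Counter(base)  # geocoded cases are treated as fixed
        for z in nongeocoded_zips:
            tracts = list(zip_to_tracts[z])
            weights = [zip_to_tracts[z][t] for t in tracts]
            counts[rng.choices(tracts, weights=weights, k=1)[0]] += 1
        draws.append(counts)
    all_tracts = set(base) | {t for w in zip_to_tracts.values() for t in w}
    result = {}
    for t in sorted(all_tracts):
        values = [d[t] for d in draws]  # Counter returns 0 for absent tracts
        # Rubin-style (1 + 1/m) * B, with within-imputation variance taken as zero
        b = variance(values) if m > 1 else 0.0
        result[t] = (fmean(values), (1 + 1 / m) * b)
    return result
```

Because every imputed data set allocates all cases, the per-tract mean totals always sum to the full case count. Only the split between tracts within a ZIP code varies from draw to draw.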
16
Abstract
The increasing use of geographic information systems (GIS) in epidemiological population studies requires careful attention to the methods employed in accomplishing geocoding and creating a GIS. Studies have provided limited details, hampering the ability to assess the validity of spatial data. The purpose of this paper is to describe the multiphase geocoding methods used to retrospectively create a GIS in the Jackson Heart Study (JHS). We used baseline data from 5,302 participants enrolled in the JHS between 2000 and 2004 in a multiphase process to accomplish geocoding 2 years after participant enrollment. After initial deletion of ungeocodable addresses (n = 52), 96% were geocoded using ArcGIS. An interactive method using data abstraction from participant records, additional maps and street reference files, and verification that each address existed yielded successful geocoding of all but 13 addresses. Overall, nearly 99% (n = 5,237) of the JHS cohort was geocoded retrospectively using multiple strategies for improving and locating geocodable addresses. Geocoding validation procedures revealed highly accurate and reliable geographic data. The methods and protocol developed provided a reliable spatial database that can be used for further investigation of spatial epidemiology. Baseline results were used to describe participants by selected geographic indicators, including residence in urban or rural areas, and to validate the effectiveness of the study's sampling plan. Further, our results indicate that retrospectively developing a reliable GIS for a large epidemiological study is feasible. This paper describes some of the challenges in retrospectively creating a GIS and provides practical tips that enhanced its success.
17
Evidence of localized clustering of gastroschisis births in North Carolina, 1999–2004. Soc Sci Med 2009; 68:1361-7. [DOI: 10.1016/j.socscimed.2009.01.034] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 09/23/2008] [Indexed: 11/22/2022]
18
An effective and efficient approach for manually improving geocoded data. Int J Health Geogr 2008; 7:60. [PMID: 19032791 PMCID: PMC2612650 DOI: 10.1186/1476-072x-7-60] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 09/26/2008] [Accepted: 11/26/2008] [Indexed: 12/13/2022]
Abstract
Background The process of geocoding produces output coordinates of varying degrees of quality. Previous studies have revealed that simply excluding records with low-quality geocodes from analysis can introduce significant bias, but depending on the number and severity of the inaccuracies, their inclusion may also lead to bias. Little quantitative research has been presented on the cost and/or effectiveness of correcting geocodes through manual interactive processes, so the most cost-effective methods for improving geocoded data are unclear. The present work investigates the time and effort required to correct geocodes contained in five health-related datasets that represent examples of data commonly used in Health GIS. Results Geocode correction was attempted on five health-related datasets containing a total of 22,317 records. The complete processing of these data took 11.4 weeks (427 hours), averaging 69 seconds of processing time per record. Overall, the geocodes associated with 12,280 (55%) of records were successfully improved, taking 95 seconds of processing time per corrected record on average across all five datasets. Geocode correction improved the overall match rate (the number of successful matches out of the total attempted) from 79.3% to 95%. The spatial shift between the location of original successfully matched geocodes and their corrected counterparts averaged 9.9 km per corrected record. After geocode correction, the numbers of city and USPS ZIP code accuracy geocodes were reduced from 10,959 and 1,031 to 6,284 and 200, respectively, while the number of building centroid accuracy geocodes increased from 0 to 2,261. Conclusion The results indicate that manual geocode correction using a web-based interactive approach is a feasible and cost-effective method for improving the quality of geocoded data. The level of effort required varies depending on the type of data geocoded.
These results can be used to choose between data improvement options (e.g., manual intervention, pseudocoding/geo-imputation, field GPS readings).
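The summary measures in this abstract are a match rate, a mean spatial shift and a processing time per record. The sketch below shows one way they could be computed. It is a minimal illustration under stated assumptions, not the authors' code. The input format (pairs of original and corrected latitude/longitude tuples) is hypothetical. Distances use the haversine great-circle formula with a 6,371 km mean Earth radius, which is our choice; the abstract does not say how the shift was measured.

```python
import math

# Mean Earth radius in metres: our assumption for a spherical-Earth approximation.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_rate(n_matched, n_attempted):
    """Match rate as defined in the abstract: successful matches / total attempted, in percent."""
    return 100.0 * n_matched / n_attempted


def mean_shift_km(pairs):
    """Mean displacement in km between original and corrected geocodes.

    `pairs` is a hypothetical list of ((lat, lon) original, (lat, lon) corrected) tuples.
    """
    shifts = [haversine_m(o[0], o[1], c[0], c[1]) for o, c in pairs]
    return sum(shifts) / len(shifts) / 1000.0


def seconds_per_record(total_hours, n_records):
    """Average processing time per record in seconds."""
    return total_hours * 3600 / n_records
```

As a check against the abstract, `seconds_per_record(427, 22317)` gives about 68.9 s, consistent with the reported 69 seconds per record. Any match counts passed to `match_rate` would come from the user's own geocoding log; none are implied here.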
19
Informing geospatial toolset design: understanding the process of cancer data exploration and analysis. Health Place 2008; 14:576-607. [PMID: 18060824 PMCID: PMC2408638 DOI: 10.1016/j.healthplace.2007.10.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/22/2007] [Revised: 08/29/2007] [Accepted: 10/12/2007] [Indexed: 10/22/2022]
Abstract
There is an increasing need for new methods and tools that support knowledge construction from complex geospatial datasets related to public health. This study is part of a larger effort to develop, implement, and test such methods and tools. To be successful, the design of methods and tools must be grounded in a solid understanding of the work practices within the domain of use; the research reported here focuses on developing that understanding. We adopted a user-centered approach to toolset design, in which we investigated the work of cancer researchers and used the results of that investigation as inputs to the development of design guidelines for new geovisualization and spatial analysis tools. Specifically, we conducted key informant interviews focused on the use, or potential use, of geographic information, methods, and tools, and complemented them with a systematic analysis of published, peer-reviewed articles on geospatial cancer research. The results were used to characterize the typical process of analysis, to identify fundamental differences between intensive and infrequent users of geospatial methods, and to outline key stages in analysis and the tasks within those stages that methods and tools must support. Our findings inform design and implementation decisions for visual and analytic tools that support cancer prevention and control research, and they provide insight into the processes cancer researchers use to address the challenges of geographic factors in public health research and policy.
20
Using built environment characteristics to predict walking for exercise. Int J Health Geogr 2008; 7:10. [PMID: 18312660 PMCID: PMC2279119 DOI: 10.1186/1476-072x-7-10] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Received: 11/27/2007] [Accepted: 02/29/2008] [Indexed: 01/08/2023]
Abstract
Background Environments conducive to walking may help people avoid sedentary lifestyles and associated diseases. Recent studies developed walkability models combining several built environment characteristics to optimally predict walking. Developing and testing such models on the same data could overestimate how well walking can be predicted in an independent sample of the population. More accurate estimates of model fit can be obtained by splitting a single study population into training and validation sets (holdout approach) or by developing and evaluating models in different populations. We used these two approaches to test whether built environment characteristics near the home predict walking for exercise. Study participants lived in western Washington State and were adult members of a health maintenance organization. The physical activity data used in this study were collected by telephone interview and were selected for their relevance to cardiovascular disease. To limit confounding by prior health conditions, the sample was restricted to participants in good self-reported health and without a documented history of cardiovascular disease. Results Among the 1,608 participants meeting the inclusion criteria, the mean age was 64 years, 90 percent were white, 37 percent had a college degree, and 62 percent reported that they walked for exercise. Single built environment characteristics, such as residential density or connectivity, did not significantly predict walking for exercise. Regression models using multiple built environment characteristics did not successfully predict walking for exercise in an independent population sample. In the validation set, none of the logistic models had a C-statistic confidence interval excluding the null value of 0.5, and none of the linear models explained more than one percent of the variance in time spent walking for exercise.
We did not detect significant differences in walking for exercise among census areas or postal codes, which were used as proxies for neighborhoods. Conclusion None of the built environment characteristics significantly predicted walking for exercise, nor did combinations of these characteristics when tested using a holdout approach. These results reflect a lack of neighborhood-level variation in walking for exercise in the population studied.
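The abstract judges models by two things: a holdout split, and the validation-set C-statistic, where 0.5 means no discrimination. The sketch below illustrates both under stated assumptions and is not the study's analysis code. The 50/50 split fraction, the fixed seed and the function names are hypothetical. The C-statistic is computed as the share of (walker, non-walker) pairs in which the walker received the higher predicted score, counting ties as one half.

```python
import random


def holdout_split(records, train_fraction=0.5, seed=0):
    """Randomly partition records into training and validation sets (holdout approach).

    The default 50/50 split and fixed seed are illustrative assumptions,
    not the settings used in the study.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome (1 = walks for exercise, 0 = does not).

    Returns the fraction of (positive, negative) pairs in which the positive case
    has the higher predicted score; ties count as 0.5. A value of 0.5 means the
    model discriminates no better than chance.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

In a holdout analysis, the model would be fitted only on the training records and `c_statistic` applied to its predictions on the validation records. The pairwise loop is quadratic but adequate for samples of a few thousand. For much larger data, a rank-based formulation of the same statistic is faster.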
21
Validation of the geographic position of EPER-Spain industries. Int J Health Geogr 2008; 7:1. [PMID: 18190678 PMCID: PMC2254386 DOI: 10.1186/1476-072x-7-1] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 10/15/2007] [Accepted: 01/11/2008] [Indexed: 11/15/2022]
Abstract
Background The European Pollutant Emission Register in Spain (EPER-Spain) is a public inventory of polluting industries created by decision of the European Union. The locations of these industries are geocoded, and the first published data correspond to 2001. Publication of these data will allow the effect of proximity to one or more such plants on cancer and all-cause mortality in nearby towns to be quantified. However, because errors have been detected in the geocoding of many of the pollutant foci listed in the EPER, it was decided that a validation study should be conducted into the accuracy of these co-ordinates. EPER-Spain geographic co-ordinates were drawn from the European Environment Agency (EEA) server and the Spanish Ministry of the Environment (MOE). The Farm Plot Geographic Information System (Sistema de Información Geográfica de Parcelas Agrícolas, SIGPAC) provides orthophotos (digitised aerial images) of any point in Spain. By searching co-ordinates in SIGPAC, all industrial foci (except farms) were located. Positional quality was graded as high, medium or low for industries situated less than 500 metres, between 500 metres and 1 kilometre, and more than 1 kilometre from their real locations, respectively. Results Regarding initial registry quality, 84% of industrial complexes were inaccurately positioned (low quality) according to EEA data, versus 60% according to Spanish MOE data. The distribution of distances between the original and corrected co-ordinates of each industry on the registry showed a median error of 2.55 kilometres for Spain overall (according to EEA data). The Autonomous Regions with the most errors in industrial geocoding were Murcia, the Canary Islands, Andalusia and Madrid.
Correct co-ordinates were successfully allocated to 100% of EPER-Spain industries. Conclusion Knowing the exact location of pollutant foci is vital for obtaining reliable and valid conclusions in any study where distance to the focus is a decisive factor, as in studies of the health consequences of industrial pollution for neighbouring populations.
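The grading scheme above maps each positional error to a quality class, and the registry is then summarised by class shares and a median error. The sketch below is a minimal, hypothetical illustration of that scheme, not the study's procedure. It assumes the distance (in metres) between each original and corrected co-ordinate has already been computed. The abstract does not say how boundary values were handled, so assigning exactly 500 m to "medium" and exactly 1,000 m to "low" is our assumption.

```python
import statistics


def quality_class(error_m):
    """Grade positional quality from the distance (metres) to the verified location.

    Thresholds follow the abstract: high < 500 m, medium 500 m to 1 km, low > 1 km.
    Placing exactly 500 m in 'medium' and exactly 1,000 m in 'low' is an assumption.
    """
    if error_m < 500:
        return "high"
    if error_m < 1000:
        return "medium"
    return "low"


def summarize(errors_m):
    """Return (percentage of records in each quality class, median error in km).

    `errors_m` is a hypothetical list of precomputed original-to-corrected distances in metres.
    """
    counts = {"high": 0, "medium": 0, "low": 0}
    for e in errors_m:
        counts[quality_class(e)] += 1
    n = len(errors_m)
    shares = {k: 100.0 * v / n for k, v in counts.items()}
    return shares, statistics.median(errors_m) / 1000.0
```

The median is used here because, as in the abstract, a few badly misplaced sites can produce very large errors. Those outliers would dominate a mean but leave the median largely unaffected.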