1
|
Short-term Safety Performance Functions by Random Parameters Negative Binomial-Lindley model for Part-time Shoulder Use. ACCIDENT; ANALYSIS AND PREVENTION 2024; 199:107498. [PMID: 38359671 DOI: 10.1016/j.aap.2024.107498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 01/28/2024] [Accepted: 02/04/2024] [Indexed: 02/17/2024]
Abstract
Part-time Shoulder Use (PTSU) is a traffic management and operation strategy that allows the use of the left or right shoulder as a travel lane, typically during the peak hours of the day. Though PTSU is an effective strategy for increasing roadway capacity in congested traffic conditions, the existing literature offers very limited quantitative information about PTSU design elements and operational strategies, which could affect the occurrence of crashes on freeways. This study contributes to the safety literature by analyzing various potential crash contributing factors related to PTSU operation and design elements through the development of short-term Safety Performance Functions (SPFs). A comparison of the estimated models demonstrated that by utilizing the mixed distribution and allowing the posterior parameter estimates of explanatory variables to vary from one observation to another, the Random Parameters Negative Binomial-Lindley (RPNB-L) model outperformed the traditional NB and fixed coefficient NB-L models. The results of the proposed RPNB-L model indicated that the PTSU-implemented sections experienced fewer traffic crashes than the non-PTSU freeway sections. Among the attributes related to PTSU operation and design elements, the usage of the leftmost shoulder lane as PTSU, the presence of emergency rest areas for damaged vehicles, and adequate shoulder width would significantly reduce crash frequency for the PTSU-implemented freeways. Moreover, investigation of the identified hotspots revealed that the transition areas (start/end locations of PTSU) are the most critical sections. The findings from this research could assist transportation agencies in taking appropriate countermeasures for preventing and reducing crash occurrences on PTSU-implemented freeways.
|
2
|
How effective is reducing traffic speed for safer work zones? Methodology and a case study in Pennsylvania. ACCIDENT; ANALYSIS AND PREVENTION 2023; 183:106966. [PMID: 36696743 DOI: 10.1016/j.aap.2023.106966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Revised: 11/21/2022] [Accepted: 01/06/2023] [Indexed: 06/17/2023]
Abstract
Transportation agencies post and enforce reduced speed limits in work zones to ensure work zone safety, since traffic speed is found to be associated with work zone crash risks. However, prior findings on the relationship between speed and crash rate in work zones are inconsistent. This may be attributed to the methods of statistical associations between traffic speed and crash risks that do not necessarily discover true causal relations. In fact, work zone presence could lead to the reduction of actual traffic speed that influences crash risks, where it may also directly impose effects on crash risks as a result of work zone configurations. The actual traffic speed (not posted speed limit) is also known as a "mediator" where work zones can indirectly impact the crash risks. It is challenging to rigorously separate the causal effect of traffic speed on work zone crash risk from that directly caused by work zones. The underlying causal relation could help to determine what reduced post speed limit (with enforcement) is necessary to ensure work zone safety under the most desired "actual traffic speed". This study proposes to use the sequential g-estimation and the regression discontinuity design to estimate the controlled direct effect of traffic speed on work zone crashes. Two research gaps are identified and filled: inaccurate inferences of the effect of reduced speed limit in work zones as a result of ignoring (1) potential post-treatment bias since traffic speed is a mediator; and (2) potential confounding bias caused by unobservable roadway characteristics. The proposed methodology was applied to 4008 work zones in Pennsylvania from 2015 to 2017, and the results were validated through a series of robustness tests. 
The results indicate that the direct causal effect of the presence of work zones on crash risk is significantly positive when the traffic speed is relatively low (i.e., lower than 55 mph in this case study), while traffic speed has a positive causal effect on crash occurrences when the actual traffic speed is high (i.e., greater than or equal to 55 mph). This suggests that strictly enforcing reduced posted speed limits in work zones is particularly effective when the actual traffic speed is greater than 55 mph. This is particularly true for long (i.e., > 3000 m), daytime work zones on roadways with high traffic volume (i.e., AADT > 20,000 vehicles per day). On the other hand, the effect of enforcing reduced speed on work zone safety is unclear when the actual speed is already low. In this case, improving work zone configurations and driving behaviors may be more effective in reducing crash risks.
|
3
|
Copula-based bivariate count data regression models for simultaneous estimation of crash counts based on severity and number of vehicles. ACCIDENT; ANALYSIS AND PREVENTION 2023; 181:106928. [PMID: 36563417 DOI: 10.1016/j.aap.2022.106928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 10/12/2022] [Accepted: 12/13/2022] [Indexed: 06/17/2023]
Abstract
Statistical models of crash frequency typically apply univariate regression models to estimate total crash frequency or crash counts by various categories. However, a possible correlation between the dependent variables or unobserved variables associated with the dependent variables is not considered when univariate models are used to estimate categorized crash counts, such as different severity levels or numbers of vehicles involved. This may lead to inefficient parameter estimates compared to multivariate models that directly consider these correlations. This paper compares the results obtained from univariate negative binomial regression models of property-damage only (PDO) and fatal plus injury (FI) crash frequencies to models using traditional bivariate and copula-based bivariate negative binomial regression models. A similar comparison was made using models for the expected crash frequency of single- (SV) and multi-vehicle (MV) crashes. The models were estimated using two-lane, two-way rural highway segment-level data from an engineering district in Pennsylvania. The results show that all bivariate negative binomial models (with or without copulas) outperformed the corresponding univariate negative binomial models for PDO and FI, as well as SV and MV, crashes. In addition, the statistical associations of various traffic and roadway/roadside features with PDO and FI, as well as SV and MV, crashes were not the same as their corresponding relationships in the univariate models. The bivariate negative binomial model with normal copula outperformed all other models based on the goodness-of-fit statistics. The results suggest that copula-based bivariate negative binomial regression models may be a valuable alternative to univariate models when simultaneously modeling two disaggregate levels of crash counts.
|
4
|
A before-after evaluation of protected right-turn signal phasings by applying Empirical Bayes and Full Bayes approaches with heterogenous count data models. ACCIDENT; ANALYSIS AND PREVENTION 2023; 179:106882. [PMID: 36356509 DOI: 10.1016/j.aap.2022.106882] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2022] [Revised: 09/29/2022] [Accepted: 10/24/2022] [Indexed: 06/16/2023]
Abstract
Right-turn crashes (or left-turn crashes for the US or similar countries) represent over 40 % of signalized intersection crashes in Queensland, Australia. Protected right-turn phasings are a widely used countermeasure for right-turn crashes, but the research findings on their effects across different crash types and intersection types are not consistent. Methodologically, the Empirical Bayes and Full Bayes techniques are generally applied for before-after evaluations, but the inclusion of heterogeneous models within these techniques has received little attention. Addressing these research gaps, the objective of this study is to evaluate the effectiveness of protected right-turn signal phasings at signalized intersections employing heterogeneous count data models with the Empirical Bayes and Full Bayes techniques. In particular, the Empirical Bayes approach based on random parameters Poisson-Gamma models (simulation-based Empirical Bayes), and the Full Bayes approach based on random parameters Poisson-Lognormal intervention models (simulation-based Full Bayes) are applied. A total of 69 Cross intersections (with ten treated sites) and 47 T intersections (with six treated sites) from Southeast Queensland in Australia were included in the analysis to estimate the effects of protected right-turn signal phasings on various crash types. Results show that changing the signal phasing from a permissive right-turn phasing to a protected right-turn phasing at cross and T intersections reduces right-turn crashes by about 87 % and 91 %, respectively. In addition, the effect of protected right-turn phasings on rear-end crashes was not significant. The heterogeneous count data models significantly address extra Poisson variation, leading to efficient safety estimates in both simulation-based Empirical Bayes and simulation-based Full Bayes approaches.
This study demonstrates the importance of accounting for unobserved heterogeneity for the before-after evaluation of engineering countermeasures.
|
5
|
Finite mixture Negative Binomial-Lindley for modeling heterogeneous crash data with many zero observations. ACCIDENT; ANALYSIS AND PREVENTION 2022; 175:106765. [PMID: 35947924 DOI: 10.1016/j.aap.2022.106765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Revised: 06/22/2022] [Accepted: 06/25/2022] [Indexed: 06/15/2023]
Abstract
Crash data are often highly dispersed; they may also include a large number of zero observations or have a long tail. The traditional Negative Binomial (NB) model cannot model these data properly. To overcome this issue, the Negative Binomial-Lindley (NB-L) model has been proposed as an alternative to the NB to analyze data with these characteristics. Research studies have shown that the NB-L model provides superior performance compared to the NB when data include numerous zero observations or have a long tail. In addition, crash data are often collected from sites with different spatial or temporal characteristics. Therefore, it is not unusual to assume that crash data are drawn from multiple subpopulations. Finite mixture models are powerful tools that can be used to account for underlying subpopulations and capture the population heterogeneity. This research documents the derivations and characteristics of the Finite mixture NB-L model (FMNB-L) to analyze data generated from heterogeneous subpopulations with many zero observations and a long tail. We demonstrated the application of the model to identify subpopulations with a simulation study. We then used the FMNB-L model to estimate statistical models for Texas four-lane freeway crashes. These data have unique characteristics: they are highly dispersed and include many locations with a very large number of crashes, as well as a significant number of locations with zero crashes. We used multiple goodness-of-fit metrics to compare the FMNB-L model with the NB, NB-L, and the finite mixture NB models. The FMNB-L identified two subpopulations in the data. The results show a significantly better fit by the FMNB-L compared to the other analyzed models.
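To make the mixture structure concrete, the following is a minimal simulation sketch and not the authors' implementation. It assumes one common NB-L parameterization in which the NB mean `mu` is multiplied by a Lindley-distributed site effect, and it uses the standard fact that a Lindley(theta) variable is a two-component mixture of an Exponential(theta) and a Gamma(shape 2, rate theta). All function names and parameter values are hypothetical.

```python
import math
import random


def lindley_mean(theta):
    """Theoretical mean of a Lindley(theta) random variable."""
    return (theta + 2.0) / (theta * (theta + 1.0))


def sample_lindley(theta, rng):
    """Draw from Lindley(theta) using its two-component mixture form."""
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)          # Exponential with rate theta
    return rng.gammavariate(2.0, 1.0 / theta)  # Gamma with shape 2, scale 1/theta


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_nb_lindley(mu, phi, theta, rng):
    """One NB-L count: NB with mean eps * mu, where eps ~ Lindley(theta).

    The NB draw is written as a Poisson-gamma mixture with shape phi
    (assumed parameterization: NB variance = m + m**2 / phi).
    """
    eps = sample_lindley(theta, rng)
    scale = eps * mu / phi
    if scale <= 0.0:  # guard against a degenerate zero draw
        return 0
    return sample_poisson(rng.gammavariate(phi, scale), rng)


rng = random.Random(0)
counts = [sample_nb_lindley(mu=2.0, phi=1.5, theta=1.0, rng=rng) for _ in range(20000)]
sample_mean = sum(counts) / len(counts)
zero_share = counts.count(0) / len(counts)
```

Under this assumed parameterization the expected count is `mu * lindley_mean(theta)`, so comparing `sample_mean` with that product is a quick sanity check, and `zero_share` shows how much probability mass sits at zero.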
|
6
|
Derivation of the Empirical Bayesian method for the Negative Binomial-Lindley generalized linear model with application in traffic safety. ACCIDENT; ANALYSIS AND PREVENTION 2022; 170:106638. [PMID: 35339878 DOI: 10.1016/j.aap.2022.106638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2021] [Revised: 03/07/2022] [Accepted: 03/12/2022] [Indexed: 06/14/2023]
Abstract
The expected crash frequency is the long-term average crash count for a specific site. It is extensively used to systematically evaluate the crash risk associated with roadway elements. To estimate the expected crashes, the Empirical Bayesian (EB) approach is typically employed. The EB method is a computationally convenient approximation to the Full Bayesian (FB) method, which gained popularity due to its simple interpretation, computational efficiency, and the ability to account for the regression to the mean bias. However, the common EB method used in traffic safety analysis is only applicable when the traditional Negative Binomial (NB) model is used. The NB model, however, is not a suitable choice when data is highly dispersed, skewed, or has a large number of zero observations. The Negative Binomial-Lindley (NB-L) model is a mixture of the NB and Lindley distributions and has shown superior fit compared to the NB model, especially when the dataset is characterized by excess zero observations. Even though several studies have used the NB-L in developing crash prediction models, the application of the NB-L in other safety-related tasks (e.g., hot spot identification) is largely neglected. This study proposed a framework to develop the EB method for the NB-L model and subsequently estimate the expected crash values. A comparison between the EB and FB estimates was performed to validate the approximation framework in general. The results indicated that the proposed EB framework is able to estimate expected crashes with comparable precision to the FB estimate, but with much less computational cost. In addition, a site ranking analysis using the EB estimates was conducted to validate the proposed approximation method in safety studies. However, it should be noted that any other type of safety analysis that requires access to the expected crashes can benefit from the proposed EB method. 
This study concluded that the proposed EB framework can properly approximate the underlying FB approach and can reasonably be considered as an alternative to the traditional EB formula derived from the NB model. The results of this study can help to extend the application of the advanced predictive models beyond predicting crashes to other safety-related tasks, with no additional computational efforts.
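For context, the traditional NB-based EB estimate mentioned above can be sketched as a weighted average of the model prediction and the observed count. This is a minimal sketch assuming the Highway Safety Manual style weight `w = 1 / (1 + k * mu)` for a single analysis period, where `k` is the NB overdispersion parameter; it does not reproduce the NB-L derivation proposed in the study, and the function name and site values are hypothetical.

```python
def eb_expected_crashes(predicted, observed, dispersion):
    """Traditional NB-based Empirical Bayes estimate for one site.

    predicted  -- crash frequency predicted by the SPF (mu)
    observed   -- crash count observed at the site over the same period
    dispersion -- NB overdispersion parameter k (variance = mu + k * mu**2)
    """
    if predicted < 0 or observed < 0 or dispersion < 0:
        raise ValueError("inputs must be non-negative")
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical sites: (SPF prediction, observed crashes)
sites = [(2.0, 5), (0.8, 0), (4.5, 4)]
estimates = [eb_expected_crashes(mu, y, dispersion=0.5) for mu, y in sites]
```

The estimate always lies between the prediction and the observation, which is how the method pulls extreme observed counts back toward the model prediction and accounts for regression to the mean. A larger `k` or a larger prediction shifts weight toward the observed count.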
|
7
|
GEE-based Bell model for longitudinal count outcomes. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2022.2056751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
8
|
Temporal stability of associations between crash characteristics: A multiple correspondence analysis. ACCIDENT; ANALYSIS AND PREVENTION 2022; 168:106590. [PMID: 35151096 DOI: 10.1016/j.aap.2022.106590] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2021] [Revised: 01/13/2022] [Accepted: 01/28/2022] [Indexed: 06/14/2023]
Abstract
Understanding the associations between crash characteristics facilitates the development of policies for improving traffic safety. This study investigates the temporal stability of associations between crash characteristics at different temporal levels using multiple correspondence analysis (MCA). For each date in 2020, crash data from the previous week, month, season, half year, one year, two years, three years, and four years are collected as eight temporal levels. MCA plots and chi-square distance analysis are used to assess the temporal stability of associations between crash characteristics across dates in 2020 with data from various temporal levels. The key findings of this study demonstrate that associations between crash characteristics at lower temporal levels show notable and potentially cyclical variations across dates, while a more stable, long-term trend in associations between crash characteristics may be identified as the temporal level increases, especially at the two-year level and higher, at which temporal stability may be expected. The study contributes to the literature by presenting a challenge for traffic analysts in that both temporally stable and unstable associations between crash characteristics may be observed at any point in time when different temporal levels are considered as study periods. Therefore, it may serve as a foundation for future research and practical work to identify traffic safety issues and optimal policies, as well as facilitate the interpretation of statistical modeling in the presence of temporally unstable data.
|
9
|
A resampling approach to disaggregate analysis of bus-involved crashes using panel data with excessive zeros. ACCIDENT; ANALYSIS AND PREVENTION 2022; 164:106496. [PMID: 34801838 DOI: 10.1016/j.aap.2021.106496] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/20/2021] [Revised: 10/09/2021] [Accepted: 11/12/2021] [Indexed: 06/13/2023]
Abstract
Public buses account for more than 70% of the overall road-based public transport patronage in Hong Kong, and their crash involvement rate has been the highest among all public transport modes. Though previous studies have identified explanatory factors that affect the crash risk of buses, the use of considerably imbalanced crash data with excessive zero observations could lead to inaccurate parameter estimation. This study aims to resolve the excess zero problem of disaggregate analysis of bus-involved crashes based on synthetic data using a Synthetic Minority Over-Sampling Technique for panel data (SMOTE-P). A dataset comprising crash, traffic, and road inventory data of 88 road segments in Hong Kong during the period from 2014 to 2017 is used. To assess the data balancing performance, other common data generation approaches, such as the Random Under-sampling of the Majority Class (RUMC) technique, Cluster-Based Under-Sampling (CBUS), and mixed resampling, are also considered. Random effect Poisson (REP) models based on synthetic data and a random effect zero-inflated Poisson (REZIP) model based on original data are estimated. Results indicate that the REP model based on synthetic data using SMOTE-P outperforms the REZIP model based on original data and the REP models based on synthetic data using RUMC, CBUS, and mixed approaches, in terms of statistical fit, prediction error, and explanatory factors identified. Results of model estimation based on SMOTE-P suggest that factors including morning peak, evening peak, hourly traffic flow, average lane width, road length, bus stop density, percentage of buses in the traffic stream, and presence of a bus priority lane all affect the bus-involved crash frequency. More importantly, this study provides a feasible solution for disaggregate crash analysis with imbalanced panel data.
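The core idea behind synthetic minority over-sampling can be illustrated with a short sketch. This shows the classic SMOTE interpolation step only (a new point placed at a random position on the segment between a minority observation and one of its nearest minority neighbours); it is not the panel-data variant SMOTE-P used in the study, and the function name, feature values, and defaults are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority points.

    minority -- list of feature tuples for the minority class (at least two)
    k        -- number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority observations")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class observations with two scaled features
crash_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_rows = smote(crash_rows, n_new=5)
```

Because each synthetic row is a convex combination of two real minority rows, it stays within the range of the observed minority data rather than duplicating existing records.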
|
10
|
Investigating the fatal pedestrian crash occurrence in urban setup in a developing country using multiple-risk source model. ACCIDENT; ANALYSIS AND PREVENTION 2021; 163:106469. [PMID: 34773787 PMCID: PMC9336202 DOI: 10.1016/j.aap.2021.106469] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/22/2021] [Revised: 08/21/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Pedestrian fatalities and injuries are a major public health burden in developing countries. In the safety literature, pedestrian crashes have been modelled predominantly using single equation regression models, assuming a single underlying source of risk factors. In contrast, the fatal pedestrian crash counts at a site may be an outcome of multiple sources of risk factors, such as poor road infrastructure, land use type, traffic exposures, and operational parameters, site-specific socio-demographic characteristics, as well as pedestrians' poor risk perception and dangerous crossing behavior, which may be influenced by poor road infrastructure and lack of information, etc. However, these multiple sources are generally overlooked in traditional single equation crash prediction models. Against this background, this study postulates, and demonstrates empirically, that the total fatal pedestrian crash counts at the urban road network level may arise from multiple simultaneous and interdependent sources of risk factors, rather than one. Each of these sources may distinctively contribute to the total observed crash count. Intersection-level crash data obtained from the "Kolkata Police", India, are utilized to demonstrate the present modelling methodology. A three-component mixture model and a joint econometric model are developed to predict fatal pedestrian crashes. The study outcomes indicate that the multiple-source risk models perform significantly better than the single equation regression model in terms of prediction ability and goodness-of-fit measures. Moreover, while the single equation model predicts total fatal crash counts for individual sites, the multiple risk source model predicts crash count proportions contributed by each source of risk factors and predicts crashes by a particular source.
|
11
|
A random parameters with heterogeneity in means and Lindley approach to analyze crash data with excessive zeros: A case study of head-on heavy vehicle crashes in Queensland. ACCIDENT; ANALYSIS AND PREVENTION 2021; 160:106308. [PMID: 34311952 DOI: 10.1016/j.aap.2021.106308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 07/12/2021] [Accepted: 07/12/2021] [Indexed: 06/13/2023]
Abstract
This study performed statistical analyses to identify likely crash contributing factors for Head-on Fatal and Serious Injury (FSI) collisions involving heavy vehicles (HVs) on the Queensland state road network. Head-on HV collisions are associated with the largest number of fatalities compared to other crash types in Queensland. However, there is limited relevant literature regarding this type of crash. Recent studies in road safety research have focused on variants of random parameters models to capture unobserved heterogeneity that may influence the occurrence of crashes. Among such models, the random parameters model with heterogeneity in means has recently provided better results and has become popular. However, this study illustrates a potential limitation regarding the use of these models without explicitly accounting for excessive zero crash observations. To address this potential limitation, a random parameters model with heterogeneity in means and a Lindley distribution is introduced in this study to account for the unobserved heterogeneity using additional variables as well as site-specific variation from excessive zero crash observations. Results showed that a Poisson model with random parameters and heterogeneity in means using a Lindley distribution outperformed multiple alternative state-of-the-art specifications in terms of fit as well as overall prediction ability. The analyses using the proposed modelling approach revealed factors likely to affect the likelihood of Head-on FSI crashes involving HVs in Queensland, including volume, segment length, period of analysis, terrain type being rolling, curve (moderate/sharp/very sharp) longer than 50% of the corresponding segment length, rural single carriageway with high (>=100 kph) and medium (>=50 and <100 kph) speed limits, and urban single carriageway. Unobserved heterogeneity regarding the parameter for road curvature was explained using rolling terrain type as an explanatory variable.
This study has explained variation in the means of random parameters for a road attribute using the effect of a geometric variable, in which several stakeholders are primarily interested.
|
12
|
Examining the impacts of crash data aggregation on SPF estimation. ACCIDENT; ANALYSIS AND PREVENTION 2021; 160:106313. [PMID: 34365043 DOI: 10.1016/j.aap.2021.106313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Revised: 06/24/2021] [Accepted: 07/16/2021] [Indexed: 06/13/2023]
Abstract
The American Association of State Highway and Transportation Officials' Highway Safety Manual (HSM) includes a collection of safety performance functions (SPFs) or statistical models to estimate the expected crash frequency of roadway segments, intersections, and interchanges. These models are applied in several steps of the safety management process, including to screen the road network for opportunities to improve safety and to evaluate the performance of safety countermeasure deployments. The SPFs in the HSM are generally estimated using negative binomial regression modeling. In some instances, they are estimated using annual crash frequency and site-specific (e.g., traffic volume) data, while in other instances they are estimated using aggregate crash frequency and site-specific data. This paper explores the differences that result from estimating SPFs using aggregate versus disaggregate data using the same methods as those used to estimate the SPFs in the HSM. A synthetic dataset was first used to conduct these comparisons - these data were generated in a manner that is consistent with the properties of the negative binomial distribution. Then, an observational dataset from Pennsylvania was used to compare the SPFs from both aggregate and disaggregate data. The results show that SPFs estimated using the panel (disaggregate) data and aggregated data provide similar model coefficients, although some differences may sometimes arise. However, the overdispersion parameter obtained using each dataset can differ significantly. These differences result in systematic biases in calculations of expected crash frequency when Empirical Bayes adjustments are applied, which - as the paper demonstrates - could lead to different outcomes in a network screening exercise. Overall, these results reveal that aggregating crash data might result in biased SPF outputs and lead to inconsistent Empirical Bayes adjustments.
|
13
|
Application of different negative binomial parameterizations to develop safety performance functions for non-federal aid system roads. ACCIDENT; ANALYSIS AND PREVENTION 2021; 156:106103. [PMID: 33866155 DOI: 10.1016/j.aap.2021.106103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 03/12/2021] [Accepted: 03/22/2021] [Indexed: 06/12/2023]
Abstract
Safety performance functions (SPFs) are the main building blocks in understanding the relationships between crash risk factors and crash frequencies. Many research efforts have focused on high-volume roadways that typically experience more crashes. A few studies have documented SPFs for non-federal aid system (NFAS) roads including rural minor collectors, rural local roads, and urban local roads. NFAS roads are characterized by unique features such as lower speeds, and shorter segment lengths, and they usually experience fewer crashes given the low exposure of these roads. As a result, there is a clear need to investigate the associated safety issues of NFAS roadways and generate distinct SPFs for them. The main objective of this study is to bridge the gap in the literature and develop SPFs for NFAS roads. This study examined the application of traditional negative binomial and zero-favored negative binomial models (i.e., negative binomial-Lindley). Both groups of models were formulated by different variance and dispersion structures. Using crash, roadway inventory, and traffic volume data from 2014 to 2018 in Virginia, the results showed that the NB-L models perform better than the traditional NB models. Furthermore, an appropriate variance structure along with a reasonably chosen dispersion function can further improve the model performance.
|
14
|
A new two-parameter discrete poisson-generalized Lindley distribution with properties and applications to healthcare data sets. Comput Stat 2021. [DOI: 10.1007/s00180-021-01097-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
|
15
|
|
16
|
Mid-term prediction of at-fault crash driver frequency using fusion deep learning with city-level traffic violation data. ACCIDENT; ANALYSIS AND PREVENTION 2021; 150:105910. [PMID: 33302233 DOI: 10.1016/j.aap.2020.105910] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/04/2020] [Revised: 09/08/2020] [Accepted: 11/25/2020] [Indexed: 06/12/2023]
Abstract
Traffic violations and improper driving are behaviors that primarily contribute to traffic crashes. This study aimed to develop effective approaches for predicting at-fault crash driver frequency using only city-level traffic enforcement predictors. A fusion deep learning approach combining a convolution neural network (CNN) and gated recurrent units (GRU) was developed to compare predictive performance with one econometric approach, two machine learning approaches, and another deep learning approach. The performance comparison was conducted for (1) at-fault crash driver frequency prediction tasks and (2) city-level crash risk prediction tasks. The proposed CNN-GRU achieved remarkable prediction accuracy and outperformed other approaches, while the other approaches also exhibited excellent performances. The results suggest that effective prediction approaches and appropriate traffic safety measures can be developed by considering both crash frequency and crash risk prediction tasks. In addition, the accumulated local effects (ALE) plot was utilized to investigate the contribution of each traffic enforcement activity on traffic safety in a scenario of multicollinearity among predictors. The ALE plot illustrated a complex nonlinear relationship between traffic enforcement predictors and the response variable. These findings can facilitate the development of traffic safety measures and serve as a good foundation for further investigations and utilization of traffic violation data.
|
17
|
A four-parameter negative binomial-Lindley distribution for modeling over and underdispersed count data with excess zeros. COMMUN STAT-THEOR M 2020. [DOI: 10.1080/03610926.2020.1749854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
|
18
|
Adjusting finite sample bias in traffic safety modeling. ACCIDENT; ANALYSIS AND PREVENTION 2019; 131:112-121. [PMID: 31252329 DOI: 10.1016/j.aap.2019.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/27/2018] [Revised: 02/22/2019] [Accepted: 05/29/2019] [Indexed: 06/09/2023]
Abstract
Poisson and negative binomial (NB) regression models are fundamental statistical analysis tools for traffic safety evaluation. The regression parameter estimates can suffer from finite sample bias when event frequency is low, which is common in safety research because crashes are rare events. In this study, we apply a bias-correction procedure to the parameter estimation of Poisson and NB regression models. We provide a general bias-correction formulation and illustrate the finite sample bias through a special scenario with a single binary explanatory variable. Several factors affecting the magnitude of bias are identified, including the number of crashes and the balance of the crash counts within strata of a categorical explanatory variable. Simulations are conducted to examine the properties of the bias-corrected coefficient estimators. The results show that the bias-corrected estimators generally provide less bias and smaller variance. The effect is especially pronounced when the crash count in one stratum is between 5 and 50. We apply the proposed method to a case study of infrastructure safety evaluation. Three scenarios were evaluated: all crashes collected in three years, and two hypothetical situations in which crash information was collected for "half-year" and "quarter-year" periods. The case-study results confirm that the magnitude of bias correction is larger for smaller crash counts. This paper demonstrates the finite sample bias associated with a small number of crashes and suggests that bias adjustment can provide more accurate estimation when evaluating the impacts of crash risk factors.
|
19
|
Abstract
Background: Fluctuations in emergency medical services (EMS) responses can have a substantial impact on the ability of agencies to meet resource needs within an EMS system. We aimed to identify weather characteristics as potentially predictable factors associated with EMS responses. Methods: We reviewed hourly counts of scene responses documented by 24 EMS agencies in Western Pennsylvania from January 1, 2014 to December 31, 2017 and compared rates of responses to weather characteristics. Responses to counties nonadjacent to the studied weather reporting station and interfacility/scheduled transports were excluded. We identified the mean temperature, visibility in meters, dew point, wind speed, total precipitation in millimeters, and presence of rain or snow in 6-hour windows prior to dispatch, in addition to the temporal factors of time of day and weekend vs. weekday. Analysis was performed using multivariable negative binomial regression, reporting incidence rate ratios (IRR) with 95% confidence intervals (CI). Secondary analyses were performed for transports to the hospital and cases involving transports for traumatic complaints and pediatric patients (age <18 years). Results: We included 529,058 responses (54.8% female, mean age 57.2 ± SD 24.7 years). In our multivariable model, responses were associated with (IRR, 95% CI) rain (1.10, 1.08-1.11), snow (1.07, 1.05-1.09), and both rain and snow (1.15, 1.11-1.19). A lower incidence of responses occurred on weekends (0.84, 0.83-0.85) and at night (0.62, 0.61-0.62). Increasing temperature in 5 °C increments was associated with an increase in responses across seasons, with an effect that varied from 1.16 (1.15-1.17) in winter to 1.31 (1.28-1.33) in summer. Windy weather was associated with increased responses, from light breeze (1.10, 1.09-1.11) to fresh breeze or greater (1.23, 1.16-1.30). Transports occurred in a similar pattern to responses. Trauma transports (n = 64,235) occurred more during weekends (1.04, 1.02-1.06). Pediatric transports (n = 21,880) were not significantly associated with precipitation or season. Conclusion: EMS responses increased with rising temperature and following rain and snow. These findings may assist in planning by EMS agencies and emergency departments to identify periods of greatest resource utilization.
|
20
|
Applying a random parameters Negative Binomial Lindley model to examine multi-vehicle crashes along rural mountainous highways in Malaysia. ACCIDENT; ANALYSIS AND PREVENTION 2018; 119:80-90. [PMID: 30007211 DOI: 10.1016/j.aap.2018.07.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/05/2018] [Revised: 06/22/2018] [Accepted: 07/02/2018] [Indexed: 06/08/2023]
Abstract
Road safety in rural mountainous areas is a major concern as mountainous highways represent a complex road traffic environment due to complex topology and extreme weather conditions and are associated with more severe crashes compared to crashes along roads in flatter areas. The use of crash modelling to identify crash contributing factors along rural mountainous highways suffers from limitations in data availability, particularly in developing countries like Malaysia, and related challenges due to the presence of excess zero observations. To address these challenges, the objective of this study was to develop a safety performance function for multi-vehicle crashes along rural mountainous highways in Malaysia. To overcome the data limitations, an in-depth field survey, in addition to utilization of secondary data sources, was carried out to collect relevant information including roadway geometric factors, traffic characteristics, real-time weather conditions, cross-sectional elements, roadside features, and spatial characteristics. To address heterogeneity resulting from excess zeros, three specialized modelling techniques for excess zeros including Random Parameters Negative Binomial (RPNB), Random Parameters Negative Binomial - Lindley (RPNB-L) and Random Parameters Negative Binomial - Generalized Exponential (RPNB-GE) were employed. Results showed that the RPNB-L model outperformed the other two models in terms of prediction ability and model fit. It was found that heavy rainfall at the time of crash and the presence of minor junctions along mountainous highways increase the likelihood of multi-vehicle crashes, while the presence of horizontal curves along a steep gradient, the presence of a passing lane and presence of road delineation decrease the likelihood of multi-vehicle crashes. Findings of this study have significant implications for road safety along rural mountainous highways, particularly in the context of developing countries.
|
21
|
A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB). ACCIDENT; ANALYSIS AND PREVENTION 2017; 107:186-194. [PMID: 28886410 DOI: 10.1016/j.aap.2017.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2017] [Revised: 05/25/2017] [Accepted: 07/04/2017] [Indexed: 06/07/2023]
Abstract
Safety analysts usually use post-modeling methods, such as Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competing distributions or models. Such metrics require all competing distributions to be fitted to the data before any comparison can be made. Given the continuous introduction of new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to the theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuition into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper addresses these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools, (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics that predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only are they easy to use without any post-modeling inputs, but they also give the analyst useful information about why the NB-L is preferred over the NB, or vice versa, when modeling data.
|
22
|
|
23
|
|
24
|
Analyzing road design risk factors for run-off-road crashes in The Netherlands with crash prediction models. JOURNAL OF SAFETY RESEARCH 2014; 49:121-127. [PMID: 24913476 DOI: 10.1016/j.jsr.2014.03.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 11/29/2013] [Accepted: 03/05/2014] [Indexed: 06/03/2023]
Abstract
PROBLEM: About 50% of all road traffic fatalities and 30% of all traffic injuries in the Netherlands take place on rural roads with a speed limit of 80 km/h. About 50% of these crashes are run-off-road (ROR) crashes. To reduce the number of crashes on this road type, attention should be given to improving the safety of its infrastructure. By developing a crash prediction model for ROR crashes on rural roads with a speed limit of 80 km/h, this study aims to make a start in providing road administrators in the Netherlands with the new tools needed for a proactive road safety policy. METHOD: The paper presents a basic framework of the model development, comprising a problem description, the data used, and the method for developing the model. The model is developed using generalized linear modeling in SAS with the Negative Binomial probability distribution. A stepwise approach is used, adding one variable at a time, which forms the basis for striving for a parsimonious model and for evaluating the model. The likelihood ratio test and the Akaike information criterion are used to assess the model fit, and parameter estimates are compared with literature findings to check for consistency. RESULTS: The results comprise two important outcomes. One is a crash prediction model (CPM) to estimate the relative safety of rural roads with a speed limit of 80 km/h in a network. The other is a small set of estimated effects of traffic volume and road characteristics on ROR crash frequencies. PRACTICAL APPLICATIONS: The results may lead to adjustments of the road design guidelines in the Netherlands and to further research on the quantification of risk factors with crash prediction models.
|
25
|
Methodology for fitting and updating predictive accident models with trend. ACCIDENT; ANALYSIS AND PREVENTION 2013; 56:82-94. [PMID: 23612560 DOI: 10.1016/j.aap.2013.03.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/04/2012] [Revised: 01/08/2013] [Accepted: 03/07/2013] [Indexed: 06/02/2023]
Abstract
Reliable predictive accident models (PAMs) (also referred to as Safety Performance Functions (SPFs)) have a variety of important uses in traffic safety research and practice. They are used to help identify sites in need of remedial treatment, in the design of transport schemes to assess safety implications, and to estimate the effectiveness of remedial treatments. The PAMs currently in use in the UK are now quite old; the data used in their development was gathered up to 30 years ago. Many changes have occurred over that period in road and vehicle design, in road safety campaigns and legislation, and the national accident rate has fallen substantially. It seems unlikely that these ageing models can be relied upon to provide accurate and reliable predictions of accident frequencies on the roads today. This paper addresses a number of methodological issues that arise in seeking practical and efficient ways to update PAMs, whether by re-calibration or by re-fitting. Models for accidents on rural single carriageway roads have been chosen to illustrate these issues, including the choice of distributional assumption for overdispersion, the choice of goodness of fit measures, questions of independence between observations in different years, and between links on the same scheme, the estimation of trends in the models, the uncertainty of predictions, as well as considerations about the most efficient and convenient ways to fit the required models.
|
26
|
Application of finite mixture of negative binomial regression models with varying weight parameters for vehicle crash data analysis. ACCIDENT; ANALYSIS AND PREVENTION 2013; 50:1042-1051. [PMID: 23022076 DOI: 10.1016/j.aap.2012.08.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 02/17/2012] [Revised: 07/23/2012] [Accepted: 08/05/2012] [Indexed: 06/01/2023]
Abstract
Recently, a finite mixture of negative binomial (NB) regression models has been proposed to address the unobserved heterogeneity problem in vehicle crash data. This approach can provide useful information about features of the population under study. For a standard finite mixture of regression models, previous studies have used a fixed weight parameter that is applied to the entire dataset. However, various studies suggest modeling the weight parameter as a function of the explanatory variables in the data. The objective of this study is to investigate the differences in the modeling and fitting results between the two-component finite mixture of NB regression models with fixed weight parameters (FMNB-2) and the two-component finite mixture of NB regression models with varying weight parameters (GFMNB-2), and to compare the group classification from both models. To accomplish this objective, the FMNB-2 and GFMNB-2 models are applied to two crash datasets. The important findings can be summarized as follows: first, the GFMNB-2 models can provide more reasonable classification results, as well as better statistical fitting performance, than the FMNB-2 models; second, the GFMNB-2 models can better reveal the source of dispersion observed in the crash data than the FMNB-2 models. Therefore, it is concluded that in many cases the GFMNB-2 models may be a better alternative to the FMNB-2 models for explaining the heterogeneity and the nature of the dispersion in the crash data.
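The difference between a fixed and a varying weight parameter can be made concrete with a small sketch. The Python code below is an illustration only, not the authors' implementation: it assumes a logistic link for the weight, a log link for each component mean, and the mean/inverse-dispersion form of the NB. All function names, coefficient vectors, and values are hypothetical.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y, with mean mu and inverse dispersion phi
    (variance = mu + mu**2 / phi)."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)

def component_mean(x, beta):
    """Log-link mean: exp(beta . x). x[0] is assumed to be the intercept term 1."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def mixture_weight(x, gamma):
    """Assumed logistic link: weight of component 1 as a function of covariates.
    With gamma = [g0, 0, ..., 0] this reduces to a fixed weight."""
    z = sum(g * xi for g, xi in zip(gamma, x))
    return 1.0 / (1.0 + math.exp(-z))

def two_component_nb_pmf(y, x, beta1, phi1, beta2, phi2, gamma):
    """Two-component NB mixture whose weight varies with the covariates x."""
    w = mixture_weight(x, gamma)
    return (w * nb_pmf(y, component_mean(x, beta1), phi1)
            + (1.0 - w) * nb_pmf(y, component_mean(x, beta2), phi2))

# Hypothetical site with an intercept and one covariate.
x = [1.0, 0.5]
p0 = two_component_nb_pmf(0, x, beta1=[0.2, 0.4], phi1=2.0,
                          beta2=[1.0, 0.8], phi2=1.0, gamma=[0.3, -1.2])
print(f"P(Y = 0 | x) = {p0:.4f}")
```

Setting every slope in `gamma` to zero recovers the fixed-weight case, so the two specifications compared in the abstract are nested in this sketch.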
|
27
|
The negative binomial-Lindley generalized linear model: characteristics and application using crash data. ACCIDENT; ANALYSIS AND PREVENTION 2012; 45:258-265. [PMID: 22269508 DOI: 10.1016/j.aap.2011.07.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 06/28/2011] [Revised: 07/18/2011] [Accepted: 07/18/2011] [Indexed: 05/31/2023]
Abstract
There has been a considerable amount of work devoted by transportation safety analysts to the development and application of new and innovative models for analyzing crash data. One important characteristic of crash data that has been documented in the literature relates to datasets that contain a large number of zeros and a long or heavy tail (which creates highly dispersed data). For such datasets, the number of sites where no crash is observed is so large that traditional distributions and regression models, such as the Poisson and Poisson-gamma or negative binomial (NB) models, cannot be used efficiently. To overcome this problem, the NB-Lindley (NB-L) distribution has recently been introduced for analyzing count data that are characterized by excess zeros. The objective of this paper is to document the application of an NB generalized linear model with Lindley mixed effects (NB-L GLM) for analyzing traffic crash data. The study objective was accomplished using simulated and observed datasets. The simulated dataset was used to show the general performance of the model. The model was then applied to two datasets based on observed data. One of the datasets was characterized by a large number of zeros. The NB-L GLM was compared with the NB and zero-inflated models. Overall, the research study shows that the NB-L GLM offers superior performance over the NB and zero-inflated models not only when datasets are characterized by a large number of zeros and a long tail, but also when the crash dataset is highly dispersed.
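The effect of the Lindley mixing on the share of zeros can be checked numerically. The sketch below is not the authors' estimation code. It assumes one common way of writing the model, in which the NB mean is multiplied by a site-specific term ε drawn from a Lindley(θ) distribution, and it uses the fact that a Lindley variate is a mixture of an Exponential(θ) draw (weight θ/(1+θ)) and a Gamma(2, θ) draw. Function names and the values of `mu`, `phi`, and `theta` are hypothetical.

```python
import random

def sample_lindley(theta, rng):
    """Draw from Lindley(theta): Exponential(rate=theta) with probability
    theta / (1 + theta), otherwise Gamma(shape=2, rate=theta)."""
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)
    return rng.gammavariate(2.0, 1.0 / theta)  # second argument is the scale

def lindley_mean(theta):
    """Analytic mean of the Lindley(theta) distribution."""
    return (theta + 2.0) / (theta * (theta + 1.0))

def nb_zero_prob(mu, phi):
    """P(Y = 0) for an NB with mean mu and inverse dispersion phi."""
    return (phi / (phi + mu)) ** phi

def nbl_zero_prob(mu, phi, theta, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(Y = 0) when the NB mean is eps * mu
    and eps ~ Lindley(theta) (assumed parameterization)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eps = sample_lindley(theta, rng)
        total += nb_zero_prob(eps * mu, phi)
    return total / n_draws

# Hypothetical parameter values; compare at the same overall mean.
mu, phi, theta = 2.0, 1.0, 1.5
p_nbl = nbl_zero_prob(mu, phi, theta)
p_nb = nb_zero_prob(mu * lindley_mean(theta), phi)
print(f"P(Y=0): NB-L approx {p_nbl:.3f}, NB with equal mean {p_nb:.3f}")
```

Because the NB zero probability is a convex function of the mean, averaging it over ε gives at least as many zeros as an NB with the same overall mean, which is the property that makes the mixture attractive for zero-heavy crash data.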
|