ANALYSIS OF INFLUENCING FACTORS IDENTIFICATION OF CRASH RATES USING TOBIT MODEL WITH ENDOGENOUS VARIABLE

The objective of this study is to identify the influencing factors of crash rates from the perspective of access management techniques in urban areas. The target areas are located in the Las Vegas Metropolitan area, and 19 arterials are selected. In order to address the interdependency between crash rates and travel speeds, and left-censored issue, a tobit model with endogenous variable is presented. The structure of the tobit model addresses the left-censored issue for the segments while the endogeneity issue between crash rates and travel speeds is explained. The results indicate that there is strong interdependency between crash rates and travel speeds. The segment length, driveway density, median opening density, posted speed limit and AADT per lane are statistically significant factors that influence crash rates on the segments; moreover, crash rates are significantly influenced by two-directional median opening density.


INTRODUCTION
The most widely used approach to studying the factors that influence crash occurrence is to analyze the crash frequency on roadway segments during a specified period of time. In recent years, many crash frequency models have been studied (see Literature [1] for a detailed description), including: Poisson models; negative binomial models; Poisson-lognormal models; zero-inflated count models; Conway-Maxwell-Poisson models; Gamma models; generalized estimating equation models; generalized additive models; random effects models; negative multinomial models; random parameters count models; finite mixture and Markov switching models; and other intelligent algorithms.
In most literature, crash rates (such as the number of crashes per million vehicle miles travelled) have been considered as one standardized measure of roadway safety [2]. Crash rates are used because a rate such as crashes per 100-million vehicle miles travelled is a continuous variable, unlike the non-negative integer count of discrete crash events over some period. However, some roadway segments may have no crashes during the observation period, so the data are left-censored at zero. Literature [3] explored the application of tobit regression to the censoring problem, and the results suggested that tobit regression has potential for analyzing crash rates on interstate highways, but the study ignored the unobserved heterogeneity across observations, which may lead to biased estimation and incorrect inferences [4]. In order to deal with the heterogeneity across observations, literature [2] employed a random-parameters tobit regression model in the study of motor vehicle crash rates, and the results showed that the random-parameters tobit model had the potential to analyze the factors determining crash rates while addressing the unobserved heterogeneity across observations. However, studies of crash rates encounter an endogeneity issue between crash rates and travel speeds. In the presence of endogeneity, the parameter estimates and inferences could be biased in urban areas due to the interdependence between crash rates and travel speeds. Recent studies [5] and [6] have demonstrated that panel data simultaneous equation models can account for the endogeneity issue, and explained the factors influencing crash rates by introducing access management techniques, but the segments selected all included crashes, which may not reflect reality. Therefore, in order to address both the endogeneity problem and the left-censored issue, the simultaneous equations model and the tobit model are integrated into a so-called simultaneous equation tobit model to identify the influencing factors of crash rates. Theoretically, many scholars [7][8][9][10][11] have estimated simultaneous equation tobit models using various approaches; practically, literature [12] and [13] applied the simultaneous equation tobit model to labour supply and household expenditures, respectively, and revealed its potential.
Access management techniques play an important role in urban roadway safety on the roadway network. The access provided by urban streets and highways to adjacent lands is managed by controlling the spacing between the access points, including signals, driveways, and median openings within the corridor. Two major issues exist in evaluating the impact of access management on safety in the urban area: heterogeneity (uniqueness) and endogeneity (interdependency). Heterogeneity refers to the statistical issues caused by observations that share common unobserved information. On urban streets, the safety and mobility data for the segments selected from the same arterial may have similar patterns that are unique to that arterial. This issue may not be significant if the sampled roadways are from different jurisdictions such as cities or counties, but it becomes noticeable when the safety and mobility issues related to access management are investigated for a single urban area, for instance the Las Vegas area. If this issue is not addressed appropriately, the statistical evaluation of safety and mobility may be inaccurate. Endogeneity refers to the interdependence between dependent and independent variables in regression models. Generally, the measures for safety, such as crash rate, were used as the only dependent variable, while the measures for mobility, such as travel speed, were used as independent variables, so that only the dependence of safety on mobility was addressed. In reality, mobility is also influenced by safety. By including mobility (i.e. travel speed) only as an independent variable, researchers failed to address the dependency of mobility on safety. Both measures of roadway performance interact with each other and are affected by the same factors, such as roadway characteristics, traffic flow, and access management. One of the statistical approaches for handling the endogeneity problem is the use of simultaneous equations models, while the left-censored issue can be addressed by a tobit model. Therefore, the intent of the current paper is to address the endogeneity issue within a tobit model that incorporates access management techniques, so as to investigate the influencing factors of crash rates.

METHODOLOGY
The tobit model was first presented by James Tobin [14] to handle regression models in which the dependent variable is censored at a lower threshold (left-censored), an upper threshold (right-censored), or both. Censored data differ from truncated data: in truncated data only non-limit values are available, whereas censored data also provide information on the limit observations [3]. The crash rate can be considered left-censored at zero (zero crashes per million vehicle miles travelled) because not all roadway segments have crashes during the observation period. Thus, the tobit model can be described as Expression (1):

$$Y_i^* = \beta X_i + \varepsilon_i, \quad i = 1, 2, \ldots, N$$
$$Y_i = \begin{cases} Y_i^*, & \text{if } Y_i^* > 0 \\ 0, & \text{if } Y_i^* \le 0 \end{cases} \tag{1}$$

where $N$ is the number of observations, $Y_i$ is the dependent variable (crashes per million vehicle miles travelled in roadway segment $i$), $X_i$ is a vector of independent variables (access management techniques, traffic and roadway segment characteristics), $\beta$ is a vector of estimable parameters, $\beta X_i$ denotes the scalar product of the two vectors, and $\varepsilon_i$ is a normally and independently distributed error term with zero mean and constant variance $\sigma^2$. The model assumes an implicit, stochastic index (latent variable) $Y_i^*$ which is observed only when positive. The model's likelihood function over the zero observations (0) and the positive observations (1) can be expressed as Expression (2):

$$L = \prod_{0}\left[1 - \Phi\left(\frac{\beta X_i}{\sigma}\right)\right] \prod_{1} \frac{1}{\sigma}\,\phi\left(\frac{Y_i - \beta X_i}{\sigma}\right) \tag{2}$$

where $\Phi$ is the standard normal distribution function and $\phi$ is the standard normal density function. More details about the model's expected value calculation and estimation can be found in literature [3]. To account for the endogeneity, i.e. the interdependence between the crash rate $Y_i$ (explained endogenous variable) and the average speed $X_i$ (endogenous variable), the following model is presented in Expression (3):

$$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 Z_i + \varepsilon_i \tag{3}$$

where $Z_i$ is the vector of exogenous variables representing the factors influencing the crash rate, $\alpha_2$ is the coefficient of the endogenous variable, $\alpha_3$ is the coefficient of the exogenous variables, $\alpha_1$ is a constant, and $\varepsilon_i$ is a real number that denotes the error term.
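As a rough illustration of the left-censored likelihood described above, the following Python sketch evaluates the tobit log-likelihood for a candidate coefficient vector and error standard deviation. The function name `tobit_loglik` and the toy data are hypothetical and are not taken from the study; an estimation routine would maximize this quantity over the parameters.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf plays the role of Phi, pdf of phi


def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a tobit model left-censored at zero.

    beta  : list of coefficients (assumed layout: constant first)
    sigma : standard deviation of the error term, must be positive
    X     : list of rows, each a list of regressors (1.0 for the constant)
    y     : observed crash rates, with 0.0 marking censored segments
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, row))
        if yi <= 0.0:
            # Censored segment: contributes log P(Y* <= 0) = log(1 - Phi(xb / sigma)).
            total += math.log(1.0 - _STD_NORMAL.cdf(xb / sigma))
        else:
            # Uncensored segment: contributes log[(1 / sigma) * phi((y - xb) / sigma)].
            total += math.log(_STD_NORMAL.pdf((yi - xb) / sigma) / sigma)
    return total


# Hypothetical toy data: a constant and one regressor per segment.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 0.0, 1.9, 3.1]

# A parameter guess that fits the toy data scores higher than one that does not.
print(tobit_loglik([-1.0, 1.4], 0.5, X, y))
print(tobit_loglik([2.0, -1.0], 0.5, X, y))
```

The sketch only evaluates the likelihood; it does not handle extreme parameter values for which the censoring probability underflows to zero.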
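In order to deal with the left-censoring at zero and the endogeneity, the tobit model with endogenous variable is proposed as Expression (4):

$$Y_{i1}^* = \alpha X_{i1} + \beta_1 Z_{i1} + \varepsilon_{i1}, \quad i = 1, 2, \ldots, N$$
$$Y_{i1} = \begin{cases} Y_{i1}^*, & \text{if } Y_{i1}^* > 0 \\ 0, & \text{if } Y_{i1}^* \le 0 \end{cases} \tag{4}$$

where $N$ is the number of observations, $Y_{i1}$ is the observed endogenous variable, $Y_{i1}^*$ is the unobserved (latent) endogenous variable, $X_{i1}$ represents the vector of endogenous variables, $Z_{i1}$ is the vector of exogenous variables representing the factors influencing crash rates, $\alpha$ is the coefficient of the endogenous variables, $\beta_1$ is the coefficient of the exogenous variables, and $\varepsilon_{i1}$ denotes the error term.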
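The combination of censoring at zero and an endogenous regressor can be illustrated with simulated data. The sketch below is only an assumption-laden toy: the speed equation, the extra variable that shifts speed only, the coefficient values and the function name `simulate_segments` are all hypothetical and do not come from the study. It shows how a shared error component makes speed correlated with the crash-rate error, and how the observed crash rate is the latent value floored at zero.

```python
import math
import random


def simulate_segments(n, alpha, beta1, rho, seed=0):
    """Simulate n segments as tuples (x, z, y_star, y).

    x      : endogenous regressor (e.g. travel speed)
    z      : exogenous factor
    y_star : latent crash rate
    y      : observed crash rate, left-censored at zero
    rho    : correlation between the two error terms, in [-1, 1]
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)        # exogenous factor
        shifter = rng.uniform(0.0, 1.0)  # hypothetical variable affecting speed only
        v = rng.gauss(0.0, 1.0)          # error of the assumed speed equation
        # The crash-rate error shares the component rho * v with the speed error,
        # which is what makes x endogenous when rho != 0.
        e = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x = 1.0 + 0.5 * z + 1.0 * shifter + v
        y_star = alpha * x + beta1 * z + e
        y = y_star if y_star > 0.0 else 0.0  # observation rule: censor at zero
        rows.append((x, z, y_star, y))
    return rows


rows = simulate_segments(200, alpha=0.3, beta1=-1.0, rho=0.6, seed=1)
censored = sum(1 for r in rows if r[3] == 0.0)
print(f"{censored} of {len(rows)} simulated segments are censored at zero")
```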
Generally, the censored regression model estimated by Maximum Likelihood Estimation (MLE) is known as the tobit model, and the tobit model with endogenous variable used in this paper is a special case of the equation system studied by Newey (1987) [10]. Newey explored the more general problem of endogeneity in limited dependent variable models (which include probit and tobit) with two-step estimation. Amemiya's Generalized Least Squares (AGLS) estimator is proposed as a way to efficiently estimate the parameters of the tobit model when a continuous endogenous regressor is included. This has become a standard way to estimate this model and is an option in STATA 10.0 when MLE is difficult to obtain. The main benefit of using this model is that it produces a consistent estimator of the standard errors and can easily be used to test the statistical significance of the model's parameters. For more details about this model, see Newey (1987) [10].

DATA
The target population in this study is the divided arterial streets in the Las Vegas area, including the City of Las Vegas, the City of North Las Vegas, Clark County and the City of Henderson. The sample is composed of 19 major and minor arterials comprising 356 roadway segments, of which 26 had no crashes over the 3-year analysis period and 330 had at least one crash. Figure 1 displays the locations of the selected arterials.
The target population is restricted to the divided arterial streets for two reasons: first, the major streets attract most of the city's traffic volume; second, because of the unique geometric design of freeways, the crash types and severity on freeways differ from those on urban arterials. Since undivided medians cannot provide physical control to prevent vehicles from crossing over them, they are not an effective access management technique, so only Two-Way-Left-Turn-Lanes (TWLTL) and raised medians are included in this study.
For each mid-block segment, the data collected for this study include: the total number of crashes, travel speeds, access management techniques, traffic volume, land use and relevant roadway characteristics, such as the length of roadway segments and the number of lanes.
The crash data from 2003 to 2005 were collected from the ArcGIS database maintained by the Nevada Department of Transportation (NDOT). In this study, crash rates were used to evaluate roadway safety. The crash rate is defined as the number of crashes per million vehicle miles travelled (MVMT), and can be calculated using Expression (5):

$$CR_{seg} = \frac{N_{CR} \times 10^{6}}{365 \times V_{seg} \times L} \tag{5}$$

where $CR_{seg}$ = crash rate for the midblock segment (in crashes per MVMT), $N_{CR}$ = number of crashes per year, $V_{seg}$ = AADT of the roadway midblock segment, and $L$ = length of the roadway midblock segment (in miles).
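The crash rate definition above can be sketched in a few lines of Python. This is a minimal illustration under the stated definitions (crashes per year, AADT and segment length in miles); the function name and the example numbers are hypothetical.

```python
def crash_rate_per_mvmt(crashes_per_year, aadt, length_miles):
    """Crashes per million vehicle miles travelled (MVMT) on a midblock segment."""
    if aadt <= 0 or length_miles <= 0:
        raise ValueError("AADT and segment length must be positive")
    # Annual vehicle miles travelled: 365 days * daily traffic * segment length.
    vmt_per_year = 365.0 * aadt * length_miles
    return crashes_per_year * 1_000_000 / vmt_per_year


# Hypothetical segment: 12 crashes over a 3-year period, AADT of 20,000, 0.5 miles long.
rate = crash_rate_per_mvmt(12 / 3, 20_000, 0.5)
print(round(rate, 3))

# A segment with no recorded crashes has a rate of exactly zero (the censoring point).
print(crash_rate_per_mvmt(0, 20_000, 0.5))
```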
The traffic volume and travel speed data from 2003 to 2005 were collected from the ArcGIS database provided by the Regional Transportation Commission (RTC) of Southern Nevada. The speed for each midblock segment was derived from Expression (6):

$$\bar{v}_{seg} = \frac{\sum_{i} L_i}{\sum_{i} L_i / \bar{v}_i} \tag{6}$$

where $\bar{v}_{seg}$ is the average travel speed of the midblock segment, $L_i$ is the length of road section $i$ within the roadway segment, and $\bar{v}_i$ denotes the corresponding average speed for road section $i$. Each midblock segment between signalized intersections may include several unsignalized road sections, and each road section may have a different travel time, so the total travel time within the midblock segment is the sum of the travel times on all road sections.
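The segment speed described above, total length divided by the sum of the section travel times, can be sketched as follows. The function name and the example sections are hypothetical; units are assumed to be miles and miles per hour.

```python
def segment_average_speed(sections):
    """Average travel speed of a midblock segment.

    sections: list of (length_miles, average_speed_mph) pairs, one per road section.
    The result is total length divided by total travel time, i.e. a
    length-weighted harmonic mean of the section speeds.
    """
    if not sections:
        raise ValueError("at least one road section is required")
    total_length = sum(length for length, _ in sections)
    # Travel time on each section is its length divided by its average speed.
    total_time = sum(length / speed for length, speed in sections)
    return total_length / total_time


# Hypothetical segment made of two unsignalized road sections.
print(segment_average_speed([(0.2, 40.0), (0.3, 30.0)]))
```

Note that this is not the arithmetic mean of the section speeds: slower sections take longer to traverse and therefore weigh more heavily.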
For the main access management techniques, data on traffic signals, driveway spacing, median alternatives and median openings were collected and integrated using ArcGIS and Google Earth.
The signal spacing in this study means the actual midblock segment length, and is expressed as the distance between signalized intersections (excluding the intersection functional area). Driveway spacing is expressed as driveway density, which in this study is defined as the number of unsignalized driveways on both sides of the segment divided by the length of the selected midblock segment, and is then converted to the number of driveways per mile. Median openings are expressed as median opening density. Similar to driveway density, it is obtained by counting the number of unsignalized median openings between medians, dividing by the length of the selected midblock segment, and expressing the result as the number of unsignalized median openings per mile. Median alternatives include two main types in this study, TWLTL and raised medians, which are coded as 0 and 1, respectively.
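The density variables and the median coding described above could be derived along the following lines. This is a minimal sketch: the record layout and the example counts are hypothetical, while the 0/1 coding of TWLTL and raised medians and the variable names DWDEN, MEDOPDEN and MEDTYP follow the text.

```python
# Coding of median alternatives as stated in the text: TWLTL = 0, raised median = 1.
MEDIAN_TYPE_CODE = {"TWLTL": 0, "raised": 1}


def density_per_mile(count, segment_length_miles):
    """Number of access points (driveways or median openings) per mile."""
    if segment_length_miles <= 0:
        raise ValueError("segment length must be positive")
    return count / segment_length_miles


# Hypothetical midblock segment record.
segment = {"length_miles": 0.4, "driveways": 6, "median_openings": 2, "median": "raised"}

variables = {
    "DWDEN": density_per_mile(segment["driveways"], segment["length_miles"]),
    "MEDOPDEN": density_per_mile(segment["median_openings"], segment["length_miles"]),
    "MEDTYP": MEDIAN_TYPE_CODE[segment["median"]],
}
print(variables)
```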

Figure 1 - Selected sample segments in Las Vegas
A large selection of roadway characteristics related to midblock segments was collected from the Las Vegas area using ArcGIS and Google Earth. When the midblock segments are selected in ArcGIS, the number of lanes in both directions and the posted speed limit can be obtained; the data on the land use types (residential, commercial) along the midblock segments are read from Google Earth based on visual observation. The number of residential and commercial land parcels on both sides of the midblock segment is counted in Google Earth and then divided by the segment length to derive the density of these two types of land use for the segment.
The crash rates and travel speeds are endogenous variables. The midblock segment length, driveway density, median type, median opening density, AADT and roadway characteristics, which represent the access management techniques and attributes of the midblock segments, were included as exogenous variables. Table 1 provides the description of the variables used in the modelling.

MODEL ESTIMATION RESULTS
In order to examine the endogeneity of crash rate and travel speed, the Durbin-Wu-Hausman (DWH) test was adopted [15]. The null hypothesis was that the parameters estimated without controlling for endogeneity were consistent, and the alternative hypothesis was that they were inconsistent. For the crash rate model, the null hypothesis was rejected at the 5% significance level (95% confidence level) because the DWH test statistic (356.0) was greater than the critical value (10.3, d.f.=8, p=0.05). Therefore, it was concluded that parameters estimated without controlling for endogeneity were inconsistent, implying that travel speed is endogenous in the crash rate model.
A correlation test was conducted to identify the variables to be included in the model. The results indicate that three correlation coefficients are greater than 0.5. Different land use types generally have different numbers of driveways: residential land usually has fewer driveways, while commercial land has more for the convenience of customers, which explains the high correlation (0.8292) between land use type and driveway density. Since median openings only exist on roadways with raised medians, it makes sense that there is a high correlation (0.7510) between the median type and the median opening density, implying that the two variables should not be included in the model together. Also, two-directional and one-directional median openings are part of the total median openings, and the corresponding correlation coefficients are 0.4935 and 0.6355, respectively.
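The correlation screening described above can be sketched as a pairwise Pearson correlation check against the 0.5 threshold. The data values below are invented purely for illustration and the helper names are hypothetical; only the variable names and the threshold come from the text.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for three candidate variables on five segments.
columns = {
    "DWDEN": [1.0, 2.0, 3.0, 4.0, 5.0],
    "COMDEN": [2.0, 4.0, 6.0, 8.0, 10.5],
    "SEGLEN": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for name_a, name_b, r in highly_correlated_pairs(columns):
    print(f"{name_a} and {name_b}: r = {r:.4f}")
```

Pairs flagged this way are candidates for exclusion from joint use in a model, in the same spirit as the median type and median opening density discussed above.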
To correct the endogeneity problem, the tobit model with endogenous variable was estimated and compared with the tobit regression model. First, the endogenous variables (CRMVMT, AVGSP) were modelled with all exogenous variables (SEGLEN, DWDEN, MEDTYP, MEDOPDEN, TWODIRDEN, ONEDIRDEN, RESDEN, COMDEN, POSTSP, AADTLN), and then the insignificant exogenous variables were eliminated step by step at the 95% confidence level until all the remaining variables were significant. Table 2 shows the final results with all the significant variables. Among all the influencing factors in both models, the traffic volume (AADT per lane) is the most significant variable (t-statistic = 5.95 in the tobit regression model and z-statistic = 6.73 in the two-step tobit model with endogenous variable), as shown in Table 2. In previous studies by Chin and Quddus (2003) [16] and Poch and Mannering (1996) [17], traffic volume was also found to be an important measure for crashes. The coefficients for AADT per lane in the two models are both negative, which means that when there is more traffic volume on each lane, the vehicles on the segments tend to run at lower speeds, thus reducing the crash rates.
Another significant influencing factor in both models is the average travel speed. As shown in Table 2, the crash rates are positively influenced by the travel speed, which indicates that higher travel speeds may increase the crash rates. This is consistent with practical experience: if vehicles travel at very high speeds, the chances of conflicts with other vehicles or roadway facilities increase, so the crash rates on the roadway segments rise. More importantly, the travel speed is treated as an endogenous variable in the two-step tobit model, and the endogeneity test confirms the interdependence between crash rates and travel speed.
The median type is a significant factor influencing crash rates in both tobit models. TWLTL and the raised median are included in this study, and the negative coefficients in the two tobit models show that crash rates are reduced if a raised median is installed instead of a TWLTL on roadway segments, which is supported by the related studies by Gluck et al. (1999) [18] and ACM (2003) [19].
In contrast to the studies by Xu (2010) [5] and Xu et al. (2012) [6], the results in Table 2 show that the two-directional median opening density has a significant positive effect on crash rates, which indicates that two-directional median openings have a larger impact on crash rates, and that more crashes occur where there are more two-directional median openings. Because vehicles turning through the median opening conflict with the through traffic on the midblock segment in both directions, the chances of crashes tend to be higher than those of the other two median opening types.
In contrast to the tobit regression model, the segment length is significant to the crash rates in the tobit model with endogenous variable, i.e. signal spacing affects the crash rate significantly. Since a longer midblock segment may avoid unnecessary stop-and-go, thus preventing potential conflicts, it can decrease crash occurrence accordingly.
The driveway density is positively related to crash rates. More driveways along the segment may cause more conflicts between through and turning vehicles, which lead to more crashes, so the crash rates increase, as verified by previous studies by Gluck et al. (1999) [18], ACM (2003) [19] and Schultz (2007) [20].
In the tobit model with endogenous variable, the coefficient for the median opening density is positive, which implies that more crashes occur on midblock segments that have more median openings. In other words, a larger number of median openings increases the conflicts between turning and through vehicles, which eventually leads to crashes.
The posted speed limit is statistically significant and has a positive coefficient in the tobit model with endogenous regressor, implying that vehicles on midblock segments with higher speed limits tend to travel at higher speeds, thus leading to more crashes, which is consistent with intuition.

CONCLUSION
In this study, a two-step tobit model with endogenous variable was developed for midblock segments to identify the influencing factors of crash rates. The interdependency between crash rates and travel speeds on midblock segments and the left-censored problem were both addressed. The methodology treats travel speed as endogenous to crash rates, and the results verify this interdependence. The crash rate is regarded as left-censored at zero by including roadway segments with and without crashes during the observation period; the tobit model accounts for this, and is estimated by a two-step method instead of conventional maximum likelihood. From the results, it can be concluded that the access management techniques, namely signal spacing, driveway density, median type, median opening density and two-directional median opening density, are statistically significant factors that influence crash rates on midblock segments. The longer the distance between signals and the fewer the driveways and median openings, the fewer crashes are likely to occur. In addition to these access management techniques, the posted speed limit and AADT per lane influence crash rates on midblock segments. Moreover, fewer crashes occur on midblock segments with raised medians than on those with TWLTLs, and two-directional median opening density is more significant to crash rates than other median opening types.
Based on the results, longer midblock segments are recommended for future development. The number of driveways on midblock segments should be controlled, and the adjacent lands should be designed with better circulation systems. If possible, the number of median openings, especially two-directional median openings, should be restricted in cases where they lower travel speeds. It is also recommended that additional studies include variables such as geometric design, weather conditions, light conditions, driver conditions and vehicle types, all of which influence roadway crash rates and travel speeds. Including them in the modelling may improve the accuracy of the models.

Table 1 - Descriptive statistics for modelling variables

Table 2 - Results from two Tobit models

X. Xu, A. Kouhpanejade, Ž. Šarić: Analysis of Influencing Factors Identification of Crash Rates Using Tobit Model with Endogenous Variable