IDENTIFICATION AND PRIORITIZATION OF HAZARDOUS ROAD LOCATIONS BY SEGMENTATION AND DATA ENVELOPMENT ANALYSIS APPROACH

The aim of the present study is to present a method for identifying and prioritizing accident-prone sections (APSs) based on the concept of efficiency. The method evaluates accidents with regard to the traffic, geometric and environmental circumstances of the road, and can therefore consider the interaction of accidents with their causal factors. The study incorporates a segmentation procedure into the data envelopment analysis (DEA) technique which, unlike regression models, requires no distribution function or special assumptions. A case study on 144.4 km of roads in Iran illustrates the approach. Eleven accident-prone sections were identified among the 154 sections obtained from the segmentation process, and they were prioritized according to the inefficiency values obtained from the DEA method. The comparisons demonstrated that the frequency and severity of accidents should not be considered the only factors for black-spot identification; a proper rating of road sections becomes possible by using the inefficiency values obtained from this method. The approach can give decision makers a practical tool for identifying and prioritizing accident-prone sections. It can also be used to prioritize intersections, roundabouts or the entire road network under a safety organization's jurisdiction.


INTRODUCTION
Rural accidents constitute a significant proportion of total accidents. Statistics for Iran show that rural casualties account for more than 69 percent of accident fatalities [1]. Accordingly, the scientific literature is replete with discussions about reducing the damage and impacts of traffic accidents. A targeted and systematic reduction of accidents needs comprehensive safety management. Identification of black spots, sometimes known as hazardous road locations (HRLs), high-risk locations, hotspots, accident-prone situations, etc., is the first step in the road safety management process [2]. Many definitions of accident-prone spots are available, though research emphasizes that there is no comprehensive definition of what is accepted as a hazard [3]. An accident-prone spot is defined as any place with a higher number of accidents compared to other similar spots due to local risk factors [4]. This definition refers to the concept that accident-prone spots are locations where accidents are substantially affected by geometric design and traffic factors, and where accidents could be reduced by means of engineering countermeasures. The identification of accident-prone spots yields a list of spots prioritized for further engineering studies, which can distinguish accident patterns, potential solutions and effective factors [5]. Moreover, in these processes cost-effective projects are often chosen to obtain the best results from limited resources [2,6].
In order to identify the more dangerous places, it is necessary to quantify the risk status. The frequency of accidents that occurred on road sections (per year or per kilometre-year) is the simplest measure of risk. Other simple measures are accident rates, such as the number or cost of accidents per vehicle-kilometre or per registered vehicle. The use of these criteria may cause great errors due to stochastic changes in accidents from one year to another [7]. Another way of identifying black spots is by applying statistical models such as the Poisson, Negative Binomial, Generalized Negative Binomial, Zero-Inflated Negative Binomial, Log-Normal Poisson, Empirical Bayesian and Hierarchical Bayesian models. These models have been used to calculate the frequency and severity of accidents for different temporal and spatial patterns. With these models it is possible to prioritize locations based on the potential to reduce accident risk and to find the points with the highest hazard [8,9,10,11,12]. Other methods for recognizing black spots are the statistical confidence interval method, which tests the significance of a point's risk compared to the average value [7,13], identification based on a specific accident type (such as turning, sweeping, etc.) [14,15], and combinations of severity, frequency or risk potential [16,17]. The comparison of the methods for identifying black spots is another aspect of research in this field, and several researchers have compared some of these methods based on different criteria [2,7,10,18]. The regression method needs a mathematical function which estimates the dependent variable with the aid of independent variables. This function requires some assumptions about the distribution of the data and is subject to model limitations. This method also usually relies on just one output parameter of safety.
In this research a new approach is introduced to identify accident-prone sections (APSs). One advantage of this approach over previous studies is that it is section-based instead of spot-based. Since an interaction of several factors could lead to a crash on a road section, considering a section in lieu of a spot is more rational. The approach is carried out using Data Envelopment Analysis (DEA). DEA does not require obtaining or assuming any distribution function. The approach makes it possible to evaluate inputs, such as geometric features and roadside components, against outputs in comparison with their optimal performance. In other words, the index used to compare road sections in terms of accident proneness is a ratio of combined accidents to a combination of the factors affecting accidents. This index is not a simple proportional ratio; it is computed using the DEA approach. DEA was developed by Charnes et al. [19] as a tool to examine the relative efficiency of decision-making units (DMUs) based upon their produced outputs and consumed inputs. Using this method, the relative efficiencies of units are calculated and efficient and inefficient units are then determined. In this paper, 144.4 km of roads in Khorasan Razavi, Iran, are first divided into homogeneous sections, and the highly accident-prone sections are then determined using DEA.

THEORETICAL FOUNDATION OF DEA
Efficiency measurement, owing to its importance in evaluating the performance of agencies and organizations, has always been a focus of scientists' attention. In 1957 Farrell, using a method similar to efficiency measurement in engineering, attempted to measure the efficiency of productive units [20].
Charnes et al. [19] developed Farrell's viewpoint and presented a model which is able to measure efficiency with multiple outputs and inputs. This model became known as Data Envelopment Analysis (DEA). Since it was presented by Charnes, Cooper and Rhodes, it is known as the CCR model. The model measures and compares the relative efficiency of decision-making units such as schools, hospitals, banks and their branches, and other similar cases that share the same multiple inputs and outputs [19,21]. The CCR model uses the ratio of weighted outputs to weighted inputs as a scale to measure efficiency. If each unit uses m inputs to produce s outputs, then the fractional form of the classical DEA model, which evaluates the performance of the unit in question (unit 0), is as follows [21,22,23]:

$$\max\; h_0 = \frac{\sum_{r=1}^{s} u_r y_{r0}}{\sum_{i=1}^{m} v_i x_{i0}} \quad \text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1,\; j = 1,\dots,n; \quad u_r \ge 0,\; r = 1,\dots,s; \quad v_i \ge 0,\; i = 1,\dots,m,$$

where x_ij is the amount of input i used by unit j, y_rj is the amount of output r produced by unit j, u_r and v_i are the weights assigned to output r and input i, and n is the number of units.

The DEA method can distinguish between efficient units (efficiency equal to one) and inefficient units (efficiency less than one). With this method, inefficient units can be ranked but efficient units cannot [19,24]. Researchers have therefore presented different methods for ranking efficient units. Andersen and Petersen [25] offered a method (AP) in which efficient units can receive a score greater than one. As a result, an overall ranking is obtained for efficient and inefficient units. In this research the AP method is applied as follows: for each unit whose efficiency value from the CCR model equals 1, the CCR model is solved again after removing the constraint relating to that unit from the constraint set.
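For computation, the fractional programme is usually solved in its equivalent linear (multiplier) form, obtained by fixing the weighted input of the evaluated unit to one. The block below is a sketch of that standard transformation rather than a reproduction of the authors' spreadsheet model. It assumes the notation x_ij and y_rj for input i and output r of unit j, u_r and v_i for their weights, and the subscript 0 for the unit under evaluation.

```latex
\begin{aligned}
\max_{u,v}\quad & \theta_0 = \sum_{r=1}^{s} u_r\, y_{r0} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i\, x_{i0} = 1, \\
& \sum_{r=1}^{s} u_r\, y_{rj} - \sum_{i=1}^{m} v_i\, x_{ij} \le 0, \qquad j = 1,\dots,n, \\
& u_r \ge 0,\quad v_i \ge 0 .
\end{aligned}
```

In the AP variant the same programme is solved with the constraint for j = 0 left out, so that the score of an efficient unit is no longer capped at one.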

METHOD
The aim of this paper is to compare road sections in terms of accident proneness by means of DEA. Measuring the relative efficiency of units by data envelopment analysis is built on the produced outputs and consumed inputs: the efficiency is the ratio of weighted outputs to weighted inputs. In this study, the number of accidents weighted by severity was considered as the output of a section, and the factors affecting accidents as the inputs of that section.
Applying DEA requires defining comparable (homogeneous) DMUs. In this paper the DMUs are road sections, which are developed by means of a road segmentation approach.
Section 3-1 reviews the factors affecting accidents and Section 3-2 describes how data were gathered for the case study. The subsequent sections explain the road segmentation methodology and the identification of homogeneous sections, the factors affecting accidents (model inputs), the accident criterion (model output) and the use of data envelopment analysis to identify and compare APSs.

Effective factors on occurrence of accident
Identification of black-spot sections requires an understanding of the factors influencing the occurrence of accidents. However, the factors to be noted in this discussion are those which are location-dependent. Therefore, factors such as weather condition, vehicle type and driver status are not considered. On the basis of previous studies [26,27,28,29,30,31,32], properties which can be considered to evaluate road safety performance are: average annual daily traffic (AADT), curvature (length and radius), tangent length, cross-section characteristics (lane width, shoulder width), access density, roadside hazards, sight distance, road gradient, pavement condition, speed limit, etc. Some researchers [33,29,34,35,36] introduce various other factors such as the difference between operating speed and design speed, the operating speed differentials between successive road sections, the difference between provided and required side friction coefficients, the difference between the operating speed profile and the average operating speed, driver workload, etc. Such factors are always descriptive and are often indicators of geometric design consistency. Thus, to identify the APSs it is necessary to collect the mentioned data. The data were gathered to the extent possible, as described in the following section.

Data collection
In general, the required information includes road and traffic characteristics and accident data. The survey was conducted on a sample of 144.4 km of two-way two-lane roads located in the Khorasan Razavi province of Iran, comprising the Mashhad-Kalat and Mashhad-Fariman roads. Road plans were received from the Mashhad Roads and Transportation Department. Unfortunately, there had been slight changes over the years which were not recorded. Hence, a survey was performed using GPS equipment in kinematic mode to collect alignment information and compare it with the plans. These surveys were performed by driving in the far right lane of the road at a moderate speed of 60 km/h; fortunately, changes in the horizontal alignment of these roads were limited to the widening of some segments, which was captured in other field inspections. Data about roadside hazards, number of accesses, sections with speed limits, and pavement condition index were also gathered by experts in the field surveys. Because the longitudinal profiles of the routes were not accessible, gradient and vertical curve information was not considered. The operating speeds on road sections were likewise not considered due to the lack of measuring equipment. Accident data were provided by the Department of Khorasan Razavi Transport and Terminals and the Bureau of Road Police of Khorasan Razavi. Unfortunately, accident data for the province's routes had been recorded with location references and incident severity only for 2004 and 2005; in the following years no such location information was available.

Segmentation of routes
So far, some researchers have tried to estimate accident models using road segmentation [26,28,30], but they often define road segments with a fixed length or simply between two main intersections. Abdel-Aty and Radwan [26] segment roads into homogeneous sections in terms of geometry (degree of horizontal curvature, shoulder and median width, lane width, etc.) and traffic flow. Cafiso et al. [29] define a comprehensive segmentation method based upon a combination of exposure, geometry, consistency and context variables related to safety performance, and model accident occurrence. In this paper, the identification and segmentation of homogeneous parts of roads is based on accident factors. The accident factors that can be used for this purpose include:
- average annual daily traffic (AADT);
- shoulder and lane width;
- speed limits;
- curvature change rate (CCR);
- pavement condition.
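The segmentation rule used in this study, in which a new section starts wherever any of the listed factors changes value, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the record layout, the field names and the numeric values are hypothetical, and a continuous factor such as the curvature change rate would first have to be grouped into homogeneous ranges.

```python
def segment_route(records, route_end_km):
    """Split a route into homogeneous sections.

    records: list of (start_km, factors) tuples sorted by start_km, where
    factors is a dict of the segmentation factors that apply from that
    chainage onwards (hypothetical input layout).
    """
    sections = []
    for start_km, factors in records:
        if sections and sections[-1]["factors"] == factors:
            continue  # no factor changed, so the current section continues
        if sections:
            sections[-1]["end_km"] = start_km  # close the previous section
        sections.append({"start_km": start_km, "end_km": None, "factors": factors})
    if sections:
        sections[-1]["end_km"] = route_end_km  # the last section runs to the route end
    return sections


# Hypothetical observations along a 5 km route.
records = [
    (0.0, {"aadt": 1500, "speed_limit": 80, "psr": 3.8}),
    (1.0, {"aadt": 1500, "speed_limit": 80, "psr": 3.8}),
    (2.0, {"aadt": 1500, "speed_limit": 60, "psr": 3.8}),
    (3.5, {"aadt": 1500, "speed_limit": 60, "psr": 3.1}),
]
sections = segment_route(records, route_end_km=5.0)
for s in sections:
    print(s["start_km"], s["end_km"], s["factors"])
```

Comparing whole factor dictionaries keeps the rule symmetric: a change in any single factor is enough to open a new section.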
The AADT of each road is available, and the beginning and end of sections with a specific speed limit or a change in lane width can be determined or measured by field inspection. The curvature change rate is determined from road plan characteristics and defined for each section as follows:

$$CCR_{sec} = \frac{\sum_{i} |\gamma_i|}{L}$$

where γ_i is the deflection angle for the i-th curve within a section of length L. To obtain sections with homogeneous CCRsec, cumulative deflection angles can be plotted as a function of distance from the start of the route and smooth trend lines then fitted. The slope of each fitted line is the CCRsec value for that part [37]. This definition is shown in Figure 1 based on a sample of the data gathered for this study. Anastasopoulos et al. [38] indicate that pavement condition, in terms of driving quality and skid resistance, affects accident occurrence. In this study, roads were divided into homogeneous parts based on the present serviceability rating (PSR) method. Rating was done based on the AASHTO method [38]. Because friction measurement devices were not accessible, skid resistance was excluded.
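As a small numerical illustration of the curvature change rate, the sketch below assumes the usual definition, namely the sum of the absolute deflection angles of the curves in a section divided by the section length. The function name, the units (centesimal degrees per kilometre) and the angle values are assumptions made for the example.

```python
def curvature_change_rate(deflection_angles, section_length_km):
    """Return CCR for one section: sum of |deflection angle| per unit length.

    deflection_angles: signed deflection angle of each curve in the section
    (left and right turns may carry opposite signs).
    """
    if section_length_km <= 0:
        raise ValueError("section length must be positive")
    return sum(abs(angle) for angle in deflection_angles) / section_length_km


# Hypothetical section of 2 km with three curves.
ccr_sec = curvature_change_rate([30.0, -45.0, 25.0], section_length_km=2.0)
print(ccr_sec)
```

Taking absolute values means that successive left and right curves add to the measure instead of cancelling, which is what makes the cumulative deflection curve monotonic and its slope usable for segmentation.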
Whenever any of the above-mentioned factors changes, a new homogeneous road section is defined. In other words, a new section starts wherever the value of any factor changes.

Inputs in DEA model
In addition to the factors used for segmentation, the following features are calculated for each section:
- section length;
- curvature ratio;
- tangent ratio;
- roadside hazard index;
- access density;
- no passing zones ratio;
- distance ratio from population centres located at the beginning and end of the route.
Section length is calculated in the segmentation process. Using the road geometric plan, curvature ratio (CR) and tangent ratio (TR) are calculated as follows [30,37]:

$$CR = \frac{\sum_{j=1}^{M} LC_j}{L_{HS}}, \qquad TR = \frac{\sum_{e=1}^{N} LT_e}{L_{HS}}$$

where LHS is the length of the section (km); LCj is the length of the j-th curve in a section containing M curves (km); LTe is the length of the e-th tangent in a homogeneous section composed of N tangents (km).
Cafiso et al. [28,29] introduce a Roadside Hazard rating for use on 200 m road segments. In this index, the inspector assigns a score (0 = not present, 1 = low risk, 2 = high risk) to each of five roadside hazard items (embankments, bridges, dangerous terminals and transitions, trees and other rigid obstacles, ditches) for both directions separately. The weighted average of the five item scores is then calculated, where k is the direction of inspection (left and right sides); Scoreijk is the score (0, 1 or 2) assigned by inspection to the j-th item in the i-th unit along direction k; and Weightj is the relative weight of the j-th roadside item based on AASHTO severity indices [39], namely 3 for embankments, 5 for bridges, 2 for dangerous terminals and transitions, 2 for trees and other rigid obstacles and 1 for ditches. Thus, safety inspectors evaluate the roadside risk for each 200-metre part using the designed checklist, and the average value over each homogeneous section is taken as its hazard index.
Access density and No passing zones ratio are calculated by dividing the number of access roads and total length of no passing zones by section length, respectively.
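The two ratios just described can be written directly as small helper functions. The sketch below is illustrative only: the function names and the input values are hypothetical, and lengths are assumed to be expressed in the same unit (metres here) in the numerator and the denominator.

```python
def access_density(n_accesses, section_length):
    """Number of access roads per unit of section length."""
    return n_accesses / section_length


def no_passing_ratio(no_passing_lengths, section_length):
    """Share of the section length covered by no passing zones."""
    return sum(no_passing_lengths) / section_length


# Hypothetical 1,000 m section with 3 accesses and two no passing zones.
density = access_density(3, 1000.0)
ratio = no_passing_ratio([60.0, 41.0], 1000.0)
print(density, ratio)
```

Because both features are divided by the section length, sections of very different lengths remain comparable as DMUs.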
Distraction factors, traffic volume and traffic turbulence are higher in the vicinity of cities owing to concentrated industrial or recreational centres and different land uses, and a number of studies [40,41] show that the occurrence of accidents is commensurate with the logarithm of the distance from the city. Hence, to assign different importance to the cities located at the beginning and end of the road, an index related to population centres has been defined from the distances of the section to the two cities and the populations of those cities.

Output in DEA model
The output in the DEA model for each homogeneous section is the number of accidents. In addition to accident frequency, however, the severity of the accidents that occurred is another factor in identifying a location as an APS. Researchers have proposed different coefficients for the importance of accident severity (property damage, injury or fatal accident). For example, the Ministry of Flemish Community [43] uses weighting values of 1, 3 and 5 for property damage, injury and fatal accidents, respectively, while in Portugal weighting values of 1, 10 and 100 are used for accidents with slight injuries, accidents with serious injuries and fatal accidents, respectively [18].
The Road and Transportation office of Khorasan Razavi uses the coefficients 1, 3 and 5 for property damage, injury or fatal accident, respectively, to identify black spots and if a point earns a total score higher than 30, it is considered as a black spot.By examining various relations and with respect to accident reporting and safety culture in Iran, Yazdani [44] emphasizes the usage of the same ratio 1, 3 and 5 for property damage, injury or fatal accidents.In this study, accident index is calculated using these coefficients as the output for DEA model for each homogeneous section.
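The weighted accident index and the office's threshold rule can be illustrated with a few lines of code. The coefficients 1, 3 and 5 and the threshold of 30 are taken from the text; the function names, the dictionary keys and the accident counts are hypothetical.

```python
# Severity coefficients used in this study (property damage, injury, fatal).
SEVERITY_WEIGHTS = {"property_damage": 1, "injury": 3, "fatal": 5}
BLACK_SPOT_THRESHOLD = 30  # office rule: a total score above 30 marks a black spot


def weighted_accident_index(counts):
    """Sum of accident counts weighted by severity; missing keys count as zero."""
    return sum(weight * counts.get(severity, 0)
               for severity, weight in SEVERITY_WEIGHTS.items())


def is_black_spot_by_office_rule(counts):
    """Apply the threshold criterion of the road office."""
    return weighted_accident_index(counts) > BLACK_SPOT_THRESHOLD


# Hypothetical accident counts for one section.
counts = {"property_damage": 10, "injury": 4, "fatal": 2}
index = weighted_accident_index(counts)
print(index, is_black_spot_by_office_rule(counts))
```

In the proposed method only the index itself is used, as the DEA output; the threshold belongs to the office's existing criterion and serves for comparison.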

RATING AND PRIORITIZING OF ROAD SECTIONS
To study the two-way two-lane parts of the Mashhad-Kalat and Mashhad-Fariman roads in Iran, a length of 144.4 km was considered. On these selected roads, 154 homogeneous sections were obtained, the longest being 5 km and the shortest 0.15 km. Each of the 154 sections defined in section 3-3 is hereafter considered as a DMU. In the DEA model, all inputs must act on the output in the same direction. In other words, for each input it is important to know whether lower or higher values are better. Therefore, Pearson correlation tests were applied between the inputs and the output to assess whether each input is directly or inversely related to the output. Table (1) shows the correlation results. As can be seen, only the curvature ratio, the proportion of no passing zones and the curvature change rate are negatively correlated. To resolve this problem, the complements of these rates were incorporated into the DEA model as (1-x), where x is each of the negatively correlated inputs. So the input variables of the DMUs are: x1: section length, x2: 1-curvature ratio, x3: tangent ratio, x4: pavement condition, x5: shoulder and lane width, x6: speed limit, x7: access density, x8: 1-proportion of no passing zones, x9: roadside hazard index, x10: 1-curvature change rate, x11: section distance index from two hubs, x12: AADT, and the output variable is y1: weighted value of accidents by severity (weights of 1, 3 and 5 for property damage, injury and fatal accidents, respectively).
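The orientation step described above, in which each input is tested against the output and negatively correlated inputs are replaced by their complement, might look as follows. This is a sketch under stated assumptions: the helper names and the data are hypothetical, and the correlation is computed by hand so that the example needs no external library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def orient_inputs(inputs, output):
    """Replace every negatively correlated input x by 1 - x."""
    oriented = {}
    for name, values in inputs.items():
        if pearson(values, output) < 0:
            oriented["1-" + name] = [1 - v for v in values]
        else:
            oriented[name] = list(values)
    return oriented


# Hypothetical values for four sections.
inputs = {
    "tangent_ratio": [0.2, 0.4, 0.6, 0.8],
    "curvature_ratio": [0.8, 0.6, 0.3, 0.1],
}
accident_index = [5, 10, 20, 30]
oriented = orient_inputs(inputs, accident_index)
print(sorted(oriented))
```

After the transformation every input is positively related to the output, which is the common direction the DEA model requires.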
An example of the variables value is shown in Table (2).For example, value of variables for section 1 in Mashhad-Kalat route are: 1,000m section length, 0.208 curvature ratio, 0.427 tangent ratio, 3.8 PSR, 7.3 shoulder and lane width, 80km/h speed limit, 0.003 access density, 0.101 proportion of no passing zones, 4 roadside hazard index, 0.009 section distance index from two hubs, 1,500 AADT and 32 for weighted accident index.
Since the occurrence of accidents is an unfavourable outcome, an inefficiency index is defined for each section instead of an efficiency index. The inefficiency value (score) of each road section (DMU) is calculated using the CCR model. Next, units with a score of 1 are ranked using the AP method and new scores are calculated for them. Based upon the obtained results, the road sections with the highest inefficiency values are considered accident-prone sections, which in turn makes the prioritization of road sections possible.
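The two-stage scoring can be illustrated in the simplest possible setting. The study itself uses twelve inputs and therefore needs a linear programming solver; the sketch below deliberately restricts itself to one input and one output, a special case in which the CCR score reduces to the output/input ratio of a unit divided by the best ratio among all units, and the AP score to the same ratio divided by the best ratio among the other units. All names and values are hypothetical.

```python
def ccr_scores(inputs, outputs):
    """CCR score of every unit in the single-input, single-output case.

    Assumes positive inputs and at least one positive output.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def ap_score(inputs, outputs, k):
    """AP score of unit k: its own constraint is left out of the comparison.

    Assumes at least one other unit has a positive output.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best_other = max(r for j, r in enumerate(ratios) if j != k)
    return ratios[k] / best_other


def inefficiency_ranking(inputs, outputs):
    """Return final scores and unit indices ordered from most to least hazardous."""
    scores = ccr_scores(inputs, outputs)
    final = [ap_score(inputs, outputs, k) if s == 1.0 else s
             for k, s in enumerate(scores)]
    order = sorted(range(len(final)), key=lambda k: final[k], reverse=True)
    return final, order


# Hypothetical sections: one input (e.g. section length in km) and the
# weighted accident index as output.
section_input = [1.0, 2.0, 4.0, 2.0]
accident_index = [8, 4, 8, 2]
final_scores, priority = inefficiency_ranking(section_input, accident_index)
print(final_scores, priority)
```

Units that reach a score of 1 in the first stage are the only ones re-scored, so the ordering of the remaining units is unchanged by the AP step.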
The CCR model is programmed in spreadsheet software and used to calculate the inefficiency of the road sections. Figure 2 shows the inefficiency values calculated for the 154 road sections using this method. Then, sections with an inefficiency of 1 are ranked using the AP model. The results for these sections are shown in Table (2). For example, the inefficiency value, or hazard potential, of section 1 in comparison with other sections is 1.768, which places it fifth in priority for safety treatment or rehabilitation. Note that an inefficiency value higher than 1 is not necessarily the indicator of an APS; rather, the identification of a section as an APS depends on the level of funds allocated for safety treatment.
Table (2) shows the sections with inefficiency values of 1 or above. Among these 11 sections, one is from the Mashhad-Fariman route and the others are from the Mashhad-Kalat route. For the sake of comparing the two routes, the averages of their inefficiency values were also calculated; they are 0.37 for the Mashhad-Fariman route and 0.25 for the Mashhad-Kalat route. These results indicate that the Mashhad-Fariman route is more critical than the Mashhad-Kalat route from the viewpoint of hazard.
Figure 3 illustrates the inefficiency values against the weighted values of accidents by severity (the criterion of the Khorasan Razavi Road and Transportation Office) for the sections. To assess the proposed method, the weighted accident value for each section, as employed by the road and transportation administration, was calculated. The previous method presents sections with weighted values higher than 30 as black spots, while according to the proposed method the sections with higher inefficiency values are more accident-prone. Although a number of sections were considered black spots by both approaches, sections such as 144 and 132, despite having high weighted accident values, show low inefficiencies, indicating proper performance relative to their properties in comparison with other sections. Conversely, in sections 48 and 147 the weighted accident values are low and the inefficiencies are high, which indicates that their performance relative to their properties is inappropriate and that they should have had a lower number and severity of accidents. Moreover, in the previous approach a section such as 6 may be rehabilitated and identified as a black spot again in later years, whereas a section such as 3 may never appear as a black spot; the proposed method can identify it, and its accidents could then be decreased significantly at slight cost. The reliability of this method for identifying black spots can be assessed by an economic evaluation of its benefits in comparison with other methods.

CONCLUSION
One of the methods used in managerial decision-making is quantitative modelling. The effective results of such methods in planning lead to the confidence of decision-makers. Accordingly, this paper presents a new method that takes environmental, traffic and geometric characteristics into account to identify black spots: the road is segmented into units with homogeneous characteristics, and decision-making is performed for each unit on the basis of its specific features. This method considers accidents according to the interaction of their causal parameters. Also, instead of points, this method identifies lengths of route with known specifications, for which improvements can be made over specific intervals.
Comparison of road sections using linear programming in the framework of data envelopment analysis provides a method which can be used to prioritize road sections, intersections, roundabouts or the entire road network under a safety organization's jurisdiction. In the present study the relative inefficiencies of 154 sections, used for prioritizing road sections, were obtained with the DEA method, which is a new experience in terms of input and output indices. The decision-making unit characteristics which affect the output are the inputs of the data envelopment analysis model. These inputs include the factors used for segmentation and other features which are calculated for each section separately.

In the index related to population centres, Da and Db are the distances from the centre of the section to the beginning and the end of the road (cities a and b), and Pa and Pb are the populations of cities a and b (in the case study, the population of each city was obtained from the results of the Population and Housing Census in 2006 [42]).

Figure 2 - Inefficiency values of road sections
Figure 3 -
Table (1): Pearson correlation values between output and inputs
A. Sadeghi et al.: Identification and Prioritization of Hazardous Road Locations by Segmentation and Data Envelopment Analysis Approach