CONTRIBUTION TO ACCIDENT PREDICTION MODELS DEVELOPMENT FOR RURAL TWO-LANE ROADS IN SERBIA

Over the last three decades numerous research efforts have been conducted worldwide to determine the relationship between traffic accidents and traffic and road characteristics. So far, such studies have not been carried out in Serbia or in the region. This paper represents one of the first attempts to develop accident prediction models in Serbia. The paper provides a comprehensive literature review and describes the procedures for collection and analysis of the traffic accident data, as well as the methodology used to develop the accident prediction models. The paper presents models obtained by both univariate and multivariate regression analyses. The obtained results are compared to the results of other studies and the comparisons are discussed. Finally, the paper presents conclusions and important points for future research. The results of this research have both theoretical and practical applications.


INTRODUCTION
Reducing highway accidents has always been one of the most important tasks for traffic engineers. Estimating the number of accidents resulting from a given highway design is very important in evaluating different design alternatives. Therefore, it is important to understand the relationship between accidents and the characteristics of roadways in order to try to reduce the number of accidents [1]. Accident prediction models have been very useful in estimating the expected number of accidents on intersections and road segments. Essentially, an accident prediction model is a mathematical relationship that expresses the average accident frequency of a site as a function of traffic flow and other site characteristics [2].
Road safety modelling has attracted considerable research interest in the past four decades because of its wide variety of applications and important practical implications. Public agencies, such as State Departments of Transportation, are interested in identifying accident-prone areas to promote safety treatments. Similarly, transportation engineers are interested in identifying factors (traffic characteristics, geometric characteristics, etc.) that influence accident frequency and severity in order to improve roadway design and provide a safer driving environment. The high cost of highway accidents borne by societies around the world makes highway safety improvement an important objective of transportation engineering. A significant number of previous studies have indicated that improvements to highway design could produce significant reductions in the number and severity of crashes [3]. Therefore, feasibility studies need to properly determine the impact that the road and traffic characteristics of a specific design solution have on traffic safety.
Previous studies utilized different methodologies for accident prediction on highways [4]. Joshua & Garber [5] pointed out that linear regression models do not adequately describe the nature of crash frequency data. Poisson or negative binomial regression models are better suited for defining the random, discrete, and non-negative nature of crash occurrences [6].
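The preference for Poisson or negative binomial models over ordinary linear regression can be illustrated with a simple dispersion check. The sketch below is illustrative only: the segment counts are made up, and the helper name `dispersion_index` is hypothetical. A Poisson model assumes that the variance of the counts equals their mean, so a variance-to-mean ratio well above 1 suggests overdispersion, in which case a negative binomial model is usually the better candidate.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of crash counts.

    A value near 1 is consistent with the Poisson assumption;
    a value well above 1 indicates overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical crash counts per road segment (not data from this paper).
counts = [0, 0, 1, 0, 2, 0, 7, 1, 0, 3, 0, 12]
ratio = dispersion_index(counts)
print(f"variance/mean = {ratio:.2f}")
```

Such a check is only a first screening step; a formal choice between the two model families would normally rely on a fitted model and a dispersion test.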
Accident prediction models have been used elsewhere as a useful tool by road engineers and planners [7]. Fletcher et al. [8] found that due to wide differences in traffic mix, road quality, design and road user behaviour, it would be neither valid nor useful to apply simple multiplicative factors or even devise more complex conversion formulae for models developed elsewhere for another country. The regression models developed for certain conditions, which can vary regionally from country to country, cannot be generalized for all countries, which may have different standards in geometric designs and may create various operational environments for traffic under their jurisdictions.
The roadmap of this paper can be summarized in the following steps. First, the paper provides a comprehensive literature review of previous research on this issue. Special attention is given to several factors which define the quality of similar studies, such as the sample size used by most of the other models, the applied methodologies, the relationships in the previously developed models, and the variables which are most frequently included in the models. Second, the paper describes the procedures for collection and analysis of the traffic accident data, as well as the methodology used to develop the traffic accident prediction models. Third, the paper presents models obtained by both univariate and multivariate regression analyses. Fourth, the obtained results are compared to the results of other studies and the comparisons are discussed. Finally, the paper presents conclusions and important points for future research.

LITERATURE REVIEW
Although many studies have investigated this important traffic safety topic, the focus here is only on a selected number of those found to be most relevant to the research issue at hand.
One of the earliest studies to investigate the relationship between the number of accidents and the geometric characteristics of highways [9] used generalized linear regression to develop accident prediction models as a function of road characteristics for two-lane rural roads. The researchers investigated the impact of different variables on injury traffic accidents over a period of eight years.
Zegeer et al. [10] developed accident prediction models on the basis of almost 5,000 miles of two-lane rural roads in seven US states. They examined the influence of the following variables: annual average daily traffic (AADT), horizontal curvature, vertical curvature, side-slope length and ratio, lane width, shoulder width and type, number of bridges, intersections, overpasses and railway crossings, number of driveways and driveway types, type of delineation and on-street parking, roadside hazard rating, and roadside recovery distance.
Kalakota & Seneviratne [11] developed accident prediction models for two-lane roads, using the accident data for a period of four years on a two-lane two-way rural road of over 85 kilometres in northern Utah. Three separate models were calibrated based on the given data: one for tangents (143), one for curves (153), and one for combined tangent and curve sections (296).
Hadi et al. [12] used a negative binomial regression analysis to assess the impact of the cross-section design elements of two-lane rural roads on total, fatality, and injury accident rates for a period of four years. The sample size (number of sections investigated) was not provided but independent variables were the following: section length, AADT, lane width, shoulder width, speed limit, and the number of intersections.
Mountain et al. [13] developed models for the prediction of expected accidents on the United Kingdom roads with minor junctions, where traffic counts on the minor approaches were not available. Models for rural two-lane roads were developed on the basis of data on 2,566 kilometres of two-lane rural roads. Two types of models were developed. In the first model the influence of minor junctions on accident frequency of the section was considered by including the minor junction density as an explanatory variable while in the second model the minor junction accidents were modelled separately.
Vogt & Bared [14] investigated the dependence of the number of traffic accidents on various geometric and functional characteristics of two-lane rural roads. They used a database that contained traffic accident data for five years for 619 sections in Minnesota, and for three years for 712 sections in the state of Washington.
Harwood et al. [15] defined an algorithm for predicting traffic accidents on the main sections of two-lane rural roads. The accident prediction algorithm consists of the base model and the modification factors. Nine modification factors were developed in this research.
Within the effort to improve the safety of horizontal curves on rural two-lane roads of the Portuguese national road network, Cardoso [16][17] defined accident prediction models for curves and tangents. Separate models were developed for roads with unpaved and paved shoulders. This research examined the following variables: curve radius, bendiness, the average gradient, carriageway width, shoulder width, AADT, the approach speed, speed reduction on the approach to a curve, and the difference between the 85th and 15th percentiles of the speed distribution on a curve.
Mayora et al. [18] developed accident prediction models using a sample of 3,450 km of two-lane rural roads in the regions of Valencia and West Castile, Spain. Traffic accidents were analysed for a period of five years.
Qin et al. [19] stressed the importance of developing separate models for the prediction of traffic accidents for different types of accidents. The traffic accidents in the four-year period were analysed for over 29,800 segments (equivalent to 29,000 km) of two-lane roads. For each segment, type of traffic accident, segment length, AADT, speed limit, lane width, shoulder width (right and left) and paved width of the road were collected. Zero-inflated Poisson modelling was used to develop a prediction model.
Fitzpatrick et al. [20] developed accident prediction models for two-lane rural roads using data on 3,944 miles of two-lane rural roads in Texas. The analysis primarily investigated traffic accidents caused by the carriageway width for a period of three years. Variables that were examined in this study included lane width, shoulder width, section length, and AADT.
Cafiso et al. [21] conducted a survey on a sample of 168.2 km of two-lane rural roads in Italy. The study examined the influence of the following variables: bendiness (radius, length), tangent length, cross-section elements (lane width and shoulder width and type), access density and roadside hazard rating. A total of 107 homogeneous segments were analysed over a period of five years.
Using crash data for rural, two-lane highways in Minnesota, Geedipally et al. [22] estimated the proportion of crashes by collision type that occurred during the 5-year period between 2002 and 2006. The results showed that head-on crashes were affected by AADT, truck percentage and shoulder width.
Chapter 10 of the Highway Safety Manual [23] offered a definition of the accident prediction model for rural two-lane two-way roads. Within the model for predicting the number of accidents, the influence of the traffic volume (AADT) on the frequency of accidents was included through the SPF (safety performance function), while the influences of geometric design and traffic management characteristics were included through CMFs (crash modification factors). A total of 12 modification factors were developed.
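The structure of this predictive method (a base SPF scaled by a calibration factor and a product of modification factors) can be sketched as follows. This is a minimal illustration, not a substitute for the manual: the function names are hypothetical, the segment and CMF values are invented, and the SPF coefficients (365 × 10⁻⁶ and e^(−0.312), with segment length in miles) are quoted from the first edition of the manual as best recalled and should be verified against it before any practical use.

```python
import math


def spf_rural_two_lane(aadt, length_miles):
    """Base-condition crashes per year for a rural two-lane segment.

    Assumed form: AADT * L * 365 * 1e-6 * exp(-0.312), with L in miles.
    """
    return aadt * length_miles * 365 * 1e-6 * math.exp(-0.312)


def predicted_crashes(aadt, length_miles, cmfs, calibration=1.0):
    """Scale the SPF estimate by a calibration factor and the product of CMFs."""
    return spf_rural_two_lane(aadt, length_miles) * calibration * math.prod(cmfs)


# Hypothetical segment: 1 mile long, 5,000 veh/day, two illustrative CMF values.
base = spf_rural_two_lane(5000, 1.0)
adjusted = predicted_crashes(5000, 1.0, cmfs=[1.10, 0.95])
print(f"base = {base:.3f}, adjusted = {adjusted:.3f} crashes/year")
```

A CMF of 1.0 leaves the base estimate unchanged, values above 1.0 increase it and values below 1.0 reduce it, which is why the factors combine multiplicatively.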
Ackaah and Salifu [23] developed a prediction model for road traffic crashes occurring on 76 rural sections of the highways in the Ashanti Region of Ghana. The model was developed for all injury crashes occurring on the selected rural highways in the Region over a three-year period. The data collected for each section comprised injury crash data, traffic flow and speed data, and roadway characteristics and road geometry data. The Generalised Linear Model (GLM) with Negative Binomial (NB) error structure was used to estimate the model parameters.
Dinu and Veeraragavan [24] attempted to employ random parameter modelling to develop accident prediction models for Indian two-lane undivided rural highways that operated under mixed traffic conditions. Three years of accident history, from nearly 200 km of highway segments, was used to calibrate and validate the models. The explanatory variables considered for modelling included hourly traffic volume, length of the highway segment, proportion of buses, cars, motorized two-wheelers and trucks in the traffic, driveway density, shoulder width, and horizontal and vertical curvatures.
Turner et al. [25] developed accident prediction models for two-lane rural roads using data for 6,829 km of state rural roads in New Zealand. The sample was divided into tangents (17,087) and curves (13,490). The developed models quantified the relationship between traffic accidents and traffic volumes, road geometry, cross-section, road surfacing, roadside hazards and driveway density. Generalized linear accident prediction models were developed for key accident types, including head-on, loss-of-control and driveway accidents.
Deublein et al. [26] proposed a novel methodology to determine models that can be used to predict the number of injury accidents and injury severities of road users on roads where little or no data exist for the specific road segment in question. The methodology utilizes a combination of three statistical methods: (1) gamma-updating of the occurrence rates of injury accidents and injured road users, (2) hierarchical multivariate Poisson-lognormal regression analysis, and (3) Bayesian inference algorithms. The risk-indicating variables are selected taking into consideration traffic characteristics and the design parameters of the road, such as traffic volume, traffic composition, speed, curvature and the number of lanes.
Hosseinpour et al. [27] intended to identify the factors affecting the frequency of head-on crashes that occurred on 448 segments of five federal roads in Malaysia. Data on road characteristics and crash history were collected on the study segments during a four-year period. The variables horizontal curvature, terrain type, heavy-vehicle traffic, and access points were found to be positively related to the frequency of head-on crashes, while posted speed limit and shoulder width decreased the crash frequency.
To summarize this literature review, we considered linear regression models (depicting relationships between traffic accidents and operational and geometric characteristics of two-lane rural road segments) from a variety of countries around the world. The review indicates that the following independent variables are the most frequently used in such models: AADT, the percentage of heavy goods vehicles, the radius of horizontal curve, bendiness, gradient, superelevation, carriageway width, lane width, shoulder type, shoulder width, paved shoulder width, unpaved shoulder width, access point density, roadside hazard rating, average roadside recovery distance, speed limit, sight distance, the share of no-passing zones, the coefficient of skidding resistance, and the depth of road surface texture. Generally, AADT is the most frequently used variable in these models and the one with the greatest predictive power in most of them. The influence of most of the other variables has been studied by using modification factors.
When it comes to dependent variables, the following ones have been most frequently used: the number of traffic accidents, the number of traffic accidents per year, the number of traffic accidents per kilometre per year or the number of traffic accidents per million vehicle-kilometres. Apart from the total number of traffic accidents, some of the researchers studied separately the impact on the fatal traffic accidents and accidents with injuries.
In conclusion, many studies have focused on the relationship between traffic accidents and the operational and geometric characteristics of two-lane rural roads. As far as the South East Europe (SEE) region is concerned, the only research dates from 1970, when Ivanović [28] analysed the influence of road elements on the consequences of traffic accidents in Bosnia and Herzegovina over a period of nine years. The research presented here therefore fills a long gap in the development of traffic accident prediction models as a function of road and traffic characteristics in the SEE region. The study is justified because previous research has shown that different models and prediction factors have been developed for various regions and countries around the world. Moreover, the previous research indicates that the relationship between accidents and road and traffic conditions has an idiosyncratic component. The contribution of this paper is a definition of the methodology for developing accident prediction models in Serbia. First, the paper offers guidelines for creating an integrated database. Second, this research defines the procedures for dividing the sections into homogeneous segments. Third, this research identifies the variables whose influence should be examined, and finally it develops the models which represent the first findings for Serbia.

Modelling approach
The basic hypothesis is that road geometry and traffic characteristics have an impact on the number of traffic accidents on each individual road section. A statistical modelling approach was used to test this hypothesis, i.e. to investigate the relationship between the number of traffic accidents and various potential influencing factors. The following steps were conducted in the development of the regression models: 1) Database selection: the databases were selected based on the availability of road, traffic and accident data. 2) Data integration: data from various sources were integrated and the road sections were divided into homogeneous segments, characterized by the corresponding data. 3) Data analysis: data were analysed through the development of regression models. 4) The TableCurve software package was used for the statistical analysis of the data. Univariate and multivariate regression analyses were applied to investigate the impact of various independent variables on the dependent variable (number of accidents). The quality of regression was assessed by the coefficient of determination R².
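For reference, the coefficient of determination used to assess the regressions can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this standard definition with invented observed and predicted values; the function name `r_squared` is hypothetical, and the exact variant reported by the software package is not stated in the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Invented values for illustration (accidents per km per year).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"R^2 = {r_squared(observed, predicted):.2f}")
```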

Data collection and synthesis of data
In order to develop the traffic accident prediction models, it was necessary to collect data on traffic accidents, traffic volume, and road geometry. The following paragraphs describe the data sources.
The data on traffic accidents were obtained from the database maintained by the Serbian Road Traffic Safety Agency. The database contained data on the type of traffic accident, the time-space distribution of traffic accidents, the condition of the pavement at the time of the accident, and the main cause of the accident. The data on the spatial distribution of traffic accidents were very important for the needs of this research. Data were collected for a period of three years, from 2010 to 2012. In this period, on the segments studied in this research, there were 1,140 traffic accidents, 422 of which involved injuries or fatalities. The database covered two-lane state roads of rank IA and IB (road marks IA-2 and IB-15).
Traffic volume and traffic composition data were retrieved from the database of the Public Enterprise "Roads of Serbia". The traffic database contained the data on AADT, including both total volumes and volumes of various basic vehicle categories.
The data on road geometry were also taken from the database of the Public Enterprise "Roads of Serbia". This database consisted of several tables with separate records for the road cross section and the horizontal and vertical route alignments. The unit of observation in each database was a road segment with homogeneous geometric, traffic volume, and other conditions.
The three individual databases were merged into an integrated database, which contained the data on traffic accidents, traffic volume, and road geometry. The road sections were divided into homogeneous segments according to traffic characteristics (AADT and percentage of commercial vehicles), cross-sectional characteristics (lane width) and geometric characteristics (radius and grade). The border between two segments was set at the first point where at least one of the independent variables changed. The resulting database consisted of 344 homogeneous road segments, which constituted 213.8 km of two-lane roads.
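The segmentation rule described above (start a new segment wherever at least one variable changes) can be sketched as follows. The record layout, the field names, the use of `None` as the radius of a tangent, and all values are assumptions made for illustration; they do not reproduce the structure of the actual databases.

```python
def split_homogeneous(records, keys):
    """Merge consecutive records into segments with constant attributes.

    records: list of dicts ordered by chainage, each holding 'start_km',
             'end_km' and the attributes named in keys.
    A new segment starts at the first record where any attribute differs.
    """
    segments = []
    for rec in records:
        signature = tuple(rec[k] for k in keys)
        if segments and segments[-1]["signature"] == signature:
            segments[-1]["end_km"] = rec["end_km"]  # extend the current segment
        else:
            segments.append({
                "start_km": rec["start_km"],
                "end_km": rec["end_km"],
                "signature": signature,
            })
    return segments


KEYS = ("aadt", "pct_commercial", "lane_width_m", "radius_m", "grade_pct")


def record(start_km, end_km, radius_m, grade_pct):
    """Build a hypothetical record; traffic and lane width are held fixed."""
    return {"start_km": start_km, "end_km": end_km, "aadt": 6000,
            "pct_commercial": 12.0, "lane_width_m": 3.5,
            "radius_m": radius_m, "grade_pct": grade_pct}


records = [
    record(0.0, 0.5, None, 1.0),   # tangent
    record(0.5, 1.0, None, 1.0),   # same attributes: merged with the previous
    record(1.0, 1.4, 400.0, 1.0),  # curve begins: new segment
    record(1.4, 2.0, None, 3.0),   # tangent with a steeper grade: new segment
]
segments = split_homogeneous(records, KEYS)
for seg in segments:
    print(seg["start_km"], seg["end_km"], seg["signature"])
```

In practice continuous variables such as AADT or grade would probably be grouped into classes before comparison, since exact equality of measured values is rare.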

ACCIDENT PREDICTION MODELS
Accident prediction models are functions that express the relationship between various independent variables (traffic and geometric characteristics of road segments) and the dependent variable (the number of accidents).
A review of the relevant literature shows that most of the previous traffic accident prediction models focused on the total number of traffic accidents and on traffic accidents with injuries (fatal + injury). For this study, however, the number of traffic accidents with fatalities and injuries was relatively small. Thus, the authors decided to use the total number of traffic accidents per kilometre per year as the dependent variable. The following variables were selected as independent variables: AADT, the percentage of commercial vehicles, the radius of horizontal curvature, longitudinal grade, and pavement width (Table 1).
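As a quick plausibility check of the chosen dependent variable, the network-wide averages implied by the figures reported in the data collection section (1,140 accidents, three years, 344 segments, 213.8 km) can be computed directly. The conversion to accidents per million vehicle-kilometres needs a traffic volume; the AADT of 8,000 vehicles per day used below is purely an assumed value, and the function name is hypothetical.

```python
# Figures reported in the data collection section of this paper.
accidents = 1140
years = 3
length_km = 213.8
n_segments = 344

# Network-wide average of the dependent variable (accidents per km per year).
per_km_year = accidents / (years * length_km)

# Average length of a homogeneous segment.
mean_segment_km = length_km / n_segments


def per_million_veh_km(accidents, aadt, length_km, years):
    """Accident rate per million vehicle-kilometres travelled."""
    exposure = aadt * 365 * years * length_km / 1e6
    return accidents / exposure


# ASSUMPTION: an AADT of 8,000 veh/day, chosen only for illustration.
rate = per_million_veh_km(accidents, 8000, length_km, years)

print(f"{per_km_year:.2f} accidents/km/year")
print(f"{mean_segment_km:.3f} km per segment on average")
print(f"{rate:.2f} accidents per million veh-km (assumed AADT)")
```

These are averages over the whole sample; the models themselves are fitted to segment-level values.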
First, the authors performed a univariate regression analysis (Section 4.1), in which the individual impact of each independent variable on the dependent variable was observed. Second, a multivariate analysis was conducted to investigate the simultaneous impact of multiple variables on the dependent variable (Section 4.2).

Univariate regression analysis
In order to determine the individual impact of each independent variable on the dependent variable, the univariate regression analysis was conducted using the TableCurve 2D software v5.01. Various model forms were identified using regression analysis, and only those with the statistically best fit were selected.
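The idea of comparing candidate model forms can be illustrated with a small sketch. It is not the procedure implemented in the software package: here an exponential form y = a·exp(b·x) is fitted by ordinary least squares on ln(y), a common linearization, and compared with a straight line by the residual sum of squares. The data are synthetic, generated from an exponential curve with arbitrarily chosen coefficients, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by regressing ln(y) on x (requires all y > 0)."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def sse(ys, preds):
    """Residual sum of squares between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


# Synthetic data: AADT values and accident rates from an arbitrary exponential.
aadt = [2000, 4000, 6000, 8000, 10000, 12000]
rate = [0.5 * math.exp(0.0002 * q) for q in aadt]

a_exp, b_exp = fit_exponential(aadt, rate)
a_lin, b_lin = fit_line(aadt, rate)

sse_exp = sse(rate, [a_exp * math.exp(b_exp * q) for q in aadt])
sse_lin = sse(rate, [a_lin + b_lin * q for q in aadt])
print(f"exponential: a={a_exp:.3f}, b={b_exp:.6f}, SSE={sse_exp:.2e}")
print(f"linear:      a={a_lin:.3f}, b={b_lin:.6f}, SSE={sse_lin:.2e}")
```

Because the log transform changes the error structure, a direct nonlinear fit can give somewhat different coefficients on noisy data; the sketch only shows how alternative forms may be ranked by goodness of fit.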
The exponential model of the dependence of the total number of accidents on AADT was statistically the best-fitted model. This model established that the total number of accidents per kilometre per year increased with increasing AADT, as can be seen in Equation 1 and

Multivariate regression analysis
A multivariate regression analysis can include any number of variables, but researchers usually limit the number of variables according to their impact on the dependent variable. Since the results of the univariate regression analysis showed a significant impact of AADT, the radius of horizontal curvature, and carriageway width on the number of traffic accidents, these variables were included in the multivariate regression analysis. The TableCurve 3D software v4.01 was used to execute the multivariate regression analysis.
The results of the first model, which determines the dependence of the number of traffic accidents on AADT and the radius of horizontal curvature, show that an increase in AADT and a decrease in the radius increase the number of traffic accidents (Figure 6). The equation of the first model (R² = 0.608), which includes AADT and radius, has the following form:
$$N = -0.052 + 1.839 \cdot e^{-0.8} \cdot AADT^{2} + \frac{814{,}116.408}{R^{2}} \qquad (6)$$
The results of the second model, which determines the dependence of the total number of traffic accidents on AADT and carriageway width, show that the total number of traffic accidents increases with an increase in AADT and a decrease in pavement width (Figure 7). The equation of the second model (R² = 0.570), which includes AADT and carriageway width, has the following form: . .

The influence of annual average daily traffic on the number of traffic accidents
The results of this study have shown that an increase in AADT causes a rise in the total number of traffic accidents. This is a logical finding, often reported in previous research [12,15,29-31], since a higher traffic volume increases the chance of a traffic accident. However, one has to note that the relationship between the number of traffic accidents and AADT is not linear, but exponential. This finding is similar to those of three separate studies [13,21,31], which concluded that the exponents for AADT were statistically significantly smaller than one, indicating a non-linear relationship between the number of traffic accidents and AADT.
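The difference between a power-form model (an exponent on AADT, as in the cited studies) and an exponential-form model can be made explicit through the elasticity of the accident frequency N with respect to AADT. The symbols α, β, a and b are generic coefficients introduced here for illustration only; they are not values estimated in this paper.

$$N = \alpha \, AADT^{\beta} \;\Rightarrow\; \frac{dN}{dAADT}\cdot\frac{AADT}{N} = \beta, \qquad N = a \, e^{b \, AADT} \;\Rightarrow\; \frac{dN}{dAADT}\cdot\frac{AADT}{N} = b \cdot AADT$$

In the power form with β < 1, a 1% increase in AADT raises N by less than 1%, so the number of accidents per vehicle falls as traffic grows. In the exponential form the elasticity itself grows in proportion to AADT, so the two forms describe different kinds of non-linearity even though both predict more accidents at higher volumes.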

The influence of percentage of commercial vehicles in total traffic flow on the number of traffic accidents
The results of this paper have shown that the total number of traffic accidents decreases as the percentage of commercial vehicles in the total traffic flow rises. This finding is in accordance with the results given by Hiselius [32]. However, one should bear in mind that the model explains only 16% of the variability of the dependent variable, which prevents reliable conclusions. As in the research in Sweden [32], a small sample could have influenced the conclusions of this study. In general, one should note that the effect of the percentage of commercial vehicles on the number and severity of traffic accidents is not clearly understood.

The influence of the radius of horizontal curvature on the number of traffic accidents
The results of this paper show that the rate of traffic accidents decreases as the radius of the horizontal curve increases. This decrease is most pronounced up to a radius of around 500 m. This finding is also in accordance with several previous research efforts. For example, Milton & Mannering [6] found that the rates of traffic accidents on curves were between 1.5 and 4 times higher than on tangents. In addition, several studies concluded that the rate of traffic accidents decreased until the radius reached 400 to 500 m [33-35]. Other studies suggest an even smaller critical radius below which the traffic accident rate grows, with values between 350 and 400 m [36,37].

The influence of roadway gradient on the number of traffic accidents
The results of this paper have shown that an increase in roadway gradient reduces the total number of traffic accidents. This conclusion is in accordance with some of the previous studies [18], while it contradicts the findings of others [15,36,38]. One of these studies [15] concluded that the number of traffic accidents rises by 1.6% with each percent of gradient increase. Another study, conducted by Iyinam [36], found that the rate of traffic accidents rose abruptly for gradients larger than 6.5%. In addition, in an analysis of rural two-lane two-way roads in Utah, Miaou [38] found that the number of accidents was 10% and 16% higher on roadway sections with gradients from 3% to 6% and higher than 6%, respectively, in comparison to gradients smaller than or equal to 3%. Contradictory results originate from Mayora et al. [18], who found that the rate of traffic accidents with injuries decreased with the increase of gradient. This suggests that roadway gradient can be an influencing factor, but that its influence probably depends on other roadway or traffic characteristics. In relation to the research presented here, one can note that the roadway gradient model explains only 7% of the variability of the dependent variable. An important obstacle to an in-depth understanding of the influence of roadway gradient is the small sample size, an issue also encountered in the previous research.

The influence of carriageway width on the number of traffic accidents
The results obtained in this research have shown that an increase in pavement width reduces the total number of traffic accidents, which is in accordance with former research [12,20,29,33,35]. This finding has its justification in driver behaviour theory, where increasing the lane width enables safer car-following behaviour by providing extra space for driver error. In this research, a significant change in the rate at which traffic accidents decrease can be observed beyond a carriageway width of 7 m. This value is similar to that of a study conducted by Iyinam [36], which determined that a road width smaller than 6.5 m was critical. In addition, this is an important point for future engineering activities, since previous research has also confirmed that increasing the roadway width by one to four metres can reduce the number of traffic accidents on rural roads [12,33].

Multivariate relationships
In addition to the univariate relationships, the multivariate models developed here further explain the co-dependence of variables and their influence on the number of traffic accidents. The results of the first multivariate model show that a combined increase of AADT and decrease of curve radius increases the total number of traffic accidents per kilometre per year. In addition, the second multivariate model indicates that the total number of traffic accidents per kilometre per year increases with the increase of AADT and decrease of carriageway width. Common to both models is a specific interaction between traffic (AADT) and roadway characteristics (curve radius and carriageway width), where an increase in traffic volume, combined with reduced geometric values, results in an increase of traffic accidents. Consequently, these two models enable simultaneous consideration of traffic and roadway characteristics in engineering and planning activities.
Moreover, the models emphasize even more the importance of critical roadway characteristics, especially in combination with traffic volume.

Limitations of the paper
The data for this paper were retrieved from a limited database: the only available data were those for two-lane state roads of rank IA and IB (road marks IA-2 and IB-15). The sample therefore consisted of 344 segments with a total length of 213.8 km. In addition, the spatial locations of the accidents were somewhat inaccurate, as no GPS coordinates were recorded and the location was estimated based on the existing milepost markers. The sample exhibited little data variation, which means that most of the data points were retrieved from segments with ideal roadway characteristics. In general, ideal section characteristics are those that previous studies have found to have a low influence on the occurrence of traffic accidents. For example, these characteristics include curve radii greater than 450 m, traffic lane widths greater than 3.5 m, gradients smaller than 3%, etc. At the same time, these are the most frequently represented sections. Also, numerous other factors whose influence on the occurrence of accidents had been determined in past research, such as the number of access points, road environment, and pavement conditions, were not available and could not be examined for the purpose of this paper.

CONCLUSIONS AND RECOMMENDATIONS
This paper studied the influence of road and traffic variables on the occurrence of traffic accidents. The regression analysis was used to examine the impact of the variables, including AADT, the percentage of commercial vehicles, the radius of horizontal curvature, the gradient and the pavement width. The analysis confirmed the base hypothesis that there is a functional effect of road and traffic characteristics on the occurrence of traffic accidents. These dependences are investigated, and consequently accident prediction models are defined.
The univariate regression analysis determined specific dependence models and the comparative analysis used in the discussion of results showed that these models corresponded to the results obtained in the previous accident modelling research.
Considering that satisfactory results were obtained for AADT, radius, and pavement width (from the point of view of the statistical significance of the correlation coefficient), these variables were further considered in the multivariate regression analysis. Consequently, two multivariate regression models were developed. One of the models included AADT and the radius of horizontal curvature, and the other included AADT and pavement width. The model which determines the dependence of the total number of traffic accidents on AADT and the radius of horizontal curvature shows that an increase of AADT and a decrease of the radius increase the total number of traffic accidents. The model that determines the dependence of the total number of traffic accidents on AADT and pavement width shows that an increase of AADT and a decrease of pavement width increase the total number of traffic accidents. Both multivariate regression models support findings from the previous research.
The findings of this research have several practical implications, since traffic accident prediction models have numerous applications. The traffic accident prediction models developed in this paper can be used in road and traffic engineering in Serbia and in the area of South East Europe. The potential areas of use include cost-benefit analyses, traffic safety analyses, traffic impact studies, ecological analyses, multi-criteria analyses, etc.
One should note that the obtained results represent the first attempt to study this issue in Serbia, and the first in South East Europe since the 1970s. Consequently, one of the issues that this initial effort faced was data unavailability. The findings were limited by the insufficient sample, accompanied by small data variation, and by the unavailability of detailed accident and roadway data. However, a significant lesson learned from this research is the need to extend the database to the rest of the Serbian road network and to include other traffic and roadway parameters in a future integrated database. This integrated and expanded database should ensure greater model reliability and improve further engineering and planning activities. Finally, further research in traffic accident prediction modelling based on an improved and expanded database is needed for further validation of the findings reached in this paper.