A NOVEL APPROACH IN EVALUATING THE IMPACT OF VEHICLE AGE ON ROAD SAFETY

This study examines the correlation between road accident casualties and the age of the vehicle, assuming that vehicle age and the improvements in vehicle safety design are related. The study evaluates the impact of the interrelationship between road segment characteristics and road accident type on the age of the vehicle at the time of the accident (AVC). To analyse the nested relationship between these variables, a multilevel multinomial logistic regression (MML) model has been developed. The results of the analysis show that vehicle age plays a pronounced role in the occurrence of accidents.


INTRODUCTION AND LITERATURE REVIEW
Every year, road accidents cause large economic and social losses to societies and individuals [1]. In recent years, identifying the causes of road accidents has become a crucial objective for road safety experts. Road accidents are generally not randomly located events; rather, they are related to various circumstances involving humans, vehicles, and geometric and environmental properties [2].
Identifying the factors related to accidents can contribute significantly to achieving safety objectives. Relevant studies have reported a variety of factors contributing to road accidents [3,4]. However, methods that explain accident occurrence rarely consider vehicle age as an important factor. The studies that do consider it have found a direct relationship between the age of the vehicle and an increased likelihood or severity of accidents. Blows et al. [5] suggested that an increased proportion of older vehicles increases the risk of their occupants being injured in accidents. The authors also suggested reducing the number of older vehicles on the roads, especially vehicles older than 15 years, or improving their safety features. In recent years, safety standards in the automotive sector have been developing continuously. These safety improvements include the ability of the vehicle to stop, vehicle stability control, passive safety systems (e.g. frontal and side airbags, roof crush strength), the ability of the vehicle to perceive its environment (e.g. front and rear sensors), and many other improvements related to driver assistance systems. In [6] it has been emphasized that the remarkable safety improvements of currently available cars can be expected to have a significant influence on safety. In the United States, improvements in safety-related vehicle technologies are reported to have reduced road accidents from 30 percent for model-year 2000 cars to 25 percent for model-year 2008 cars, in the case of new cars [7]. The report adds that the improvements of the 2008 vehicle fleet are estimated to have saved about 2,000 lives during the evaluation period.
Most of these studies have been limited in scope and have used simple models, which are suitable for indicating that some relationship exists between vehicle age and accident data rather than for accurately describing the characteristics of that relationship. Some researchers [8] have applied a logistic regression model to show the improvements in vehicle safety related to vehicle age and model year. However, association analysis investigating the influence of the age of a vehicle at the time of the crash (AVC) on the type of accident and the surrounding environment is still under-researched. From a methodological viewpoint, most of the developed accident severity models have been limited to examining the effect of roadway and environmental features on a specific accident type. These models usually apply one of a wide range of logistic regression models [9,10], assuming a single relationship between all of the explanatory variables and the target variable. In contrast, Haghighi and colleagues have explored the nested relationship between individual crash characteristics and environmental and roadway features [11]. They applied a multilevel ordinal logistic regression to address the hierarchical structure of accident data and its impact on accident severity outcomes.
This research applies a multilevel multinomial logistic regression (MML) model, developing further the methodology of Haghighi et al. [11], to investigate the multilevel relationship between accident types (e.g. skidding, overturning) and roadway geometric and environmental conditions (e.g. roadside hazards, horizontal curves, traffic volume, speed limit), and its impact on AVC. The underlying assumption is that the age of a vehicle strongly affects the level of its built-in safety technology and the improvements adopted in its safety design [12].

DATA
The dataset used has been obtained from the UK Department for Transport website and includes two types of data. The first dataset [13-15] includes accident information (i.e. location, date, time, number of vehicles involved, road characteristics, presence of junctions, weather conditions), crashed vehicle characteristics (i.e. type, model, age, engine capacity, type of accident) [16], and a description of casualties (i.e. severity, age, gender, type). The second dataset [17-19] includes traffic volume characteristics and speed limits for different road segments. Additional roadway characteristics (i.e. horizontal curves, roadside hazards, lane characteristics) have been identified with the help of Google Earth and ArcGIS. Based on the available information, homogeneous road segments have been generated in the first step of the analysis.
This study concentrates on the impact of the nested relationship between various environmental characteristics (i.e. roadway geometric characteristics, traffic volume, and speed limits) and accident types on AVC as the target variable, using data on UK motorways for the years 2014-2016. The data include three variables describing accident types, which are categorized in this research as follows: Type 1 (skidding, skidding and overturning, skidding and hitting objects), Type 2 (hitting objects in the carriageway), and Type 3 (run-off-road).
In order to capture the indirect effect of environmental conditions on AVC, the roads have been divided into homogeneous segments. Following the Highway Safety Manual specifications [20], the segmentation considered traffic volumes, speed limits, roadside hazards, and the presence of horizontal curves, so that each segment has distinct environmental characteristics. The segmentation resulted in 576 segments with an average length of 5.3 kilometres and an average of 12.3 accidents per segment per year.

METHODOLOGY
Vehicle age, as the target variable, is conditioned on the fact that an accident has happened. The outcomes of AVC have been divided equidistantly into five age groups (i.e. AVC1, AVC2, AVC3, AVC4, and AVC5). Accident frequencies in these groups may vary over the given time period. Multinomial logistic regression is an appropriate analytical approach when the outcome has more than two categories that are not ordered, or whose ordering is not clear-cut. The analysis of the data revealed a hidden hierarchical relationship between the explanatory variables, so a two-level approach has been used to examine this nested relationship. The two levels of the model are described below.

Level-one modelling
Assuming the data follow a multinomial distribution, let $C$ denote the total number of outcome categories, each individual category being indexed by $c$. The probability of being in the $c$-th category, $P(Y=c)$, is $\pi_c$ ($c = 1, 2, \dots, C$). The probability $\pi_c$ of each of the other categories is compared against the reference category $\pi_C$ using the cumulative log-odds link function $\eta_c$. The general single-level model linking the expected values of the outcome to the predicted values of $\eta_c$ can be written as follows:

$$\eta_c = \ln\left(\frac{\pi_c}{\pi_C}\right) = \beta_{0(c)} + \sum_{q} \beta_{q(c)} X_q, \qquad c = 1, \dots, C-1 \qquad (1)$$

where $\eta_c$ is the log odds of being in a particular category $c$ versus the reference category $C$; $\beta_{0(c)}$ is the intercept; $X_q$ is the value of predictor $q$; and $\beta_{q(c)}$ is the coefficient of predictor $q$.
The odds of category $c$ relative to $C$ are $\exp(\eta_c)$, and exponentiating a log-odds coefficient gives the corresponding odds ratio, $\exp(\beta_{q(c)})$. When the odds ratio is less than 1, the log-odds coefficient is negative, while an odds ratio above 1 corresponds to a positive coefficient.

Level-two modelling (Developing multinomial modelling)
In the developed model, each individual vehicle crash is treated as a member of a certain group. Each group corresponds to a specific road segment with its own environmental and roadway geometric characteristics. Since the individual crashes of a given group share similar environmental characteristics, their causal factors are assumed to be more similar to each other than to the causal factors of crashes in other groups. Accordingly, individual crashes cannot be considered independent of other crashes from the same group, which would violate the independence assumption of a single-level model.
In multilevel analysis, the relationship between different groups of vehicles involved in accidents is modelled by identifying a hierarchical data structure that can describe latent relationships in the clustered dataset. The outcome of the model can thus represent a nested relationship between the lower-level (level-one) properties of individual accidents and the higher-level (level-two) group properties related to environmental features, as presented in Table 1.
Unlike the fixed slope and intercept of the single-level model, the intercept of the multilevel model varies randomly between groups: each cluster group $j$ has its own intercept $\beta_{0j}$. The value of the intercept is defined as the predicted response either at zero values of the predictors or at their mean values [21].
The MML model is used to predict the odds of belonging to an outcome category $c$ versus the reference category $C$ for every individual $i$ in group $j$. In the two-level model, the intercept of Equation 1 becomes group-specific. Assuming there are no level-two predictors, the general form of the level-two model is as follows:

$$\beta_{0j(c)} = \gamma_{0(c)} + u_{0j(c)} \qquad (2)$$

At level two, the level-one intercepts $\beta_{0j(c)}$ are modelled as a function of an overall intercept $\gamma_{0(c)}$ and a level-two random effect $u_{0j(c)}$ for each group $j$. The general MML model is formed by combining both levels (Equations 1 and 2) in a single formula (Equation 3):

$$\eta_{ij(c)} = \ln\left(\frac{\pi_{ij(c)}}{\pi_{ij(C)}}\right) = \gamma_{0(c)} + \sum_{q} \beta_{q(c)} X_{qij} + u_{0j(c)} \qquad (3)$$

Intra-class correlation
The intra-class correlation (ICC) indicates whether groups differ substantially from each other; in other words, it shows whether a multilevel model is applicable. According to [22], even a small ICC value can have a substantial effect on the model. The ICC is calculated from the estimated level-two variance $\sigma_u^2$ of the random effects $u$. Equation 4 gives the ICC, which describes the proportion of the total variance that lies between groups:

$$\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3} \qquad (4)$$

where $\pi^2/3 \approx 3.29$ is the variance of the standard logistic distribution.

RESULTS AND DISCUSSION
A two-level continuous mixed regression (CMM) model has been developed to examine the safety effect of roadway environmental features on the age of the vehicle at the time of the crash (AVC). The model has been run several times in order to select the best explanatory variables, eventually excluding insignificant variables. Tables 2 and 3 present the resulting random and fixed effect coefficients of the model. The random effect coefficients represent the level-two variance components of the intercepts. As mentioned before, the intra-class correlation (ICC) describes the proportion of variance between groups; in other words, it characterizes the nested relationship between the level-two explanatory variables (i.e. environmental conditions) and the target variable (i.e. AVC). In Table 2, the residual parameter describes the variance due to differences among individuals (i.e. the AVC of each vehicle) within their respective units. According to the table, there is significant variance within the groups (Wald Z=41.33, p<0.001).
Similarly, the intercept parameter indicates that the intercepts vary significantly across the different clusters (Wald Z=1.03, p=0.031). The Wald Z test provides a Z statistic, the ratio of the estimate to its standard error. In other words, the developed CMM model can explain the hierarchical relationship between the hierarchical predictors and AVC.
The CMM has been built on four predictors (i.e. driver age, accident type, vehicle type, and car-maker), all of which are statistically significant. Table 3 provides the estimates of the fixed-effect coefficients. First, the intercept ($\beta_0 = 10.37$) represents the average AVC when all the predictor values are zero or at their reference value. Two of the accident types were found to be significantly related to AVC. For the first accident type (skidding, and skidding and overturning), accident risk is proportional to AVC, with positive estimates (0.46 and 0.35, respectively). In other words, accidents of older vehicles are more likely to be due to skidding, and skidding and overturning. Similarly, for the second accident type, if an accident occurred in an off-road environment, the vehicle is expected to be older than in average accidents (for "off-road environment", the effect of accident type on age is 0.7). If the accident type is hitting objects, a newer vehicle is expected to be involved (for "hitting objects", the effect of accident type on age is -0.23), although this effect is not significant at the 95% confidence level (Sig.=0.193 in Table 3). This finding is consistent with previous studies. Boodlal and colleagues have suggested that crash occurrence and crash type can be partly explained by road geometric features together with vehicle characteristics [23]. Haghighi, Liu, Zhang, and Porter have shown that geometric features can be used to predict typical crash types [11]. However, this outcome has to be interpreted very carefully, since the risk of a vehicle in a given age group being involved in an accident is also strongly influenced by the penetration of that age group in the whole vehicle fleet and by the traffic performance of the specific age group [24]. According to a penetration-based comparison, the oldest and the youngest age groups are significantly under-represented in the investigated accident types [25,26]. The relative under-representation of newer cars in the analysed crash types seems rational, since new safety technologies can strongly support road safety improvements [27-29].
Another interesting finding is the relationship between driver age and vehicle age at the time of the accident. The inverse relationship in Table 3 (-0.11) indicates that older drivers are more likely to be involved in accidents in new vehicles, while young drivers are more likely to be involved in accidents in old vehicles. This suggests that age compatibility between the car and its driver is more favourable for avoiding accidents. With respect to vehicle type, newer trucks and buses are more likely to be involved in accidents than cars, although the effect for buses is not statistically significant at the 95% confidence level (Sig.=0.220 in Table 3). This may be strongly influenced by the commercial nature of these vehicle categories, since commercially operated vehicles are usually younger than privately owned cars. Table 3 also estimates (with an overall significant value) the AVC for 44 different car-makers, covering different models. Of these, only 16 car-makers are statistically significant at the 95% confidence level (Figure 1).
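As an illustrative sketch of the quantities used in this section, the Python snippet below implements the ICC of Equation 4, the Wald Z statistic (the ratio of an estimate to its standard error, with a two-sided normal p-value), the linear predictor of Equation 3 for one category, and the conversion of the log odds $\eta_c$ into category probabilities. It is not the software used in the study: all function names are hypothetical, the random intercept is assumed to enter the linear predictor additively as in Equation 3, and no estimates from Tables 2 or 3 are reproduced. For the continuous (CMM) specification, the within-group variance would be the residual variance of Table 2 rather than $\pi^2/3$, which is why the generic `icc` helper takes it as an argument.

```python
from __future__ import annotations

import math

# Variance of the standard logistic distribution, pi^2 / 3 (about 3.29).
# In the latent-variable form of Equation 4 it acts as the level-one
# (within-group) variance.
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def icc(between_var: float, within_var: float) -> float:
    """Share of the total variance lying between groups (road segments)."""
    if between_var < 0 or within_var <= 0:
        raise ValueError("need between_var >= 0 and within_var > 0")
    return between_var / (between_var + within_var)


def icc_logistic(between_var: float) -> float:
    """Equation 4: ICC of a two-level logistic model."""
    return icc(between_var, LOGISTIC_VARIANCE)


def wald_z(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald Z (estimate / standard error) and its two-sided normal p-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


def linear_predictor(gamma0: float, betas: list[float],
                     xs: list[float], u_j: float) -> float:
    """Equation 3 for one category c: gamma_0 + sum_q beta_q * X_q + u_0j."""
    return gamma0 + sum(b * x for b, x in zip(betas, xs)) + u_j


def category_probabilities(etas: list[float]) -> list[float]:
    """Turn the C-1 log odds eta_c (each versus reference C) into pi_1..pi_C."""
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference category C
    return probs


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the calls.
    print(round(icc_logistic(0.5), 3))        # 0.132
    print(category_probabilities([0.0] * 4))  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

Using `math.erfc` for the p-value keeps the sketch free of external dependencies; a statistics package would return the same two-sided value.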

Figure 1 - The estimated values by car-maker (AVC model)
Figure 1 shows the distribution of the model's estimated values with respect to car-makers. The first thing to note from Figure 1 is that all of the estimated values are negative, indicating an inverse relationship between AVC and car-maker. In other words, for all of the analysed car-makers, newer cars have a higher probability of being involved in an accident.
In more detail, among the car-makers in Table 3, the new models of Ford and Citroen have the lowest predicted accident probability relative to their old models (-1.78 and -1.99, respectively). In contrast, the accident probability of newer models from Isuzu and Mercedes-Benz was higher than that of their older models, which gave them larger negative estimates (-5.95 and -5.08, respectively). These values are influenced on the one hand by changes in market penetration and on the other hand by the safety improvement efforts of the given producer. For example, if a producer's vehicles are more attractive now than in the past, more vehicles from that producer are on the road, which can increase its accident probability. Conversely, if a producer efficiently allocates more resources to safety, it can produce safer cars and thus lower recent accident probability values compared to the past. Because of these factors and the relatively small amount of data available for the individual car-makers, this outcome has to be treated with caution.

CONCLUSION
In this paper, a multilevel multinomial logistic regression has been applied to analyse the nested relationship between crash types, roadway geometric and environmental conditions, and their impact on the age of vehicles at the time of the accident (AVC). AVC has been investigated as the target variable because it is highly relevant from the viewpoint of accidents: vehicle age effectively represents the differences in integrated safety systems between vehicle generations. Accordingly, the multilevel analysis investigates the relationship between different groups of crashed vehicles by identifying a hierarchical data structure, which makes it possible to exploit the advantages of the clustered dataset. This multilevel approach is represented by a hierarchical structure aimed at revealing the influence of group-level characteristics of the data. A multinomial logistic regression model has been applied to compare the different categories of AVC outcomes using the cumulative log-odds link function. Before summarizing the final results, the limitations of the study should be briefly introduced. In most countries, a comprehensive statistical dataset describing the traffic performance of vehicle types, categories, and age groups is not available. This means that categories cannot be differentiated by distance driven, which would be crucial for evaluating the accident risks and probabilities of a specific category (e.g. age groups or vehicle types). Without reliable performance data, a relevant part of any characterization of the effect of vehicle age on traffic safety has to be based on assumptions and estimates. In addition, a database on the level of automation (especially ADAS) for specific vehicle categories and age groups would strongly support the analysis of the effect of automation on traffic safety. It should also be noted that the model parameters of the current investigation have been estimated using road accident data from the UK motorway network for the period 2014-2016.
The first level of the model includes parameters related to accident characteristics, while the second level describes the environmental and geometric features of the road segments where accidents occurred. The results show a significant influence on AVC when a hierarchical structure of the explanatory variables is applied. The values of the variance components support the assumption that the intercepts vary across the different AVC groups, describing the correlation between the environmental features and the different AVC groups.
Furthermore, the variation in the odds ratios of the crash types between different AVC groups demonstrates the nested relationship between crash types and the environmental conditions of different road segments, which also supports the applicability of the multilevel approach. According to the results of the analysis, older vehicles are more likely to be involved in traditional types of accidents; for instance, skidding accidents have a higher probability of involving casualties for older vehicles than for newer vehicles. On the other hand, according to the penetration-based comparison, the oldest and the youngest age groups are significantly under-represented in the investigated accident types.
Finally, I would like to briefly discuss the applicability of the results to other countries. In this respect, it seems reasonable to treat the methodology and the numeric results separately. Regarding the methodology, we can conclude that the applied model can easily be adapted to other countries and other national datasets. Accordingly, any country with a detailed road accident database can perform the introduced evaluation and can effectively investigate the relationship between AVC and automotive manufacturers for its national vehicle fleet using the developed model framework. On the other hand, the numeric results presented here only represent the safety situation in the UK, since the penetration of vehicle types, age groups, and manufacturers can vary from country to country.
Furthermore, based on the introduced limitations of the model, in future work I would like to include traffic performance in the research methodology. This would allow categories to be differentiated by distance driven, which would be an important contribution to evaluating the accident risks and probabilities of a specific category (e.g. age groups or vehicle types). In addition, I would like to develop a database focusing on the level of automation (especially ADAS) for specific vehicle categories and age groups to support the analysis of their effect on traffic safety.