NEW APPROACH TO ESTIMATING THE SATURATION FLOW RATE OF A SHARED LANE WITH PERMITTED LEFT TURNS

The estimation of the saturation flow rate is of utmost importance when defining the signal plan at intersections. Because of the numerous influential factors, whose values are hard to determine, the problem is extremely complex. This research deals with the estimation of the saturation flow rate of a shared lane with permitted left turns. The suggested algorithm is based on artificial neural networks, with the training data obtained by simulation. The results obtained by the neural networks are compared with multiple linear regression and the known HCM 2010 approach for determining the saturated flow of a shared lane. On the testing data, the approach based on artificial neural networks predicted statistically significantly better values than multiple linear regression, with an error of 27 veh/h against 49 veh/h. The HCM 2010 approach is significantly worse than the other two included in this research. Future development of the suggested method could include additional factors, such as the grade of the traffic lane, the proximity of bus stops, and others.


INTRODUCTION
The saturated flow rate is the maximum number of vehicles that a traffic lane can serve when it is assigned the green time for a full hour. When determining the signal plans at intersections, the capacity of the traffic lane directly depends on the value of the saturated flow rate. A wrong estimate of the saturation flow rate of the lane can lead to bad signal plans, which in turn lead to unnecessary additional vehicle delay and longer queues on the approaches to the intersection. All of this results in a drop in the level of service (LOS).
The estimation of the lane saturation flow rate, especially when it comes to a Shared Lane with Permitted Left Turns (SLPLT), presents a complex problem. The saturation flow rate can be measured in the field, by recording an average headway, i.e. the average time between two consecutive vehicles departing from a lane [1]. These kinds of measurements are not easy to perform for SLPLT because the average headway has significant variations depending on the factors influencing the left turns (the input variables into the model suggested in this paper).
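The field measurement described above can be illustrated with a minimal sketch. It assumes the usual relation between the saturation flow rate and the average discharge headway, s = 3600 / h, with h in seconds. The function name and the headway values are hypothetical and serve only as an illustration.

```python
def saturation_flow_from_headways(headways_s):
    """Return the saturation flow rate (veh/h) from discharge headways in seconds."""
    if not headways_s:
        raise ValueError("at least one headway is required")
    mean_headway = sum(headways_s) / len(headways_s)
    # One hour has 3,600 s, so 3600 / (average headway) gives vehicles per hour.
    return 3600.0 / mean_headway

# Hypothetical headways (s) between consecutive vehicles departing from a lane.
headways = [2.0, 1.9, 2.1, 2.0]
flow = saturation_flow_from_headways(headways)
print(round(flow))
```

For an SLPLT the recorded headways vary strongly with the left-turn conditions, which is why a single field average is hard to generalize.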
Due to the importance and complexity of the problem, many authors have considered different approaches and developed different models for the estimation of the saturation flow of SLPLT. The estimation of the saturation flow on the example of several traffic lanes, based on regression analysis, was given in paper [2]. In paper [3], the real-time traffic control software "PRODYN" was used for the purpose of estimating the saturation flow rate of SLPLT. An author in [4] suggests an estimation model of SLPLT. All the possibilities of the contemporary methods of machine learning in the estimation of the saturated flow of SLPLT have not been examined.
In this paper, the author deals with the application of Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) to the problem of estimating the saturated flow of SLPLT. For different values of the pre-defined influential factors, the values of the saturated flow of SLPLT were determined using the simulation approach. These data were then used for training in the stated methods. The results obtained by MLR and ANN were compared with those provided by the commonly known HCM 2010 approach. As far as the author is aware, this kind of approach has not been used for the consideration of the subject problem so far.
After the introductory considerations, the second Section is dedicated to the setting and description of the problem in question. In the third Section, all the models for the SLPLT estimation are explained. The results obtained and their discussion are the content of the fourth Section. The last, fifth Section is dedicated to concluding considerations and the directions of future research.

PROBLEM DESCRIPTION
In this paper, the problem of the estimation of the saturation flow rate of a Shared Lane with Permitted Left Turns (SLPLT) is considered. It is often not easy to determine the saturation flow rate of a shared lane because of the numerous factors influencing that value. This task is additionally complicated when one traffic lane serves the flows of different directions, as is the case with SLPLT.
The drivers look for an acceptable gap in the opposing flow to perform a left turn. The traffic lane is blocked during the period when the vehicles cannot find a gap in the opposing flow, and the saturated flow is reduced. During the left-turn movements, it often happens that the vehicles are served together with the pedestrians, which additionally influences the saturated flow. Besides, commercial vehicles and public transport vehicles influence the speed of the traffic flow, and the value of the saturated flow rate is thus reduced. Accordingly, this paper will consider the following influential factors on the saturated flow of SLPLT, which at the same time represent the input variables for the estimation models:
X1 - proportion of left turns in the shared lane (%);

The model in [4] is based on analytical dependencies of the saturation flow and influential factors. The results show the advantages of this model compared to the HCM 2010 method. Authors in [5] developed a hybrid model for SLPLT estimation based on simulation. The suggested model showed its advantages over the HCM 2010 model. Authors in [6] compare three methods for the estimation of the saturation flow rate of shared lanes: the Highway Capacity Manual 2010 (HCM 2010), the Australian Road Research Board (ARRB) and the Canadian methods. The results show that the ARRB method gives the values which are the closest to those measured in the field. Authors in [7] dealt with the impact of motorcycle drivers on the saturation flow rate of shared lanes. The results show a considerable dependence, and the authors recommend that this group of drivers be taken into account when designing and estimating the signal plans at intersections.
Authors in [8] dealt with the dependence of the saturation flow rate on socio-demographic factors such as area population, average age, average income and average trip distance. In paper [9] an algorithm was developed for the estimation of the saturation flow rate in the case of a longer time cycle. The estimation of the saturation flow at intersections with extremely heterogeneous traffic flow can be found in paper [10]. A lane utilization analysis of SLPLT was shown in paper [11]. Authors in [12] dealt with the problem of estimating the saturation flow for the shared right-turn lanes.
On the example of the intersections in China, the authors concluded that the HCM 2010 method had underestimated the influence of pedestrians on the saturated flow. The application of video cameras and detectors for the estimation of the saturated flow of shared lanes can be found in papers [13,14]. The idea of implementing the left-turn waiting areas, for the purpose of increasing the intersection capacity can be found in paper [15]. A model for determining the saturated flow of exit lanes for left-turn intersections can be seen in paper [16]. The saturation flow rate analysis under the automated vehicle environment can be found in paper [17].
Since we are dealing with a complex problem, there is room for further research despite the numerous existing works. Previous studies did not deal considerably with the possibilities of forecasting the saturated flow based on the values that were already measured.

Different seeds are needed so that the simulation can take into account different patterns of vehicle arrivals. The final value for each data point (s SIM) represents the average value of five simulation runs and is shown in the Appendix. During the simulation, the following parameters were used: speed of the cars in the flow: 48-58 km/h; speed of the commercial vehicles in the flow: 40-45 km/h; left turning speed: 30-35 km/h; warm-up time: 100 s. These values, obtained by simulation, will be marked by s SIM.
Phase 2: To use n input/output pairs of data for the calculation models of the saturated flow. The first model is based on the HCM 2010 procedure and the generated values will be marked by s HCM. The second model is based on the multiple regression analysis and the generated values will be marked by s MLR, while the third model is based on the artificial neural networks and the generated values will be marked by s ANN.
Phase 3: To use n of the input data for testing the efficiency of all three suggested models. The results obtained by simulation are considered to be authoritative, and the errors of the suggested models will be calculated accordingly.
It is not easy to generalize different field measurements so that they can be adopted as authoritative at all intersections. One general model obtained by simulation, under the given values of the simulation parameters, can be considered precise enough to be adopted as the reference for the requirements of this paper.

CALCULATION MODELS
In this paper, three models for the calculation of the saturation flow of SLPLT will be used. The first one is the analytical approach, based on HCM 2010 calculations. The second approach assumes the multiple regression analysis based on the input/output pairs of data generated by simulation. The third is the approach which implies the use of artificial neural networks as a universal approximator.

X2 - opposing demands (pcu/h);
X3 - pedestrian conflict demands (ped/h);
X4 - proportion of heavy vehicles in the shared lane (%).
The saturated flow can be influenced by other factors, such as the lane width, the proximity of public transport stations, the grade of the traffic lane, etc. Let us assume that these and the other factors have no influence on the saturated flow.
The procedure for the prediction of the saturated flow can be divided into three phases (Figure 1). The first phase is generating the input parameters and obtaining the results by simulation. The second phase represents the application of different analytical models for the estimation of the saturation flow value. The third phase refers to the evaluation of the results obtained by all methods. The evaluation criterion is the deviation of the analytically obtained saturation flow values from those obtained by simulation.
The explanations for each of the phases in Figure 1 are as follows:
Phase 1: To generate n pairs of the input data randomly. The range in which the input variables are to be found is shown in Table 1. The simulation was performed in the software package Synchro 7. For each data point, five simulations were performed, each with a different seed. The seeds took the following values: 10, 20, 30, 40 and 50.

The pedestrian adjustment factor for left-turn movements is calculated as follows [1]: where q ped are the pedestrian demands [ped/h].

Multiple linear regression
Multiple Linear Regression (MLR), known as a machine learning method, is used in many fields of science for data forecasting. Some examples of MLR application for saturated flows can be found in [18,19], but, as far as we know, there is no application of the MLR to the saturated flow of SLPLT. The input data for the MLR are represented by a set of independent variables, while the output variable is dependent on the input. It is assumed that the dependence of the output variable on the input ones can be described linearly. According to this, the following linear equation is introduced:
$$s_{MLR} = a + \sum_{i=1}^{4} b_i X_i$$
where b i are the coefficients and a is an intercept. For the optimization of the regression coefficients and the intercept, the following model of combinatorial optimization is set: Minimize

HCM 2010 approach
The HCM 2010 procedure offers eleven correction factors which reduce the base value of 1,900 pcu/h (passenger car units per hour). According to the adopted criteria for the calculation of the saturated flow, the following formula will be applied: where: P HV -proportion of heavy vehicles for lane demands; E T -passenger-car equivalent (E T usually equals 2). The saturated flow represents a maximum number of vehicles that the traffic lane can serve during 3,600 seconds. To enable the application of the HCM 2010 procedure for the saturation flow, an assumption has been introduced that the green time of the shared lane is g=3,600 s.
The green time should be divided into two periods. During the first period, the shared lane is blocked because the left-turning vehicles cannot filter themselves through the opposing flow. This time is marked by g q . During the second period, g-g q , vehicles manage to find a gap through the opposing flow and the lane is released.
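As an illustration of how a single correction factor reduces the base value, the sketch below applies only the heavy-vehicle term built from the symbols P HV and E T defined above. It assumes the standard HCM heavy-vehicle adjustment, f_HV = 100 / (100 + P_HV (E_T - 1)), with P_HV given in percent. The left-turn and pedestrian factors are not reproduced; they enter through the hypothetical `other_factors` argument. The function names are illustrative only.

```python
BASE_SATURATION_FLOW = 1900.0  # pcu/h, base value stated in the text

def heavy_vehicle_factor(p_hv_percent, e_t=2.0):
    """Heavy-vehicle adjustment factor (assumed standard HCM form)."""
    return 100.0 / (100.0 + p_hv_percent * (e_t - 1.0))

def adjusted_flow(p_hv_percent, other_factors=1.0, e_t=2.0):
    """Base flow reduced by the heavy-vehicle factor and any other factors (product)."""
    return BASE_SATURATION_FLOW * heavy_vehicle_factor(p_hv_percent, e_t) * other_factors

# Hypothetical lane with 10% heavy vehicles and no other adjustments.
flow_10 = adjusted_flow(10.0)
print(round(flow_10, 1))
```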
The adjustment factor for the left-turns in the shared lane with a permitted left turn is calculated as follows [1]: where

The multiple coefficient of correlation R mul is thus calculated: where SD is the marked standard deviation. Adjusted R (R adj) is calculated as follows:
$$R_{adj}^2 = 1 - \left(1 - R_{mul}^2\right)\frac{n-1}{n-k-1}$$
where n is the number of training data, and k is the number of variables. R mul is a correlation factor which shows the percentage of variation explained by a regression model out of the total variation. By increasing the number of variables, R mul will always grow, which can lead to an overfitted model. R adj is the factor of correlation which penalizes the adding of new variables without an important share in the improvement of the existing model. When it comes to the measure of the model quality of the multiple linear regression, the correlation factor R adj is relevant.
Attempts were also made with multiple regression, where the input parameters were described by the polynomial of degree two, but no improvement in the R adj value was noticed.
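The linear model and the adjusted correlation measure discussed above can be sketched as follows. This is an illustration only: it fits the coefficients by ordinary least squares (normal equations) rather than by the genetic algorithm used in the paper, and the training data are synthetic values generated inside the script, not the simulation results. All function names are hypothetical.

```python
import random

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_mlr(X, s):
    """Least-squares fit of s = a + sum(b_i * X_i); returns [a, b_1, ..., b_k]."""
    rows = [[1.0] + list(x) for x in X]
    m = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    Aty = [sum(r[i] * si for r, si in zip(rows, s)) for i in range(m)]
    return solve(AtA, Aty)

def r2_and_adjusted(X, s, coef):
    """Return (R^2, adjusted R^2) for n data points and k input variables."""
    pred = [coef[0] + sum(b * xi for b, xi in zip(coef[1:], x)) for x in X]
    mean_s = sum(s) / len(s)
    ss_res = sum((si - pi) ** 2 for si, pi in zip(s, pred))
    ss_tot = sum((si - mean_s) ** 2 for si in s)
    r2 = 1.0 - ss_res / ss_tot
    n, k = len(s), len(coef) - 1
    return r2, 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic stand-in for the simulated training pairs (made-up coefficients).
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(100)]
s_sim = [1500 - 90 * x[0] - 60 * x[1] - 30 * x[2] - 10 * x[3] + rng.gauss(0, 2) for x in X]
coef = fit_mlr(X, s_sim)
r2, r2_adj = r2_and_adjusted(X, s_sim, coef)
print(coef, r2, r2_adj)
```

The adjusted value is never larger than the unadjusted one, which reflects the penalty for adding variables that do not improve the model.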

Artificial neural network
The Artificial Neural Network (ANN) is a mathematical algorithm which imitates the neural network in a human brain. ANN can solve complex problems where the task is to map a set of input data onto an output function, which is why such networks are also called universal approximators. ANN learns from past data assigned in advance. When given unknown input data, ANN can predict their outputs based on the experience obtained by learning. ANN is made of layers, each of which contains a certain number of neurons. The neurons of one layer are connected only with the neurons of the next layer. The architecture of such a neural network, called the multilayer perceptron, suited for the subject problem, is given in Figure 2.

Fitness function F (Equation 9) minimizes the difference between the saturated flow obtained by the simulation and the saturated flow obtained by MLR for all the n pairs of the training data. Constraint 10 defines the interval for the feasible intercept values. Constraint 11 defines the interval of the values for the feasible coefficients.
The initial values of the coefficients and the intercept are obtained by data analysis implemented in Microsoft Excel. The final values of the coefficients and the intercept are obtained by a genetic algorithm, implemented in Matlab.
Genetic algorithms rest on natural selection principles. Only the individuals that adapt best to the environment and remain strong have the opportunity to leave their genetic material to future generations. When this kind of logic is applied to solving combinatorial optimization problems, the individuals are candidate solutions. In this case, the individual represents the following set [a, b 1 , b 2 , b 3 , b 4 ]. Each individual is assigned a concrete value of the fitness function F. At the beginning of the algorithm, the initial generation of individuals is generated. Each generation goes through the processes of selection of individuals for the crossover, crossover and mutation, so that the next generation of individuals can be produced. This procedure is repeated in iterations until the maximum number of generations, which has previously been defined, is reached. More detailed explanations of this algorithm and the processes which are performed on the generations of individuals can be found in the books [20,21].
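The generation loop described above can be sketched as follows. This is a minimal illustration, not the Matlab implementation used in the paper. It assumes that F is the sum of absolute differences between the simulated and the MLR values, and it uses proportional selection, a crossover probability of 30%, a mutation probability of 4% and coefficient bounds of -100 and 3,000. The population size, the number of generations, the elitism step and the training data are hypothetical additions.

```python
import random

def fitness(ind, X, s_sim):
    """Assumed form of F: sum of absolute differences between s_SIM and s_MLR."""
    a, b = ind[0], ind[1:]
    return sum(abs(s - (a + sum(bj * xj for bj, xj in zip(b, x))))
               for x, s in zip(X, s_sim))

def genetic_algorithm(X, s_sim, lo=-100.0, hi=3000.0, pop_size=40,
                      generations=200, p_cross=0.30, p_mut=0.04, seed=1):
    rng = random.Random(seed)
    n_genes = len(X[0]) + 1  # individual = [a, b_1, ..., b_k]
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    best = min(pop, key=lambda ind: fitness(ind, X, s_sim))
    history = [fitness(best, X, s_sim)]
    for _ in range(generations):
        scores = [fitness(ind, X, s_sim) for ind in pop]
        # Proportional selection: a smaller F gives a larger selection weight.
        weights = [1.0 / (1.0 + f) for f in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for i in range(0, pop_size, 2):
            p1, p2 = parents[i][:], parents[i + 1][:]
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, n_genes)
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            children += [p1, p2]
        for child in children:  # mutation: redraw a gene within the bounds
            for g in range(n_genes):
                if rng.random() < p_mut:
                    child[g] = rng.uniform(lo, hi)
        children[0] = best[:]  # elitism, an assumption not stated in the text
        pop = children
        best = min(pop, key=lambda ind: fitness(ind, X, s_sim))
        history.append(fitness(best, X, s_sim))
    return best, history

# Synthetic training pairs (made-up coefficients inside the bounds).
data_rng = random.Random(0)
X = [[data_rng.random() for _ in range(4)] for _ in range(30)]
s_sim = [1500.0 - 90 * x[0] - 60 * x[1] - 30 * x[2] - 10 * x[3] for x in X]
best, history = genetic_algorithm(X, s_sim)
print(best, history[-1])
```

Because the best individual is carried over unchanged, the recorded value of F cannot increase from one generation to the next.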
The application of genetic algorithms is justified by the fact that we are dealing with a difficult combinatorial optimization problem, for which their use is known in the literature. Some of the works which show this are [22,23,24]. The minimum and the maximum values for a and b j are adopted considering the initial solution so that they cover a sufficient space of feasible solutions. In this case, the maximum and the minimum values of these coefficients are 3,000 and -100, respectively. The parameters of the genetic algorithm which have been used for solving the problem are the following: way of selection: proportional; crossover probability: 30%; mutation probability: 4%;

Adjusting the weights w nh and w h by minimizing a criterion function constitutes the training process for ANN. "Backpropagation" is one of the most widely applied learning algorithms for ANN. Starting from the output layer, through the hidden layers, to the input layer, this algorithm calculates the error for each node of the neural network. The error represents the difference between the obtained and the desired value of the output function. The algorithm procedure is repeated through the epochs until the minimum value of the error is reached. In one epoch, error E can be calculated as follows: where: n -neuron index, n=1, 2,.., N; d p -desired output; o p -obtained output. The evolution of the error through the epochs is given in Figure 3. The number of epochs is set in advance to the value of 1,000.
There are three types of layers in the neural network in Figure 2: the input layer, the output layer and, between them, the hidden layers. Mathematically speaking, if h is the index of the neurons in the hidden layer, which count up to H, and n is the index of the input neurons, which count up to N, the neural network from Figure 2 can be described as follows [25]:
$$s = f_2\left(\sum_{h=1}^{H} w_h O_h\right), \qquad O_h = f_1\left(\sum_{n=1}^{N} w_{nh} X_n\right)$$
where: s -output from the ANN; O h -output of the h-th hidden node; w nh -weight between the n-th node of the input and h-th node of the hidden layer; w h -weight between the h-th node of the hidden layer and the output layer; X n -inputs to the ANN; f 1 and f 2 -activation functions.
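A forward pass built from the symbols defined above (X n, w nh, O h, w h, f 1, f 2) can be sketched as follows. The sketch assumes a single hidden layer without bias terms, a sigmoid for f 1 and a linear f 2. The network in Figure 2 has two hidden layers, so this is a simplified illustration, and the weights and inputs are hypothetical.

```python
import math

def sigmoid(x):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_nh, w_h, f1=sigmoid, f2=lambda z: z):
    """x: N inputs; w_nh[n][h]: input-to-hidden weights; w_h[h]: hidden-to-output weights."""
    n_hidden = len(w_h)
    # Output of each hidden node: activation of the weighted sum of the inputs.
    o = [f1(sum(w_nh[n][h] * x[n] for n in range(len(x)))) for h in range(n_hidden)]
    # Network output: activation of the weighted sum of the hidden outputs.
    return f2(sum(w_h[h] * o[h] for h in range(n_hidden)))

# Hypothetical inputs X1..X4 (scaled) and 12 hidden neurons with trivial weights.
x = [0.3, 0.5, 0.2, 0.1]
w_nh = [[0.0] * 12 for _ in range(4)]
w_h = [1.0] * 12
s = forward(x, w_nh, w_h)
print(s)
```

With all input-to-hidden weights set to zero, every hidden node outputs 0.5, so the linear output is simply half the sum of the output weights; training replaces these trivial weights with fitted ones.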
The most frequently used activation function is a sigmoid function:
$$f(x) = \frac{1}{1 + e^{-x}}$$

A comparison of the three suggested models will be performed on the unknown input parameters. In Table 2, the testing results are shown for each of the suggested methods.
The errors for each of the three suggested models were marked by σ HCM , σ MLR and σ ANN , respectively. The marked errors are calculated in the following way: The smallest error will be the measure for the quality of all three methods. Table 2 also presents an absolute difference (in veh/h) between the evaluated values of the saturated flow and the values obtained by simulation. These values will be marked by diff HCM , diff MLR and diff ANN , according to the names of the suggested methods (shown in Figure 5).
The HCM 2010 method showed the worst results with the error value σ HCM = 311 veh/h. This method has been commonly used in practice because of its uniformity and simplicity. With some other lanes, this error would probably have been smaller, which leaves room for further research. The reason for this assumption is that the shared lane with permitted left turns is one of the most difficult lanes for the precise analytical calculation of the saturated flow.
The error values of the two remaining methods, σ MLR and σ ANN , are 49 veh/h and 27 veh/h, respectively, which is considerably smaller compared to the HCM 2010 method. Because the errors obtained by the MLR and ANN methods are close, it must be shown that the difference between them is statistically significant. If this difference exists, it can be concluded that the ANN method is better than the MLR method. The data on which the statistical test will be performed are diff MLR and diff ANN (provided in Table 2 and Figure 5).
A t-test was applied to confirm the statistical difference between MLR and ANN results:

The backpropagation algorithm was suggested by the authors in [26], where more details about the calculations bound to the adjustment of the weights of neurons, i.e. the training of the neural network, can be found.
During the development of the neural network architecture, one of the main tasks has been to determine the number of hidden layers, as well as the number of neurons in them, so that the best possible dependence of input and output data is obtained. In this paper, this has been achieved by a simple trial-and-error method. The author settled on certain values (Figure 2: two hidden layers with 12 neurons each) when he was satisfied with the R parameter (a known statistical measure of goodness-of-fit). If we mark the fit line from Figure 4 by y=ax+n, the a and n parameters take the values 1 and -25, respectively, while R equals 0.9939. More about the architecture and the ANN types, and about the new scientific knowledge and achievements, can be found in book [27].

RESULTS AND DISCUSSION
A numerical example on which the suggested methods will be tested consists of the last 20 input data, which have not been taken into consideration for the multiple linear regression and the artificial neural network models.

By taking the appropriate data about the average value and the standard deviation from Table 2, the parameter value t=-1.766 is obtained. Testing is performed with the 5% risk (t 0.05 =1.729). Since │t│>t 0.05 , it can be concluded that, in this case, the ANN method has obtained statistically better solutions than the MLR method.
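The decision rule above can be sketched as follows. The exact test statistic used in the paper is not reproduced here; the sketch assumes a two-sample t statistic computed from the sample averages, standard deviations and sizes. The numbers passed to `two_sample_t` are hypothetical, while -1.766 and 1.729 are the values reported in the text.

```python
import math

def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Assumed form: difference of averages divided by its standard error."""
    return (mean_1 - mean_2) / math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)

def is_significant(t, t_critical):
    """The difference is significant if |t| exceeds the critical value."""
    return abs(t) > t_critical

# Hypothetical averages and standard deviations of two samples of 20 differences.
t_demo = two_sample_t(20.0, 15.0, 20, 40.0, 30.0, 20)

# Values reported in the text: t = -1.766, t_0.05 = 1.729.
reported = is_significant(-1.766, 1.729)
print(t_demo, reported)
```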
In papers [18,19], the MLR performs well in the saturated flow estimation, but the novelty in our paper is that we solve the complex problem of saturated flow of SLPLT. It is shown that the MLR, as a simpler method than ANN, obtains acceptable results in solving the subject problem.

where: diff ANN -average value of the first sample;

CONCLUSION
This paper has presented a new approach for the evaluation of the saturation flow of a shared lane with permitted left turns, based on simulation and artificial neural networks. The simulation has been used so that the values of the saturated flow could be obtained, depending on the factors of influence defined in advance. The same set of input/output pairs of data used for the training of the artificial neural network has been used for obtaining the multiple regression model. Additionally, for the comparison of the results obtained on the test data, the known HCM 2010 approach was taken into consideration.
The error between the applied methods and the simulation approach was the measure of results quality. The ANN method obtained an error of 27 veh/h, the MLR method obtained an error of 49 veh/h, while the HCM 2010 method obtained an error of 311 veh/h.
The results have shown that the HCM 2010 method was not successful in forecasting the saturated flow, which was expected because the subject problem was extremely complex. The reasons for this lie in the combined stochastic influence of the opposing demands, the proportion of left turns and pedestrian demands. The artificial neural networks gave statistically significantly better results in comparison to the ones obtained by multiple linear regression, which was confirmed by the appropriate t statistics. Although we did not test our approach at some of the real intersections, the results show the possibilities in the practical application of the MLR and ANN models for the calculation of the saturated flows.
The suggested approach has shown its application on the test data which are independent of those used for training. It can be concluded that the artificial neural networks have proved to be useful tools for solving a complex problem such as the evaluation of the saturation flow of a shared lane with permitted left turns. The directions of the future research could include the application of the suggested approach on other traffic lanes: the lane with exclusive left turns, the shared lane with right turns, etc. Additionally, some other influential factors could be included in the suggested methodology, from those recommended in the HCM 2010 manual.