SHORT-TERM TRAFFIC FLOW PREDICTION USING ARTIFICIAL INTELLIGENCE WITH PERIODIC CLUSTERING AND ELECTED SET

Forecasting short-term traffic flow using historical data is a difficult goal to achieve due to the randomness of the event. Due to the lack of a solid approach to short-term traffic prediction, the researchers are still working on novel approaches. This study aims to develop an algorithm that dynamically updates the training set of models in order to make more accurate predictions. For this purpose, an algorithm called Periodic Clustering and Prediction (PCP) has been developed for use in short-term traffic forecasting. In this study, PCP was used to improve Artificial Neural Networks (ANN) predictive performance by improving the training set of ANN to predict short-term traffic flow using selected clusters. A large amount of traffic data collected from the US and UK motorways was used to determine the PCP ability to increase the ANN performance. The robustness of the proposed approach was determined by the performance measures used in the literature and the mean prediction errors of PCP were significantly below other approaches. In addition, the studies showed that the percentage errors of PCP predictions decreased in response to increasing traffic flow values. Considering the obtained positive results, this method can be used in real-time traffic control systems and in different areas needed.


INTRODUCTION
Road traffic has become a more difficult event to manage as a result of the growth of cities and the increasing demand for transportation. Therefore, the development of systems that can effectively manage this complex event has become an extremely important issue today. Attempts are made to direct the strategies applied in traffic management and control systems to the traffic effectively. However, reliable predictions of traffic variables are required for these systems to work effectively. These variables include traffic flow, travel time, speed, intensity, occupancy, etc. As a result of reliable short-term forecasts the traffic flows can be controlled dynamically, consistent strategies for emergencies can be developed and signal systems can be optimized.
In order to assist the traffic management, the researchers have developed many methods with the motivation of accurately predicting the traffic flow. As a result of these studies, it was observed that the performance of the approaches decreased when the time resolution of traffic flow data increased. For this reason, a fully accepted approach to short-term traffic flow prediction has not been developed yet and the development studies are still underway. Moreover, the time horizon of the methods varies from 0.1 minutes to 1 day [1] and most researchers are developing models using datasets in different time horizons. For this reason, it is difficult to make a reliable comparison between approaches. Therefore, it is important to develop models by using a sufficient set of data to increase the effectiveness of the models. From this point of view, attention was paid to ensure that the datasets used in this study were of sufficient variety and quantity. A dataset containing traffic flow rates, which is first used in [2], and then in [3] was obtained from researchers and used in this study. The datasets used in this study were of varied sizes (3 months to 12 months) and were obtained from freeways and motorways of the United Kingdom and the United States. These datasets are superior in diversity and size compared to the relevant studies in the literature.
Today, many new hybrid models are being developed. The efficacy and stable operating conditions of these models have not been fully proven. However, Artificial Neural Networks (ANNs) and k-means algorithms are stable algorithms that have proven effective in the literature. Therefore, it was decided to run PCP together with ANNs and k-means algorithms in order to clearly observe the effectiveness of PCP. In data-driven models, the suitability of the samples within the training set to the problem improves the performance of the model. Thus, approaches such as ANNs, which contain randomness, can make accurate predictions more frequently. The main motivation of the developed method is to prepare training sets that improve the predictive performance of models that can be trained with datasets of consecutive data samples.
For this purpose, Periodic Clustering and Prediction (PCP) has been developed as a novel combination of the k-means clustering algorithm and ANNs. The general procedure of PCP is as follows. First, the k-means clustering algorithm is used to divide the main training set into subsets. This step identifies past traffic flow patterns that previously resulted in similar traffic flow values. Subsequently, the elected set {e*} is determined from the subsets. {e*} is the training set that the model uses to estimate the future traffic flow value, and it contains the most appropriate samples for the estimation. Thus, the model is not simply trained on a fixed amount of historical data in an ordered dataset; it is trained with an appropriate training set selected by PCP. The ANN predicts the short-term traffic flow using {e*}. After each estimation, the training set is renewed for the new traffic situation and a new {e*} is determined. These steps are repeated each time current data are entered into PCP.
The studies on short-term traffic flow prediction started in 1979 [4]. The published results were reviewed by different authors on different dates [5,6]. When the studies are examined according to the techniques used, short-term traffic forecasting approaches fall into four categories: naive, parametric, non-parametric and hybrid [6].
In the naive approach, an attempt is made to predict short-term traffic by simple processing of traffic data. Naive approaches are often used in applications because their computational requirements are low, but the results are generally unsatisfactory. Examples of this approach are: the use of instantaneous values [7-9], forecasts made from the average of past values [7,10-12], the use of both instantaneous and past values [13,14], and the clustering of days with similar traffic patterns [15,16].
Parametric approaches, on the other hand, are created by determining the parameters that affect the event and by designing them in predetermined forms. These models generally require fewer data than non-parametric models [6]. Among the studies forecasting traffic flow rates, one treated the event as a time series problem and estimated traffic flow values from the data of the London ring roads using the Seasonal Autoregressive Integrated Moving Average (SARIMA) model [17]. Using the same data, the authors predicted the traffic flow rate at 15-minute intervals using SARIMA + Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models [18]. The researchers developed adaptive SARIMA models in which the SARIMA coefficients are adapted to the traffic flow [19]. They also criticized other studies for keeping their parameters constant. There are other studies using Autoregressive Integrated Moving Average (ARIMA) methods [2,20-24]. Apart from these models, filtering [25], Gaussian Maximum Likelihood [26] and the State Space Model [27] can be listed as parametric approaches.
The term "non-parametric" does not mean that the model has no parameters. There are parameters in this approach, but their number and attributes are variable and cannot be set at the beginning [6]. These models are data-driven and make traffic flow predictions by using sophisticated algorithms. ANNs are the most commonly used algorithms. For instance, traffic flow and speed were predicted using ANNs with datasets consisting of measurements taken every minute for 10 days. When different ANN training algorithms were compared, the researchers stated that the networks trained with the Adaptive Levenberg-Marquardt algorithm produced the best results [28]. Their model covers only weekdays, so its validity for weekends is questionable. The Spinning Network (SPN) approach was developed, inspired by human memory. As a dataset, the researchers converted a one-year traffic dataset received from the Virginia Department of Transportation into 5-minute sections. They compared the developed approach with the approaches named 3-NNB using the ANN Nearest Neighbourhood algorithm and used the median of the results. Eventually, they argued that the SPN approach gave better results [28]. However, using the first eleven months as the training set, using only December as the test group, and using too many hidden neurons in the ANNs due to the size of the dataset may have affected the ANN prediction performance negatively. Multiple non-parametric methods (linear genetic programming, multilayer perceptron, and fuzzy logic) were used to compare traffic flow prediction performance [29]. The researchers trained an ANN with a 5-day dataset from the year 2012 [30]. In total, 19 input parameters were used. These inputs generally consist of numbers related to vehicle types and variables related to time, speed and traffic intensity. Different network architectures and transfer functions were tested, and the results were reported to be satisfactory. However, the small size of the dataset (480 data records), the difficulty of accurate field measurement and the excessive number of input parameters can be listed as the disadvantages of the study.
The k-Nearest Neighbour (k-NN) approach relies on past observations for each prediction and makes predictions with the help of the past traffic situation nearest to the current one. The authors tried to predict the traffic flow rate, speed, and occupancy with k-NN [31]. Working with a 3-week dataset at 10-minute resolution, the researchers reported that k-NN produced fewer erroneous results than the naive models. The authors predicted the traffic flow with enhanced k-NN [3]. They tested the models with datasets obtained from [2], which cover a wide variety of regions, and compared the current traffic flow series with other candidate flow series. Then, they compared their method with the four different approaches used in [2] and the Enhanced k-NN approaches, and found that the proposed model is superior to the other methods.
Researchers continue to introduce new artificial intelligence approaches. For instance, a recently developed deep learning approach is an artificial intelligence technique used to describe the graphic model. In another study, a deep learning model was used to estimate the traffic during a football game and on snowy days [32]. The authors concluded that the deep learning model has a low explanatory power. In another study, short-term traffic estimation was made by using a network weight matrix method with temporal and spatial inputs [33].
Recently, the studies on short-term traffic forecasting have focused on hybrid approaches, in which more than one approach is used together. Particle swarm optimization was used to optimize ordinary differential equations [34]. It is noteworthy that the time interval in that study is 0.1 s, and the positive and negative effects of exceedingly small time intervals should be the subject of future work. In addition, the following can be listed as hybrid models [35-40]. The k-means algorithm is used by [41] to group the traffic data. The k-means algorithm was also used to divide traffic flow data into categories [42]. The researchers developed a hybrid approach called SpAE-LSTM, which uses temporal and spatial features to predict traffic flows [43]. Another hybrid model was designed using SARIMA and a seasonal discrete grey model structure. The researchers stated that the short-term traffic forecasts of the model are accurate [44]. The researchers proposed a new hybrid model using ARIMA and a Wavelet Neural Network together. With this model, they predicted the traffic flow trend. It was stated that the model was more consistent in both stable and fluctuating conditions than the other two models used in the study [45].
Numerous approaches not mentioned here have been used to estimate the important parameters for traffic management systems such as traffic flow rate, speed, and travel time. These studies have been summarized and elaborated in detail [1,6,46].
To sum up, it is difficult to identify which of these models works better [3], because the data time intervals and the performance criteria used in these studies differ. When the studies on short-term traffic prediction are examined in general, it is understood that ANN and ANN-based approaches give more accurate results [6]. However, the studies are often performed with different model combinations (hybrid models) or with different versions of ANN-like models. This study instead proposes making the dataset more suitable for training a model. This is achieved by dynamically clustering the data in accordance with the current traffic flow. Thus, the ability of the model to make short-term traffic flow estimations and the consistency of the estimates are increased.
This paper begins with the introduction, followed by the review of the literature, in which relevant work is discussed in detail. Then, the details of the PCP method are presented. After that, the data used in the test phase and the test results are given. In the last section, discussions and general conclusions about the test results are shared.
Then the Elected Set {e*} is determined and the ANN is trained using {e*}. The flowchart of the developed approach is given in Figure 1.
PERIODIC CLUSTERING AND PREDICTION METHOD (PCP)
In this study, the Periodic Clustering and Prediction (PCP) approach, which combines the k-Means Clustering Algorithm and ANNs, has been developed for short-term traffic flow prediction. There are three main stages in this approach. First, the data to be used are determined, and outlier detection and smoothing are applied to the data. Second, the dataset is divided into periods and the periods are grouped with k-Means Clustering.
Before going through the phases of PCP, it is useful to explain some of the parameters used. Each square in Figure 2 represents a 15-minute traffic flow. {Tr_t} refers to the training data that the ANN will use at time step t. The number of traffic flows in {Tr_t} is denoted by m, and this number is kept constant during the prediction process. In addition, x_t denotes the most recent traffic flow, while x_{t+1} denotes the predicted flow.

[Figure 2. Axis: time. Labels: actual training data {Tr_t}, predicted flow, 15 min.]

Training data and data pre-processing
PCP is a data-driven approach and needs a certain amount of time-series data in order to perform the training and prediction process. For this reason, {Tr_t} is first determined from the raw traffic data after checking for missing data and completing them if any exist. Raw traffic flow data can be collected with various detectors, receivers or cameras. Especially in long-term counting operations, short-term malfunctions in counting devices and values contrary to the general traffic pattern need to be corrected. For this reason, after {Tr_t} is determined, the outliers are identified and the training series is smoothed (see Figure 1). The values in the traffic flow series likely to reduce the modelling performance were identified using Hampel identifiers [47,48]. Hampel identifiers are described in the literature as the most effective and efficient outlier detection algorithm [49,50]. Then, {Tr_t} is smoothed using local regression with weighted linear least squares (loess) [51,52]. The most important advantage of this method is that it does not require any assumption about the dataset [53]. This pre-processing is applied only to {Tr_t}. For this reason, there is no change in the value of x_{t+1} that is to be predicted. Raw data for one week and the related traffic flow rates after pre-processing are shown in Figure 3.

Composing periods and k-Means Clustering
After the data are identified and pre-processed, {Tr_t} is separated into two types of periods, where s is the number of period vectors.
The k-means algorithm is an unsupervised clustering algorithm used in data mining [54,55]. This algorithm divides a dataset into k subsets of similar samples. In the developed approach, the k-means algorithm separates the final vectors into subsets according to their similarities. At the same time, PCP places the first vectors in the subsets using the indices of the final vectors.
As a result of the clustering process, the predetermined number of subsets is obtained in {e_i} vector format, i = 1, 2, ..., k. The column averages of each subset form its mean vector, c̄_i = [μ_i,1, μ_i,2, ..., μ_i,n], and these are collected as C̄ = [c̄_1, c̄_2, ..., c̄_k], where: c̄_i - vector of mean values of subset i (i = 1, 2, ..., k); μ - column average value; n - number of columns of {e_i} (n=3 was accepted for this study); C̄ - vector of the mean value vectors.
After the operations described above, {e*} can be determined. For this, the Euclidean distance (d_i) between {f_s} and each {c̄_i} vector is calculated.

The dataset used in this study was previously used by [2] and then by [3]. The researchers used simple screening procedures, i.e. a threshold test and a hang-on test, to eliminate the erroneous data. This dataset was collected from 32 different stations located on freeways and motorways of the United States and the United Kingdom. The data were collected at 15-minute intervals, and the collection periods vary from 3 months to 12 months. The missing data in the dataset were completed using the SARIMA(1,0,1)(0,1,1)_672 model. For further information see [2,24].

Effect of the k, m and Nhn parameters on performance
There are three main parameters that can affect the traffic flow prediction performance. These are the length of the main training series (m), the number of the subsets belonging to the main training series divided by the k-means algorithm (k), and the number of hidden neurons belonging to ANN in the system (Nhn). The effects of these parameters on performance for forecasting the number of vehicles passing within the 15-minute time slot are given in Table 2.
Three values of m were examined to observe their effect on performance: 7 days (7×24×4, m=672), 14 days (14×24×4, m=1,344) and 30 days (30×24×4, m=2,880). Moreover, for each value of m, the different Nhn values shown in Table 2 and different α values, which are the ratios used to determine the k value, were analysed. For example, in a monthly training set with m=2,880, the k value for α=0.02 is 2,880×0.02≈58. Defining k through α makes it easier to compare training sets of different sizes. MAE, MAPE, and RMSE were used as performance measures.
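The arithmetic behind m and k can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and rounding to the nearest integer is an assumption, since the text only gives 2,880×0.02≈58.

```python
def training_length(days, intervals_per_hour=4):
    """Number of 15-minute flows (m) in a training series of `days` days."""
    return days * 24 * intervals_per_hour


def cluster_count(m, alpha):
    """Number of k-means subsets (k) derived from the ratio alpha.

    Assumption: k is m * alpha rounded to the nearest integer, with a floor of 1.
    """
    return max(1, round(m * alpha))


for days in (7, 14, 30):
    m = training_length(days)
    print(days, "days -> m =", m, ", k =", cluster_count(m, 0.02))
```

With these assumptions, the three training lengths in the text give m = 672, 1,344 and 2,880, and the monthly set with α=0.02 gives k = 58.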
As a result, the subset giving the shortest Euclidean distance is selected as {e*} and used for the training of the ANN. This process is repeated at every step t+1, and {e*} is re-determined according to the new traffic flow situation.
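The selection of the elected set can be sketched as a nearest-mean lookup. This is a hedged illustration under stated assumptions: each subset is taken to be a list of equal-length period vectors (three columns here, matching n=3), the current vector plays the role of {f_s}, and the names `mean_vector` and `elect_set` are hypothetical rather than taken from the paper.

```python
import math
from statistics import fmean


def mean_vector(subset):
    """Column averages of a subset whose rows are equal-length period vectors."""
    return [fmean(column) for column in zip(*subset)]


def elect_set(current_vector, subsets):
    """Return (index, distances) for the subset whose mean vector is nearest.

    distances[i] is the Euclidean distance d_i between the current vector
    and the mean vector of subset i.
    """
    distances = [math.dist(current_vector, mean_vector(s)) for s in subsets]
    best = min(range(len(subsets)), key=distances.__getitem__)
    return best, distances


# Two illustrative subsets: a low-flow pattern and a high-flow pattern.
subsets = [
    [[10, 12, 14], [12, 14, 16]],
    [[100, 110, 120], [110, 120, 130]],
]
best, distances = elect_set([105, 114, 126], subsets)
print("elected subset:", best)
```

In this toy example the second subset is elected, because its mean vector [105, 115, 125] lies closest to the current vector. In PCP the same lookup would be repeated after every new observation.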

Artificial Neural Network and traffic flow prediction
ANN is an artificial intelligence technique developed by being influenced by the biological neurons and their connections. Nowadays, ANNs are used successfully in areas such as prediction, image processing, clustering, etc. They are also used for traffic flow prediction, as discussed in Chapter 2. The ANN used in the PCP approach is trained with the Levenberg-Marquardt Algorithm [56,57]. The number of neurons in the hidden layer (Nhn) has a significant effect on the performance. For this reason, Nhn, giving the best result according to the test results, needs to be determined.
Traffic flow prediction can be done after {e*} is determined as the training set and the ANN is trained.
To compare the predictions with the actual values and to measure the performance of PCP, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) were taken as performance criteria, with N denoting the number of observations. These performance measures are frequently used in the literature, and more information about them can be found in [58-60].
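The three measures can be sketched with their standard textbook definitions. The paper's own equations did not survive in this text, so the forms below are an assumption, and the function names are illustrative only.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error over N paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error over N paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100, 200, 400]      # illustrative flows, not data from the study
predicted = [110, 190, 400]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

The example also shows why percentage errors are sensitive at low volumes: the same absolute error of 10 vehicles is 10% of a flow of 100 but only 5% of a flow of 200.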
In order to better understand the performance of the approach at different traffic flow rates, the traffic flow rates were divided into groups as in Table 1.

EXPERIMENTS AND RESULTS
Because the dataset extends up to a year in length and is collected from different regions, the models can be built and tested reliably.

Performance comparisons belonging to traffic groups and intraday hours
In this section, the performance of PCP at different traffic volumes and hours of the day is discussed first. Then, PCP is compared with five prediction methods used for short-term traffic prediction in previous studies. These methods are: Enhanced k-NN, EXPRW, BATCH, KF and AKF. Briefly, the Enhanced k-NN is a method that looks for similar patterns in order to forecast the traffic flow [3]. The other methods, namely EXPRW, BATCH, KF and AKF, are used by [2]. EXPRW uses seasonal exponential smoothing to capture traffic flow profiles. The BATCH method makes future estimates by generating SARIMA and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models. The KF method is a standard Kalman filter and uses the seasonal exponential smoothing method. Finally, AKF, unlike the KF method, consists of an adaptive filter.
The PCP performance determined for the traffic groups, which are expressed by ranges of traffic flow values, is presented in Figure 4. The traffic groups are defined in Table 1 according to traffic volume. Thus, the accuracy of PCP predictions can be discussed as traffic changes from low volumes to high volumes. In predictions of low-volume traffic flows, even small absolute errors can reach large percentages. Therefore, in Figure 4, the MAPE values of G1 were higher than the MAPE values of the other traffic groups. In the box diagram, the upper and lower error values of the traffic forecasts of 36 stations can be seen from the whiskers. For example, in one of the stations, the MAPE value of G1 decreased to 4%. At another station, however, this value increased to 8%. The errors of the other stations varied between these two values.
When the error values in Table 2 were examined, it was observed that the best result was obtained with a main training set length of one month, α of 0.02 and Nhn of 5. When the error values for α=0.02 are examined, it is seen that the error values decrease as m increases. For the one-month length of m, where the best result is produced, the performance improves as the α value increases up to 0.02. However, it is understood from the one-month rows of Table 2 that a further increase of the α value leads to an increase in the error values.
It is necessary to determine how much the clustering improves the prediction performance. Therefore, the prediction was repeated with k=1, a training set length of one month and Nhn=5. As a result, the MAE value of 14.8 obtained by PCP working with the improved training set increased to 17.46 veh/15 min/ln, the MAPE value of 7.6% increased to 9.73%, and the RMSE value of 24.55 increased to 26.07 veh/15 min/ln.
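The relative size of these increases can be checked with a short calculation. This sketch only reuses the error values reported above; the helper name `relative_increase` is hypothetical.

```python
def relative_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


# Error values with clustering (PCP) and without clustering (k = 1).
reported = {
    "MAE": (14.8, 17.46),
    "MAPE": (7.6, 9.73),
    "RMSE": (24.55, 26.07),
}
for name, (with_clustering, without_clustering) in reported.items():
    change = relative_increase(with_clustering, without_clustering)
    print(f"{name}: +{change:.1f}%")
```

Dropping the clustering step therefore raises MAE by roughly 18%, MAPE by roughly 28% and RMSE by roughly 6%.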
Testing with different combinations of k, m and Nhn showed that the error values decrease as the training set gets larger. However, the effect of m values longer than one month remains unclear within the limits of this study, since m was tested for a maximum of one month; it would be appropriate to examine larger values in further studies. The k value of the k-means algorithm is determined by the α ratio. It has been observed that cluster counts of up to 2% of the number of data in the training set improve the performance. It was also calculated that using k=1, which means that the whole training set is used for ANN training without clustering, increases the MAPE value by about 30%. In the light of these results, it is understood that clustering prior to training may improve the performance. Moreover, the lengths of the periods are kept constant in this study, and the performance of different period lengths could be examined in further studies. In addition, the performance of different smoothing and outlier detection methods can be examined in future studies.
Matlab 2016a version was used in the execution of PCP. After starting the PCP with 500 iterations, the average computational time required for one it-
From the box diagram, it is also possible to read the share of stations within each error range. The portion between each whisker end and the box represents 25% of the data used.
If the results for the groups are investigated with the help of Figure 4, it is understood that the MAPE value falls from G1 to G5. However, RMSE and MAE values seem to increase slowly. When the whiskers of the MAPE plot are examined in Figure 4, it is observed that the difference is about 4% for G1, decreases abruptly in the other groups, and decreases even to 1.5% for G5. It is understood that the average MAPE value of these five groups is 3
RMSE increase with the increase of traffic flow value. The mean MAE and RMSE values for G1 were found to be 13.12 and 16.95 veh/h/ln, respectively. The highest error values were observed in G5 as expected and were determined to be 50.80 and 65.76 veh/h/ln, respectively.
The error values during the day were analysed on an hourly basis and the results are shown in Figure 4. When the average MAPE values are examined, it is seen that the error values decrease to 2% between 1 p.m. and 5 p.m. and are around 4% at the peak traffic hours in the morning. The mean values of MAE and RMSE were observed to be 45 and 55 veh/h/ln, respectively. The lowest error values generally occur around 4 a.m., when the lowest traffic flow is observed.
The change of the R² values calculated from the PCP estimation results according to traffic groups and intraday hours is illustrated in Figure 5. Across the traffic groups, higher R² values were more frequent in cases representing low traffic flow values. With the increase of the traffic flow value (from G1 to G5), R² decreased. In G5, representing very high traffic flow values, it is seen from Figure 5 that the predictive performance of PCP decreased dramatically. However, this is not exactly true, since in some stations the number of samples in the G5 case was very small. Therefore, the calculated R² for some stations was too low. If the outliers are not considered, it is clearly seen from Figure 4 that the average R² for G5 was 70%. The average R² value of all traffic groups was found to be 80% at the lowest and 95% at the highest. These R² values show that the traffic flow predicted by the developed system is statistically highly accurate.
The mean error values of the developed PCP method are compared in Figure 6 with the Enhanced k-NN, EXPRW, BATCH, KF and AKF methods. When the mean error values plotted against the traffic groups in Figure 6 are examined, it is understood that the MAPE value generally decreases from G1 to G5. MAE and RMSE values, on the other hand, usually increase in all approaches. It is observed that the error values of PCP are lower than those of the other approaches with one exception (Figure 5). This exception occurred at the point where the Enhanced k-NN method had an error about 1% lower than PCP for the MAPE value of G1.
When the error values given in Figure 5 are analysed in more detail, the approaches other than PCP and Enhanced k-NN produced more than 10% MAPE in G1. PCP and Enhanced k-NN produced MAE values below 10 for the same traffic group, while the other approaches produced MAE values over 20 veh/h. In RMSE, PCP achieved higher performance compared to the other approaches. It is observed that the MAPE values of Enhanced k-NN in the G4 and G5 groups, which have the highest traffic flow rates, were 3.8% and 2.3%, while those of PCP were 3.1% and 2.1%, respectively. It is understood from Figure 6 that the greatest difference occurs in G3, which is the group observed at many points during daytime hours. This difference was 14 veh/h/ln for MAE and 21 veh/h/ln for RMSE. In the light of the above comparisons, the developed PCP method seems to perform better than the other referenced methods.
In brief, the following evaluations can be made according to the results discussed in this section. The MAPE value is high especially for small traffic flow values. For this reason, it is understood that high error percentages at even a few points increase the average MAPE value. The mean MAPE of PCP being higher than that of the Enhanced k-NN for G1 indicates that PCP makes errors with high percentages at fewer points, whereas the Enhanced k-NN makes errors with small percentages at more points. This is also evident from the analysis of the mean values of MAE and RMSE. The PCP method yielded better results compared to the other approaches examined at all points, including G1. The G1 state indicates the lowest traffic flow condition and usually occurs around midnight. In terms of traffic management, these cases are not critically important, since the road capacity already exceeds the existing traffic. However, for groups with high and remarkably high traffic flow rates (G3-G5), the short-term traffic flow rate prediction becomes more important. For all these values, it was determined that PCP produces lower error values for all performance criteria.

CONCLUSION
The amount of road traffic flow increases and becomes difficult to control over time. Nowadays, sophisticated systems have started to be used to control this situation. Models that can make accurate traffic estimations have the potential to make these systems more efficient. For this purpose, studies for developing such models are in progress. These models are generally developed based on data. Therefore, the main objective of this study was to develop an algorithm that finds more suitable datasets for these data-driven models. In addition, the performance of this approach has been tested in short-term traffic flow forecasting.
To evaluate the performance of PCP, comparisons were made with previous models, and this evaluation has shown that the PCP method makes fewer errors in predicting traffic flows than the Enhanced k-NN, EXPRW, BATCH, KF and AKF methods. The results of this research support the idea that new and more accurate approaches to predicting short-term traffic flow can be developed.
More research is required to determine the efficacy of PCP. For example, the dataset used contains only uninterrupted traffic flow data. The effect of PCP under interrupted flow can be investigated in further studies. In addition, the performance of PCP was not investigated against missing values in the data. This limitation should be clarified in the future.
PCP introduced a new periodic clustering approach to obtain more appropriate training data for ANNs. The developed PCP approach only needs a small amount of past traffic flow data. In addition, the k-means and ANN algorithms run fast on today's computers. For these reasons, the developed method can be used in traffic applications. In addition, the usage area of PCP is not limited to estimating traffic flow; the idea can also be applied to other time series.