A HYBRID SHORT-TERM TRAFFIC FLOW FORECASTING METHOD BASED ON NEURAL NETWORKS COMBINED WITH K-NEAREST NEIGHBOR

It is critical to implement accurate short-term traffic forecasting in traffic management and control applications. This paper proposes a hybrid forecasting method based on neural networks combined with the K-nearest neighbor (K-NN) method for short-term traffic flow forecasting. The procedure of training a neural network model using existing traffic input-output data, i.e., training data, is indispensable for fine-tuning the prediction model. Based on this point, the K-NN method was employed to reconstruct the training data for neural network models while considering the similarity of traffic flow patterns. This was done through collecting the specific state vectors that were closest to the current state vectors from the historical database to enhance the relationship between the inputs and outputs for the neural network models. In this study, we selected four different neural network models, i.e., back-propagation (BP) neural network, radial basis function (RBF) neural network, generalized regression (GR) neural network, and Elman neural network, all of which have been widely applied for short-term traffic forecasting. Using real world traffic data, the experimental results primarily show that the BP and GR neural networks combined with the K-NN method have better prediction performance, and both are sensitive to the size of the training data. Secondly, the forecast accuracies of the RBF and Elman neural networks combined with the K-NN method both remain fairly stable with the increasing size of the training data. In summary, the proposed hybrid forecasting approach outperforms the conventional forecasting models, facilitating the implementation of short-term traffic forecasting in traffic management and control applications.


INTRODUCTION
Traffic flow forecasting, especially short-term traffic flow forecasting, has been recognized as a critical requirement for intelligent transportation systems (ITS). The development of traffic forecasting models can enable realizing the full benefits of ITS and support the development of proactive transportation management and comprehensive traveler information service [1,2]. Therefore, accurate short-term traffic information forecasting is important for developing real-time, dynamic, and highly efficient traffic management and control systems.
As is well known, the rate of vehicular flow is defined as the equivalent hourly rate at which vehicles pass over a given point or section of a lane or roadway during a given time interval. In general, the time interval is less than 1 hour. Clearly, the time interval used in measuring the flow rate influences the characteristics of the generated flow rate measurements [3,4]. Measurement time intervals of at least 15 min are recommended by the Highway Capacity Manual [5]. Furthermore, time series models (e.g., the seasonal autoregressive integrated moving average (SARIMA) model) have been demonstrated to be applicable for stable traffic flow series with an aggregation time interval of 10 min or longer, which raises the issue of identifying appropriate and accurate methods for generating traffic flow forecasts at shorter time intervals (such as 5 min) [4]. Compared with statistical methods, neural networks are often applied in building traffic flow prediction models with time intervals of 10 min, 5 min, or less. Moreover, traffic is typically characterized by repeatable day-to-day, week-to-week, or month-to-month flow patterns. With respect to this characteristic, the literature has shown that neural networks are among the best alternatives for modeling and predicting traffic parameters [6][7][8][9]. In general, it is necessary to train neural networks using existing traffic samples, i.e., an input-output dataset. Consequently, neural networks can benefit from similar traffic flow patterns to generate forecasts. In fact, existing traffic samples, i.e., training data for neural networks, can be derived from any traffic information, such as vehicle locations, vehicle speeds, occupancy, weather information, etc. However, traffic flow patterns have not been sufficiently exploited by neural networks for improving the accuracy of short-term traffic flow forecasting.
Based on this point, the K-nearest neighbor (K-NN) method is used to reconstruct the training data derived from a historical traffic flow database through extracting similar traffic flow patterns, which should be as similar as possible to the current traffic flow condition. To this end, a short-term traffic flow forecasting hybrid method is proposed here through combining neural networks with the K-NN method.
From the methodological standpoint, this paper extends past research by introducing the pattern recognition approach into the reconstruction of repeatable traffic flow patterns. Consequently, the "most similar" traffic flow patterns are identified for neural networks, which generally need existing traffic flow patterns for training. In this regard, traffic flow patterns derived from historical traffic flow data have not been adequately exploited in the past for short-term traffic flow forecasting, and the purpose of this paper is to improve the accuracy of such forecasting through the application of the proposed hybrid method. The remainder of this paper is organized as follows. First, the following section is a brief literature review of short-term traffic flow forecasting methods and details behind those approaches. Section 3 describes the methodology behind the combination of neural networks and the K-NN method. In Section 4, an empirical study is performed based on real world traffic flow data, and empirical results are discussed and analyzed. Finally, Section 5 concludes this paper and suggests some directions for further research.

LITERATURE REVIEW
Short-term traffic flow forecasting models rely on regularities existing in historical traffic data to predict traffic conditions (e.g., traffic flow, speed, and travel time). This requires a good prediction model which can adequately capture high dimension and nonlinear characteristics in traffic data. Recently, many models have been proposed and developed for addressing the traffic prediction problems. These models can be broadly classified into three categories, those being parametric, nonparametric, and hybrid models. The parametric models are usually based on explicit mathematical foundations to perform reasonable prediction, e.g., historical average [10], time series models [11,12], and Kalman filter [13,14]. In contrast, nonparametric models are mostly data-driven and apply empirical methods to provide the prediction, including primarily neural network models [6][7][8][9]15], nonparametric regression [16][17][18][19][20], and support vector machine [21][22]. In addition, the hybrid approach combines two or more models to generate the prediction, e.g., non-linear chaotic prediction model [23], multi-agent prediction model [24], and modular networks model [25], etc. More details about prediction methods can be found in [26][27].
Recently, hybrid approaches to short-term traffic forecasting have received a lot of attention and yielded very encouraging results. A seasonal autoregressive integrated moving average plus generalized autoregressive conditional heteroscedasticity (SARIMA + GARCH) process was implemented using Kalman filtering given the need for real-time processing [28]. The spectral analysis technique was combined with statistical volatility models to explore insights into underlying traffic flow patterns [29]. A Bayesian inference-based dynamic linear model was presented to predict online short-term travel time and investigate the uncertainty of travel time prediction [30]. For the nonparametric approach, a hybrid chaos-wavelet analysis-support vector machine model was developed to predict traffic speed by constructing a new kernel function and using phase space reconstruction theory to identify the input space dimension [23]. As one of the widely used heuristic approaches, neural network models have been improved by combining them with other algorithms, such as the state-space neural network [31], fuzzy-neural network [32][33], long short-term memory neural network [34], Bayesian combined neural network [35], and modular neural networks [36]. These hybrid models all showed superiority over single models owing to their high computational efficiency, while performing at least as well as the traditional ones.
Traffic flow can be considered as both a temporal and a spatial phenomenon, with flow patterns that repeat from day to day, week to week, or month to month. For example, there is usually one peak on weekends and two peaks on weekdays. Considering these repeatable and periodic features of traffic flow data makes it possible to gain insight into the data and improve the accuracy of short-term traffic flow forecasting. Stathopoulos and Karlaftis [37] explored the spectral characteristics of traffic flows and captured the lead and lag structure of flow between different urban locations. Zhang et al. [29] provided deeper insights into underlying traffic patterns and improved prediction accuracy and reliability by modeling traffic patterns separately. By considering the spatial-temporal features of traffic patterns, a K-nearest neighbor (K-NN) algorithm was used to generate multi-time-step predictions [38]. Based on large traffic flow datasets collected from different regions, identifying similar traffic patterns through the K-NN method has provided very promising results [20].

In the neural network prediction models considered below, v(t) denotes the traffic flow at time t, v̂(t+d) is the predicted traffic flow at time t+d, F(·) is a nonlinear function, and d is the collection time interval of the traffic flow data. The basic structure of a neural network model consists of multiple layers, generally one input layer, one or more hidden layers, and one output layer. Each layer comprises several nodes connected to the nodes in the neighboring layers. Figure 1 shows the basic structure of feed-forward neural network models that can generate forecasts: one input layer, several hidden layers (at least one), and one output layer. The vector X=(v_t, v_{t-1}, …, v_{t-n})^T is the input of the network, and the vector Y=(v̂_{t+1}, v̂_{t+2}, …, v̂_{t+d})^T is the output of the network. The matrix W_{h1}=(w_{11},…,w_{1j}; w_{21},…,w_{2j}; …; w_{i1},…,w_{ij})^T contains the weights between the input layer and the first hidden layer, where j is the number of nodes in the first hidden layer. Similarly, the matrix W_{h2}=(w_{11},…,w_{1r}; w_{21},…,w_{2r}; …; w_{j1},…,w_{jr})^T contains the weights between the first and the second hidden layer, where r is the number of nodes in the second hidden layer, and the matrix W_{hn}=(w_{11},…,w_{1m}; w_{21},…,w_{2m}; …; w_{r1},…,w_{rm})^T contains the weights between the last hidden layer and the output layer. Each input node communicates its state to all hidden nodes. These nodes compute their states by processing the information received from the input nodes and then communicate their states to the output nodes, which use this information to compute the system response. Note that the notation Σ represents the linear/non-linear activation function in the input/output layer, and f_h represents the transfer function processing the information between the layers. The output v̂_{t+d} of the network can therefore be represented as

$$\hat{v}_{t+d}=F\left(\sum_{m} w_{m}\, v_{m}\right)$$

where F is the nonlinear activation function; w_m are the weights of the connections between the layers; v_m is the output of the last hidden layer; and v̂_{t+d} is the output of the output layer.
In summary, based on traditional prediction models, exploiting the characteristics of traffic patterns can improve the accuracy of short-term traffic flow forecasting. As one of typical prediction approaches, different neural network models have been widely applied for short-term traffic flow forecasting. Therefore, this study proposes a hybrid prediction approach based on combining the neural network models with the K-NN method to improve prediction accuracy through enhancing the similarity of traffic flow patterns.

Neural network prediction models
This subsection provides a brief discussion of neural network models for short-term traffic flow forecasting. Neural network prediction models are data-driven and have the capability of complex mapping between inputs and outputs, which enables them to approximate nonlinear functions. The traffic flow rate is defined as the equivalent hourly rate at which vehicles pass over a given point or section of a lane or roadway during a given time interval of less than 1 hour. Therefore, the collected traffic flow data, which are time series in nature, can be used to forecast future traffic flow. With neural networks, the inputs can be previously lagged traffic flow values, while the outputs provide future traffic flow forecasts. Mathematically, the input-output relation of neural network models for prediction can be represented as

$$\hat{v}(t+d)=F\left(v(t), v(t-1), \dots, v(t-n)\right)$$

For the GR neural network, a basis (or kernel) function is used in the hidden layer to estimate the joint probability between the inputs and the outputs [39]. The GR neural network can be presented for prediction as:

$$\hat{v}(t+d)=\frac{\sum_{i=1}^{N} v_{i}(t+d)\,\exp\left(-\frac{D_{i}^{2}}{2\sigma^{2}}\right)}{\sum_{i=1}^{N}\exp\left(-\frac{D_{i}^{2}}{2\sigma^{2}}\right)}$$

where v_i(t+d) is the traffic flow at the historical time t+d for the i-th training sample; D_i is the distance between the current input vector and the i-th historical input vector; σ is the spread parameter; and N is the number of training samples.
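As an illustration of this weighted-average form, the following Python sketch implements a GR-style prediction with a Gaussian kernel; the kernel choice, the spread value, and all data are illustrative assumptions (the paper's own implementation used MATLAB's newgrnn):

```python
import math

def grnn_predict(train_x, train_y, query, spread=1.0):
    """GR-style prediction: Gaussian-kernel weighted average of training targets."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * spread ** 2))
               for x in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Hypothetical normalized state vectors and their next-interval flows
train_x = [[0.40, 0.42], [0.80, 0.78], [0.41, 0.43]]
train_y = [0.45, 0.75, 0.44]
print(grnn_predict(train_x, train_y, [0.40, 0.43], spread=0.2))
```

Because the query lies close to the first and third training vectors, the prediction falls near their targets (about 0.45) rather than near the distant third sample.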
In addition to the input layer, the hidden layer, and the output layer, the Elman neural network has a continuous (context) layer. The nodes in the continuous layer memorize the previous activations of the hidden nodes and can be considered to function as one-step time delays. The Elman neural network can be presented as:

$$\hat{v}(t+d)=g\left(\sum_{j} w_{jk}\, h_{j}(t)\right),\qquad h_{j}(t)=f\left(\sum_{i} w_{ij}\, v(t-i)+\sum_{r} w_{jr}\, c_{r}(t)\right),\qquad c_{r}(t)=h_{r}(t-1)$$

where c_r is the output value of the r-th node in the continuous layer; w_ij are the weights between the input layer and the hidden layer; w_jr are the weights between the continuous layer and the hidden layer; w_jk are the weights between the hidden layer and the output layer; f is the hidden-layer transfer function; and g is often taken as a linear function.
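A single time step of such a recurrent network can be sketched as follows; the weights, inputs, the use of tanh for f, and the identity for g are all illustrative assumptions:

```python
import math

def elman_step(inputs, context, w_in, w_ctx, w_out):
    """One step of an Elman network: the context layer feeds the previous
    hidden activations back in alongside the current inputs."""
    n_hidden = len(w_out)
    hidden = [math.tanh(sum(x * w_in[i][j] for i, x in enumerate(inputs)) +
                        sum(c * w_ctx[r][j] for r, c in enumerate(context)))
              for j in range(n_hidden)]
    output = sum(hidden[j] * w_out[j] for j in range(n_hidden))  # g is linear
    return output, hidden  # hidden becomes the next step's context

# Hypothetical normalized flow inputs, initial (empty) context, and weights
x = [0.42, 0.40]
ctx = [0.0, 0.0]
w_in = [[0.5, -0.2], [0.1, 0.4]]
w_ctx = [[0.3, 0.1], [-0.1, 0.2]]
w_out = [0.8, 0.6]
y, ctx = elman_step(x, ctx, w_in, w_ctx, w_out)
print(y)
```

Feeding the returned hidden state back in as the next context is exactly the one-step time delay described above.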
Overall, these four neural network models are widely used for short-term traffic flow forecasting. More details about them can be found in [9,[40][41][42], respectively.

K-nearest neighbor (K-NN) method
The K-NN method is a non-parametric pattern recognition technique commonly used for classification and regression. The main advantages of the K-NN method include an intuitive formulation free of assumptions about the data distribution, high flexibility, and easy extendibility [17]. References [1] and [19] provide an excellent review of the K-NN method for short-term traffic flow forecasting.
A typical K-NN method consists of four basic elements: the definition of an appropriate state vector; the definition of a distance metric to determine the nearness between state vectors; the selection of a forecast generation method given a collection of nearest neighbors; and the management of the potential neighbor database. In this study, the K-NN method is not used to generate forecasts but to collect similar traffic patterns from the historical traffic flow data.

In order to perform traffic prediction, the weights existing in the networks need to be determined. Therefore, it is necessary for neural network models to be trained with existing traffic samples of input-output data. The optimal weights in the networks can be determined when the error between the network output and the observed output is minimized. The average system error for model training is

$$E=\frac{1}{2N}\sum_{n=1}^{N}\sum_{i=1}^{m}\left(\hat{v}_{ni}-v_{ni}\right)^{2}$$

where v̂_ni is the output value of the network for the n-th training sample; v_ni is the corresponding observed value; m is the number of nodes in the output layer; and N is the number of training samples.
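The average system error above can be computed directly from the network outputs and the observed values; a minimal Python sketch with hypothetical numbers:

```python
def average_system_error(outputs, targets):
    """Average system error E = 1/(2N) * sum_n sum_i (v_hat_ni - v_ni)^2."""
    n_samples = len(outputs)
    total = 0.0
    for out, tgt in zip(outputs, targets):
        total += sum((o - t) ** 2 for o, t in zip(out, tgt))
    return total / (2 * n_samples)

# Two training samples with two output nodes each (hypothetical flow values)
outputs = [[410.0, 395.0], [388.0, 402.0]]
targets = [[400.0, 400.0], [390.0, 400.0]]
print(average_system_error(outputs, targets))  # 33.25
```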
Based on this structure of neural networks, many neural network models have been developed, such as the back-propagation (BP) neural network, radial basis function (RBF) neural network, generalized regression (GR) neural network, and Elman neural network. These neural network models are all widely used for short-term traffic flow forecasting. A three-layer network is usually employed for the BP and RBF neural networks, including one input layer, one hidden layer, and one output layer. On the other hand, the GR and Elman neural networks usually consist of four layers, including one input layer, one hidden layer, one summation layer or continuous layer, and one output layer. According to Equation 2, the function of the BP neural network can be presented as:

$$u_{j}=\sum_{i} w_{ij}\, v(t-i),\qquad \hat{v}(t+d)=f\left(\sum_{j} w_{jk}\, f\!\left(u_{j}\right)\right)$$

where v(t-i) are the inputs; v̂(t+d) is the predicted traffic flow at the time t+d in the future; w_ij is the connection weight between the input layer and the hidden layer; w_jk is the connection weight between the hidden layer and the output layer; u_j is the linear combiner output attributable to the input signals; and f is the activation function, also called the transfer function, determining the relationship between the inputs and outputs of a neuron and of the network.
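The layered computation above can be sketched as a minimal forward pass in Python; the weights and normalized inputs are hypothetical, and tanh stands in for the activation function f (the paper's implementation used MATLAB's newff):

```python
import math

def bp_forward(inputs, w_ih, w_ho):
    """Forward pass of a three-layer feed-forward network.
    inputs:  lagged flow values v(t), v(t-1), ...
    w_ih[i][j]: weight from input node i to hidden node j
    w_ho[j]:    weight from hidden node j to the single output node
    """
    n_hidden = len(w_ho)
    hidden = [math.tanh(sum(inputs[i] * w_ih[i][j] for i in range(len(inputs))))
              for j in range(n_hidden)]
    return math.tanh(sum(hidden[j] * w_ho[j] for j in range(n_hidden)))

# Hypothetical flow values scaled to [0, 1] and hypothetical trained weights
x = [0.42, 0.40, 0.38]                      # v(t), v(t-1), v(t-2)
w_ih = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
w_ho = [0.8, 0.6]
print(bp_forward(x, w_ih, w_ho))            # predicted (normalized) v(t+d)
```

In practice the weights would be obtained by back-propagation training, i.e., by minimizing the average system error E over the training set.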
Unlike the BP neural network, the RBF neural network uses a basis (or kernel) function in its hidden layer. The RBF neural network can be presented as:

$$\hat{v}(t+d)=\sum_{j} w_{jk}\, z_{j}(i),\qquad z_{j}(i)=\exp\left(-\frac{\left\|\mathbf{x}(i)-\mathbf{c}_{j}\right\|^{2}}{2\sigma_{j}^{2}}\right)$$

where z_j(i) is the output of the j-th node in the hidden layer for the i-th input vector x(i); c_j and σ_j are the center and spread of the j-th basis function; and w_jk are the weights between the hidden layer and the output layer.
State vectors derived from the historical traffic database can be sorted from large to small according to similarity. Consequently, the training data can be reconstructed by assembling similar historical traffic flow patterns without considering the collection time of the traffic flow data. The reconstructed training data can then be used to train the neural network models for generating future traffic flow forecasts.

Neural network models combined with K-NN method
In order to perform predictions, neural network models need to be trained with existing traffic examples of input-output data. According to this feature of neural networks, we take the pattern characteristics of traffic flow data into consideration and utilize the K-NN method to reconstruct the training data of the neural networks, assembling similar traffic flow patterns without considering the temporal condition. The flow chart of the neural networks combined with the K-NN method is presented in Figure 2, in which the overall prediction procedure builds upon the aforementioned BP, RBF, GR, and Elman neural network models.
As shown in Figure 2, a key step in the overall prediction procedure is to assemble similar traffic patterns from the historical traffic database. Specifically, traffic flow patterns can be represented through defined state vectors.

The state definition of the K-NN method is very flexible and can take almost any form. For a traffic flow series, a sequence of flow measurements over the past d time intervals is commonly selected to define the state. For example, a state vector x(t) of flow rate measurements collected every 5 min can be written as:

$$x(t)=\left(v(t), v(t-1), \dots, v(t-d)\right)^{T}$$

where x(t) represents the state vector at the time t; v(t) is the flow rate during the current time interval; v(t-1) is the flow rate during the previous time interval, and so on. According to this definition, a state vector reflects a traffic flow pattern spanning d+1 consecutive time intervals, so the traffic flow time series can be divided into different traffic flow patterns. The closeness of one state vector to another is commonly measured by the Euclidean distance, according to which neighbors are ranked and selected:

$$d_{i}=\sqrt{\sum_{j=0}^{d}\left(v_{j}-v_{ji}\right)^{2}}$$

where d_i is the distance between the current state vector and the i-th historical state vector; v_j is the j-th value in the current state vector; and v_ji is the j-th value in the i-th historical state vector.
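The state-vector construction, Euclidean ranking, and training-data reconstruction described above can be sketched in Python; the flow series, d, and K below are arbitrary illustrative choices:

```python
import math

def make_patterns(series, d):
    """Split a flow series into overlapping state vectors of d+1 intervals,
    each paired with the flow value that follows it (the prediction target)."""
    return [(series[i:i + d + 1], series[i + d + 1])
            for i in range(len(series) - d - 1)]

def knn_reconstruct(history, current_state, d, k):
    """Return the k historical input-output pairs whose state vectors are
    closest (Euclidean distance) to the current state vector."""
    patterns = make_patterns(history, d)
    ranked = sorted(patterns,
                    key=lambda p: math.sqrt(sum((a - b) ** 2
                                                for a, b in zip(p[0], current_state))))
    return ranked[:k]  # reconstructed training data for the neural network

# Hypothetical 5-min flow rates (veh/h)
history = [300, 320, 410, 520, 610, 590, 480, 350, 330, 405, 515, 605]
current = [340, 400, 510]                 # current state vector, d = 2
for state, target in knn_reconstruct(history, current, d=2, k=3):
    print(state, "->", target)
```

Note that the selected neighbors are ranked purely by pattern similarity, regardless of when they were collected, which is exactly the reconstruction principle used here.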
Based on the state vector definition, the similarity between the state vectors derived from the historical traffic database and the present state vector can be measured.

With 5-min collection intervals, one day contains 288 observations and a whole week contains 7 · 288 = 2016 observations. In this study, the traffic data collection time period was from 00:00 to 24:00 over half a year (from May to October 2011). According to the requirements of the proposed approach, the traffic data for each station was further divided into two groups, a historical database and a testing database, as shown in Table 1. According to Table 1, the collected traffic flow data in G1 provided the training data for the development of the neural network models. Traditionally, the traffic flow data from G1 would be gathered chronologically as the training data. Regardless of the collection order of the traffic flow data, assembling similar traffic flow patterns in G1 can provide new training data for the development of the neural network models. The models were then evaluated using the testing data from G2.

Determination of neural network structures
The purpose of this subsection is to determine the number of forecast steps and to achieve optimal training performance of the neural networks. According to the description of the neural network models, the BP and RBF neural networks are both designed with a three-layer structure, while the GR and Elman neural network models are designed with a four-layer structure.

The similarity between the state vectors that already exist in the historical database and the state vector at the present time can be measured. Sorted according to similarity, the nearest neighbors can be assembled as the reconstructed training data, the aim of which is to enhance the input-output relation in the neural networks for generating future traffic flow forecasts. It is worth noting that the number of nearest neighbors determines the size of the reconstructed training data. Because the four selected neural network models follow different principles, it is necessary to explore the effect of the size of the reconstructed training data on forecast accuracy.

Performance measures
To evaluate the forecasting performance in this study, two measures are applied: the mean absolute percentage error (MAPE) and the root mean square error (RMSE), defined as follows:

$$\text{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_{i}-\hat{x}_{i}}{x_{i}}\right|\times 100\%$$

$$\text{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\hat{x}_{i}\right)^{2}}$$

where x_i stands for the observed data; x̂_i stands for the forecast data; and n is the number of observations.
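Both measures are straightforward to compute; a short Python sketch with hypothetical observed and forecast flow values:

```python
import math

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o)
                       for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical observed and forecast flow rates (veh/h)
obs = [400.0, 500.0, 450.0, 480.0]
pred = [380.0, 520.0, 460.0, 470.0]
print(mape(obs, pred))  # ~3.33 (percent)
print(rmse(obs, pred))  # ~15.81 (veh/h)
```

MAPE reports relative error and is scale-free, while RMSE keeps the units of the flow rate and penalizes large errors more heavily.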

CASE STUDY
In this section, real world traffic flow data is applied to evaluate the performance of the proposed hybrid approach based on the neural network models combined with the K-NN method.

Data collection
The data used in this study was collected by the Washington State Department of Transportation (WSDOT) using inductive loop detectors installed on the Interstate 5 freeway corridor, with four lanes northbound, in Seattle. The traffic data from three stations along I-5 was used (Stations 14015, 14064, and 14126). The traffic data used in this study (including the stations) can be downloaded from https://www.its-rde.net (Department of Transportation, USA). The approximate locations of these stations are illustrated in Figure 3.
Although volume, occupancy, and speed were collected at Stations 14015, 14064, and 14126, only volume was considered in this study. Note that traffic volume was collected at 5-min intervals, 24 h per day, and the traffic flow rate at each collection time interval was then obtained from the number of vehicles counted during that interval.

According to Table 1, the historical traffic flow data from the G1 group was selected to serve as training data. Chronologically gathered, the historical traffic flow data formed the original training data. The traffic flow data in G1 for Station 14015 for a whole week, from September 25, 2011 to October 1, 2011, is shown in Figure 4. Figure 4 clearly shows a temporal pattern, where the first five days are workdays and the last two days are the weekend. On weekdays there are two peak periods in one day, while on weekends there is only one. According to the defined state vector and the measured similarity, assembling similar state vectors provides a similar temporal pattern. For the last day of that week, October 1, 2011, a new traffic flow time series derived from the reconstructed training data for Station 14015 is shown in Figure 5. Figure 5 clearly shows that the traffic flow series derived from the reconstructed training data has almost the same temporal pattern, regardless of weekday or weekend. This is important for unifying the traffic flow patterns, making it easier for the neural network model to capture or learn this single pattern. Hence, the prediction performance based on this single pattern should improve greatly. The effects of the reconstructed training data on the neural network models are shown in the following section.

Determination of the size of training data
In general, historical traffic flow data can be directly used to train the constructed neural network models. Referring to Table 1, the G1 group contained five months of historical traffic flow data.

In this study, all the designed neural networks were operated in MATLAB 2015a, in which the functions newff(P,T,HI,TF,BTF), newrb(P,T,G,S,MN), newgrnn(P,T,S), and newelm(P,T,HI,BTF) were used to design the BP, RBF, GR, and Elman neural network models, respectively. Only the main parameters are listed here: P is the input vector; T is the target vector; HI is the number of hidden nodes in the hidden layer; TF is the transfer function; BTF is the training function; G is the mean square error goal (default = 0.0); S is the spread of the radial basis function (default = 1.0); and MN is the maximum number of nodes in the hidden layer.
The abovementioned parameters need to be initialized. Although changing the learning parameters of the BP neural network model can affect the speed of convergence of the learning procedure, these parameters do not affect the structure or the overall performance of the BP neural network model. Regarding learning speed in neural networks, batch training is almost always (often by orders of magnitude) slower than online training, especially on large training sets [40,41]. Meanwhile, the appropriate values of the main parameters were determined through trial and error, except for TF and BTF. Referring to [43], the commonly used transfer and training functions in the neural networks are tansig and trainlm, respectively. The other parameters of these models are summarized in Table 2.

Training data reconstruction
In general, neural network predictions prove to be effective if data from previous time slices are available, as temporal patterns are often present. This means that the collected traffic information, with its repeatable temporal patterns, can be exploited when assembling training data.

Different amounts of historical traffic flow data can be directly used as the training data. Considering the size of G1, the amount of training data was set to vary from 2 to 20 weeks with a fixed increment of one week. The effects of the original training data size on the performance of one-step prediction at Station 14015 are illustrated in Figure 6. It is clearly seen in Figure 6 that increasing the amount of original training data leaves the forecast accuracies of all the neural network models roughly stable, indicating that the training data size is an insignificant factor for all the models. With same-sized original training data, the BP neural network performs better than the other three models in terms of both MAPE and RMSE. The RBF and Elman neural networks show similar performances, outperforming the GR neural network.

Likewise, amounts of reconstructed training data varying from 2 to 20 weeks were used to assess the performance of all the neural network models for one-step prediction at Station 14015, as shown in Figure 7. The prediction accuracies of the neural network models follow different trends as the amount of reconstructed training data increases. On the one hand, the performances of the RBF and Elman neural networks are fairly flat across all reconstructed training data sizes in terms of MAPE and RMSE, indicating that the amount of reconstructed training data hardly affects the forecast accuracy of these two models. On the other hand, the GR and BP neural networks based on the reconstructed training data perform better than the RBF and Elman neural networks. Specifically, the forecast accuracy of the GR neural network gradually declines with the increasing size of the reconstructed training data, whereas that of the BP neural network gradually increases and then stays stable. This indicates that the amount of reconstructed training data can affect forecast performance in different ways.
According to the proposed prediction procedure combining the neural network models with the K-NN method, the training data was reconstructed from the G1 group by using the K-NN method, with the amount of reconstructed training data likewise varying from 2 to 20 weeks.

The performances of the hybrid prediction models based on the neural network models combined with the K-NN method were compared with those of the conventional models in terms of MAPE and RMSE, as summarized in Table 4. Note that multiple-step predictions up to 4 steps were conducted for all the prediction models.
As shown in Table 4, it is clear that combining the BP and GR neural networks with the K-NN method leads to greater improvements than combining the RBF and Elman neural networks with K-NN. Furthermore, the GR neural network obtains the highest accuracy. For one-step prediction, its MAPE improved from 8.96% to 8.07% and its RMSE from 76.7 to 64.4. For multi-step prediction, the MAPEs remain below 9%, with corresponding improvements in RMSE.
The forecast performances of all the selected neural networks improved through combination with the K-NN method across all the prediction steps. This is important for showing that the proposed K-NN-based training data reconstruction does improve the performance of the conventional neural network models.

Collective prediction performance
To evaluate the performance of the neural network models combined with the K-NN method, the conventional neural network models, the typical K-NN method, and a classical parametric model were all treated as benchmarks. For the typical K-NN method, forecasts were generated as the average of the selected nearest neighbors, with reference to [1]. For the parametric benchmark, the optimal orders (p, d, q) of the ARIMA model were determined based on the best Akaike Information Criterion (AIC) value. Specific details about ARIMA models can be found in [12]. The ARIMA model used in this study was ARIMA(2,1,2).
According to the effects of the amount of training data on the performance of the nonparametric methods, we selected the optimal size of the training data for one-step-ahead and multi-step-ahead predictions, as shown in Table 3.

These results validate the initial purpose of this paper. In addition, comparing the performances across the forecasting steps shows that the prediction performance decreases as the number of forecasting steps increases. This is expected, given that multi-step prediction generates forecasts further into the future, with less supporting information than one-step forecasts. The performance data also show that the GR neural network combined with the K-NN method achieves better prediction performance than the other three neural network models.

The findings of this study have multiple ramifications. First, the proposed method could be extended to congested traffic conditions or other abnormal traffic data. Second, the neural network models in the proposed method could be refined so as to further improve the performance of short-term traffic condition forecasting. Finally, considering the spatial nature of traffic systems, spatial patterns could be incorporated into the forecasting method to meet the requirements of proactive traffic management and control applications.

CONCLUSION
Real-time and accurate short-term traffic flow forecasting is critical for proactive traffic control and management systems, and a number of methods have been proposed in this field. These methods are primarily classified into parametric and non-parametric approaches. Considering the significant effect of the time interval on prediction performance, all these forecasting methods could be complementary rather than competitive in supporting the development of proactive traffic management and control systems.
Improvements to neural networks can be made from multiple perspectives, and refining the training data is a promising direction. Therefore, in this paper, in order to fully utilize the repeatable traffic flow patterns, the neural networks were combined with the K-NN method, assembling mutually similar traffic flow patterns to reconstruct the training data and thereby improve prediction performance. In doing so, four conventional neural network models were selected, specifically the BP, RBF, GR, and Elman neural networks. Data reconstructed using the K-NN method was applied to train these models, formulating an integrated short-term traffic flow forecasting model.
Using real world traffic flow data, the proposed forecasting approach is implemented with the performances investigated and demonstrated. First, the reconstructed data are shown together with the