A HYBRID DEEP CONVOLUTIONAL NEURAL NETWORK APPROACH FOR PREDICTING THE TRAFFIC CONGESTION INDEX

Traffic congestion is one of the most important issues in large cities, and the overall travel speed is an important factor that reflects the traffic status on road networks. This study proposes a hybrid deep convolutional neural network (CNN) method that uses gradient descent optimization algorithms and pooling operations for predicting the short-term traffic congestion index in urban networks based on probe vehicles. First, the input data are collected by the probe vehicles to calculate the traffic congestion index (output label). Then, a CNN that uses gradient descent optimization algorithms and pooling operations is applied to enhance its performance. Finally, the proposed model is chosen on the basis of the R-squared (R²) and root mean square error (RMSE) values. In the best-case scenario, the proposed model achieved an R² value of 98.7%. In addition, the experiments showed that the proposed model significantly outperforms other algorithms, namely the ordinary least squares (OLS), k-nearest neighbors (KNN), random forest (RF), recurrent neural network (RNN), artificial neural network (ANN), and convolutional long short-term memory (ConvLSTM), in predicting the traffic congestion index. Furthermore, using the proposed method, the time-series changes in the traffic congestion status can be reliably visualized for the entire urban network.


INTRODUCTION
Traffic congestion leads to considerable wasted time and slow traffic, and it is one of the main challenges for traffic management agencies and traffic participants. To mention some examples of its detrimental effects on the economy, the traffic congestion costs in Seoul reached approximately 105,542 billion Korean Won (KRW) in 2013 [1], and the economic losses resulting from the traffic congestion in urban networks in the United States were estimated at $305 billion in 2017 [2]. To solve the congestion problem, the intelligent transport system (ITS) domain has been developed to provide smoother, smarter, and safer journeys for traffic participants by adopting advanced techniques. Recent ITS applications have focused on finding solutions for traffic problems, such as highway accident prediction [3], traffic speed prediction [4], and congestion prediction [5,6].
In addition, the traffic congestion index does not have a specific standard. For instance, the Highway Capacity Manual 2010 (HCM2010) introduced the six-grade level of service (LOS) in the United States [7]. Xing et al. [8] established [0,100] as the range for a traffic congestion index based on the road characteristics and long-term observation. In addition, the local road network and congestion delay time were considered to estimate the traffic congestion index [9]. Feifei et al. [10] estimated the traffic congestion of urban road networks using the non-congestion state and average road speed. Nguyen et al. [11] evaluated an efficient traffic congestion index based on the density performance and velocity performance indices. Nevertheless, it is still difficult to collect comprehensive data, including travel speed, traffic density, and traffic volume. The Global Positioning System (GPS) has become an effective alternative tool for obtaining real-time traffic datasets [12]. The travel speed is used as an index for predicting traffic congestion.
Prediction of short-term traffic congestion has been a critical issue for ITS applications because more accurate information is required for traffic management agencies and stakeholders. Both short-term and long-term congestion predictions are based on real-time data to estimate the traffic congestion status at a future time. Furthermore, they depend on the data update rate, which ranges from a few minutes to a few hours. In this work, prediction at a data update rate of 5 minutes is treated as short-term congestion prediction. In general, the methodologies applied for short-term traffic prediction are classified into two types: statistical methods and neural-network methods. The statistical methods are widely applied to predict traffic congestion. For example, Dawei et al. [13] applied OLS to a high-speed network, and Liu et al. [14] used a Bayesian network to predict urban road congestion based on road construction and bus development data. Lee et al. [15] applied a multiple linear regression model to the fusion of traffic congestion and weather factors. In addition, Liu and Wu [16] used the RF method to predict traffic congestion statuses. Overall, it was found that statistical algorithms cannot consider the spatiotemporal relationship of the traffic information and cannot be applied to the dynamic environment of road networks. As an improvement, artificial neural networks can work well for multi-dimensional and flexible data, and they have been applied to predict traffic congestion. For example, Mondal and Rehena [17] applied an ANN to the traffic congestion classification. The fusion of the ANN and decision tree has also been used for predicting the traffic congestion state on the basis of GPS data [18]. Nonetheless, the spatiotemporal features of traffic information could not be explained well.
Building upon these works, deep-learning algorithms have become potent approaches for traffic congestion prediction [19]. For instance, Zhang et al. [20] used a deep autoencoder neural network to forecast short-term traffic congestion in transportation networks. In addition, the attention-based long short-term memory (LSTM) was used to forecast traffic congestion, and it achieved superior performance over the autoregressive integrated moving average, KNN, and extreme gradient boosting methods [21]. Recently, based on internal memory units, the RNN managed to process temporal sequences by considering nonlinear traffic information and time-series datasets [22]. To deal with the vanishing gradient problems of long-term dependencies [23], many advanced RNNs have been proposed, including the gated recurrent unit (GRU) [24] and the stacked LSTM [25,26]. For example, Sun et al. [27] showed that the LSTM and GRU models performed better for long-term traffic congestion prediction; it was also found that the stacked LSTM model is related to the temporal and spatial correlations of traffic speed [28]. Moreover, the CNN enables the extraction of spatial features, and it has been applied for image recognition, object detection, and segmentation [29,30]. Kurniawan et al. [31] applied the CNN for traffic congestion detection using CCTV camera images. Ma et al. [32] used the CNN for large-scale traffic speed prediction, and it achieved a great performance in comparison with the OLS, KNN, ANN, and RF methods. However, speed does not directly reflect the state of traffic congestion, which can be measured by speed, travel time, delay, LOS, and congestion indices. In addition, they did not consider the effect of gradient descent optimization and pooling operations on prediction accuracy. The complexity of deep-learning models also results in the problem of overfitting. The short-term traffic state prediction performance was found to be better than the gradient descent optimization [33]. More importantly, the speed relationship has a large variance. Thus, the CNN model has the potential for short-term traffic congestion prediction, as it enables the extraction of the spatial relationship of the GPS speed data to capture the most important features.
In this study, we propose a hybrid deep CNN method that uses gradient descent optimization algorithms and pooling operations for predicting the short-term traffic congestion index using GPS probe vehicles. First, the input data are collected by GPS probe vehicles and geographic information systems (QGIS) from the Gangnam urban network in Seoul; the data comprise travel speed and free-flow speed (FFS), which are used for calculating and classifying the traffic congestion index according to HCM2010. Second, a hybrid CNN method that uses pooling operations (e.g., maximum and average pooling) and gradient descent optimization algorithms (e.g., stochastic gradient descent (SGD), adaptive learning rate (AdaDelta), and adaptive moment estimation (ADAM)) is applied to enhance the prediction performance of the CNN model. Finally, the R² and

approach is designed to improve important feature extraction and updated parameter optimization for predicting the traffic congestion index in urban road networks. In addition, the speed relationship has a large variance. Hence, the 5-minute intervals can express the speed relationship well and enhance the accuracy of traffic congestion prediction. The HCM2010 is applied to classify the traffic congestion situation. First, the raw data are transferred into the time sequence data through data pre-processing. Second, the time-series sequence data are converted to a time-space image (the input of the CNN model). Third, the traffic congestion index (the output label) is estimated by the running time and delay time. Fourth, the maximum pooling and average pooling are applied to improve important feature extraction. Furthermore, gradient descent optimization algorithms are used to enhance the updated parameter optimization. Next, the proposed model is compared with other models (OLS, KNN, RF, ANN, RNN, and ConvLSTM) in terms of R² and RMSE values. Finally, the time-series changes in the traffic congestion status in urban networks are predicted at 5-minute intervals. We also discuss the proposed model in more detail in the following sections. Figure 1 shows the research flow architecture.
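The conversion of a time-series sequence into time-space samples can be illustrated with a small sliding-window routine. This is a minimal sketch rather than the authors' pre-processing code: the function name `make_time_space_samples`, the parameters `n_input` and `n_output`, and the toy speeds are assumptions made for illustration, and the target block here is simply the next speed values, whereas the study uses the traffic congestion index as the output label.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = road links, columns = 5-minute intervals


def make_time_space_samples(
    speeds: Matrix, n_input: int, n_output: int
) -> List[Tuple[Matrix, Matrix]]:
    """Slide a window over the time axis of a link-by-time speed matrix.

    Each sample pairs an input image (all links x n_input intervals) with
    the block that immediately follows it (all links x n_output intervals).
    """
    n_intervals = len(speeds[0])
    if any(len(row) != n_intervals for row in speeds):
        raise ValueError("every link needs the same number of intervals")
    samples = []
    # The last valid start index leaves room for both input and output.
    for start in range(n_intervals - n_input - n_output + 1):
        split = start + n_input
        x = [row[start:split] for row in speeds]
        y = [row[split:split + n_output] for row in speeds]
        samples.append((x, y))
    return samples


# Toy example (made-up speeds): 2 links, 6 intervals, 3 input steps, 1 output step.
toy = [[30.0, 28.0, 25.0, 22.0, 26.0, 31.0],
       [40.0, 41.0, 39.0, 35.0, 33.0, 36.0]]
windows = make_time_space_samples(toy, n_input=3, n_output=1)
```

With six intervals, three input steps, and one output step, the routine yields three samples; each input is a small links-by-time image of the kind a CNN can consume.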

Traffic congestion index
The travel speed is an important service measure for the state of an urban network. According to the HCM2010, the LOS on an urban road depends on the average travel speed, which is measured using the running time and the delay time. The traffic congestion index (TCI) is calculated as follows.

RMSE are applied to choose a suitable approach for predicting the traffic congestion index. Overall, the major contributions of this work are as follows.
- An enhanced hybrid deep CNN is proposed that considers some gradient descent optimization algorithms (i.e., SGD, AdaDelta, and ADAM) and pooling operations (i.e., maximum and average pooling operations) for predicting the short-term traffic congestion index in an urban network using GPS probe vehicles. The traffic congestion index reflects the congestion situation at a future time. The experiment shows that the prediction accuracy is significantly improved over the OLS, KNN, RF, ANN, RNN, and ConvLSTM methods.
- A set of hyperparameters is also proposed to explore the effect of important features on the traffic congestion prediction and to obtain reliable prediction performance in an urban network.
- The experimental results for a road network in the Gangnam area demonstrate the significant effect of the proposed method, and the time-series changes in the traffic congestion situation could be reliably predicted.
This paper is structured as follows. Section 2 presents the proposed model architecture, the traffic congestion index, the CNN, the gradient descent optimization algorithms, and the models used for performance comparison (OLS, KNN, RF, ANN, RNN, and ConvLSTM). Section 3 describes the conducted experiments and their results, and Section 4 concludes the study.

Proposed model architecture
In this study, we proposed an approach to predict the short-term traffic congestion index based on the time-series sequence of the speed dataset. This

After the feature extraction process, the filter features are flattened and fed into the fully connected layer. The output is then interpreted by the rectified linear unit (ReLU) function, which outputs the input directly if it is positive and zero otherwise. The main advantage of the ReLU function is a reduced likelihood of vanishing gradients. This makes learning faster than with other activation functions. In particular, deep neural networks with the ReLU function have a better convergence performance than those with the tanh function [35]. Thus, the ReLU function is applied to the CNN model because of the ease of training and better performance. First, the time-series sequence data of the traffic speed are converted into a time-space image to generate the CNN input data, as shown here [36].

where N is the length of the time intervals, F is the length of the input time intervals, P is the length of the output time intervals, i denotes the sample index, and $m_i$ denotes the column vector

Second, the convolutional layer identifies the patterns between the time intervals. In other words, the convolution layer uses small filters to generate a feature map with a higher level of representation. The output feature map of the previous layer is convolved with the convolution kernel and processed by the activation function to generate the feature map for the following layer; the process can be expressed using the following equation.
where $\sigma$ is the ReLU activation function, $x_j^l$ is the $j$th feature map of the $l$th layer, $x_i^{l-1}$ is the $i$th feature map of the $(l-1)$th layer, $k_{ij}^l$ is the convolution kernel, and $b_j^l$ is the bias term.
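To make the convolution step concrete, the following sketch computes one output feature map from a single input feature map: each output value is the kernel-weighted sum over a window of the input plus the bias, passed through ReLU. It is an illustration under simplifying assumptions (one input channel, stride 1, no padding, cross-correlation as used in most deep-learning libraries); the names `conv2d_relu` and `relu` and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import List

Matrix = List[List[float]]


def relu(value: float) -> float:
    """Return the input if it is positive, otherwise zero."""
    return value if value > 0.0 else 0.0


def conv2d_relu(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Valid (no padding, stride 1) 2D cross-correlation followed by ReLU."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = bias
            # Weighted sum of the window whose top-left corner is (r, c).
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(relu(total))
        feature_map.append(row)
    return feature_map


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]
feature_map = conv2d_relu(image, kernel, bias=5.0)
```

With several input feature maps, the same computation would be summed over all of them before the bias and activation are applied.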
where $I_s$ is the mean speed performance index, $I_{TTI}$ is the time performance index, $v_i$ is the observation speed of the $i$th vehicle, $v_{max}$ is the FFS of each link, and $T_r$ and $T_t$ denote the running time and the total observation time of each link, respectively. The evaluation criterion for the traffic congestion index on the urban road based on the LOS classification from HCM2010 is shown in Table 1.

Convolutional neural network
CNN, introduced by Hubel and Wiesel [34], is a special type of deep neural network. The basic structure of a CNN in the context of the traffic congestion index comprises two convolutional hidden layers, two pooling layers, and fully connected layers. CNN has become a promising method for traffic congestion prediction because it extracts important features. First, the CNN extracts spatial hierarchies of features automatically through convolutional hidden and pooling layers. Finally, the output of the feature extraction is mapped through a fully connected layer.

where $x^{(i)}$ denotes the input data, $y^{(i)}$ denotes the labels, $\eta$ denotes the learning rate, $J(\theta)$ denotes the objective function, and $\nabla_\theta J(\theta)$ denotes the gradient of the objective function.
Second, instead of considering all the past gradients, AdaDelta is based on a moving window of the gradient updates [38], and it also uses a component that accelerates learning. Therefore, AdaDelta mitigates the problem of the learning rate converging to zero as time progresses.
where $E[g^2]_t$ denotes the expected value of the accumulated squared gradient at time $t$, $E[g^2]_{t-1}$ denotes the expected value of the accumulated squared gradient at time $t-1$, $\rho$ is the decay constant, and $g_t$ is the gradient of the parameters.

Third, ADAM optimizes stochastic objective functions based on estimates of the first moment (the average) and the second moment of the historical gradients [39]. Moreover, it is suitable for big data and requires little memory.
where $E[v_t]$ is the expected value of the exponential moving average at timestep $t$, $g_t$ is the gradient of the parameters, and $E[g_t^2]$ is the true second moment.
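The three update rules can be compared on a toy problem. The sketch below uses the commonly published forms of SGD, AdaDelta, and ADAM with scalar parameters; it is not the training code of the study. The objective $J(\theta) = (\theta - 3)^2$, the function names, the state dictionaries, and all hyperparameter defaults are assumptions chosen for illustration, and the toy gradient is deterministic, whereas SGD in practice draws one random sample per update.

```python
import math


def grad(theta: float) -> float:
    """Gradient of the toy objective J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


def sgd_step(theta: float, g: float, lr: float = 0.1) -> float:
    """Move against the gradient with a fixed learning rate."""
    return theta - lr * g


def adadelta_step(theta: float, g: float, state: dict,
                  rho: float = 0.95, eps: float = 1e-6) -> float:
    """AdaDelta: scale the step by running averages of g^2 and of past updates."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * g * g
    delta = -math.sqrt(state["Edx2"] + eps) / math.sqrt(state["Eg2"] + eps) * g
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * delta * delta
    return theta + delta


def adam_step(theta: float, g: float, state: dict, lr: float = 0.1,
              beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> float:
    """ADAM: bias-corrected moving averages of the gradient and its square."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# Run plain gradient steps on the toy objective, starting from theta = 0.
theta_sgd = 0.0
for _ in range(100):
    theta_sgd = sgd_step(theta_sgd, grad(theta_sgd))

# One step of each adaptive method from the same starting point.
theta_adadelta = adadelta_step(0.0, grad(0.0), {"Eg2": 0.0, "Edx2": 0.0})
theta_adam = adam_step(0.0, grad(0.0), {"t": 0, "m": 0.0, "v": 0.0})
```

The first ADAM step has a magnitude close to the learning rate regardless of the gradient scale, while the first AdaDelta step is very small because its running average of past updates starts at zero.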

Performance comparison
To evaluate the superiority of the proposed model, the hybrid CNN model was compared with other models, namely the OLS, KNN, RF, ANN, RNN, and ConvLSTM models. The OLS regression model is the simplest statistical algorithm, and the KNN regression model applies the similarity of features to forecast new data points. The RF regression model uses many individual decision trees. In this work, the KNN regression model used a grid search to determine the optimal nearest points. The RF was set to produce 200 decision trees.
The ANN uses data-processing techniques that are based on how the human brain processes information. The multilayer perceptron (MLP), which was introduced by Rumelhart et al. [40], is a development over the original ANN. In this study, the MLP comprised multiple layers, including an input layer, a hidden layer, and an output layer. The ANN was configured to apply 100 hidden units in a hidden layer.
The pooling layer is applied to reduce the dimensionality and preserve the important information; the common categories of the pooling operations are average pooling and maximum pooling. Considering some important features, this study explores and chooses the appropriate pooling layer. The maximum pooling operation calculates the maximum value for each filter of the feature map, and the average pooling operation computes the average value for each filter of the feature map. The process can be expressed as follows.

$$y_{pool} = \mathrm{pool}(d_{ef}), \quad e \in [1, p], \quad f \in [1, q]$$

where $y_{pool}$ denotes the pooling output, $d_{ef}$ is the data value of the input matrix at positions $e$ and $f$, $p$ and $q$ are the two dimensions of the pooling size, and pool denotes the maximum or average pooling operations.
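The two pooling operations can be sketched as one routine that applies a reduction to each p x q block of a feature map. This is a minimal illustration that assumes non-overlapping windows (stride equal to the pooling size); the names `pool2d` and `average` and the toy feature map are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, p: int, q: int,
           reduce_fn: Callable[[Sequence[float]], float]) -> Matrix:
    """Apply reduce_fn to each non-overlapping p x q block of feature_map."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - p + 1, p):
        pooled_row = []
        for c in range(0, cols - q + 1, q):
            # Collect the values d_ef of the current pooling window.
            block = [feature_map[r + e][c + f]
                     for e in range(p) for f in range(q)]
            pooled_row.append(reduce_fn(block))
        pooled.append(pooled_row)
    return pooled


def average(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


fmap = [[1.0, 3.0, 2.0, 0.0],
        [5.0, 7.0, 4.0, 2.0],
        [0.0, 2.0, 9.0, 1.0],
        [4.0, 6.0, 3.0, 3.0]]
max_pooled = pool2d(fmap, 2, 2, max)      # keeps the strongest response per block
avg_pooled = pool2d(fmap, 2, 2, average)  # smooths each block
```

Maximum pooling keeps only the largest value in each block, while average pooling retains information from every value in the block; both reduce a 4 x 4 map to 2 x 2 here.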
The output of the traffic speed feature extraction is converted to a dense vector using the flatten output layer, which is a higher-level feature of the time-series sequence of the traffic speed data. This output is then fed into the fully connected layer. The flatten output layer with L as the depth of the CNN model can be expressed as follows.

$$y_L^{flatten} = \mathrm{flatten}\left(\left[y_L^1, y_L^2, \ldots, y_L^j\right]\right)$$

Finally, the CNN training process comprises forward and backward propagation. The forward process predicts the output value (traffic congestion index) from the input feature matrix (traffic speed dataset), and the backward process updates the weights of the CNN model to minimize the cost function. The backward propagation is discussed in more detail in section 2.4.

Gradient descent optimization algorithms
Gradient descent optimization [35] is one of the best methods for training deep neural networks. Basically, this method minimizes the objective function by updating the parameters in the direction opposite to the gradient of the objective function. In this study, we compare SGD, AdaDelta, and ADAM to determine the best optimization algorithm for traffic congestion prediction.
First, the SGD uses a batch size of only one per iteration. In the SGD, each batch is randomly chosen, and the SGD avoids redundant computation by performing one update at a time [37]. Hence, the SGD is affected by noise.

$$\theta = \theta - \eta \, \nabla_\theta J\left(\theta; x^{(i)}, y^{(i)}\right)$$
a transportation management center to record the travel speed, travel time, segment lengths, and FFS of the probe vehicles at 5-minute intervals to account for the fluctuation over shorter time intervals. The raw data obtained using the GPS probe vehicles and the QGIS data are shown in Tables 2 and 3, respectively. Finally, the raw data obtained using the GPS probe vehicles and the QGIS data were integrated via data pre-processing. This process also separated the data into 5-minute intervals for each link to generate a time-series sequence of the input data, as shown in Table 4. We selected the first 16 days as the training set, 4 continuous days as the validation set, and the last 11 days as the testing set.
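The chronological split described above can be sketched as follows. The example assumes that each link has a complete record of 288 five-minute intervals for each of the 31 days, which the text does not state explicitly; the function name `split_by_day` and the placeholder series are hypothetical.

```python
from typing import Dict, List, Sequence

INTERVALS_PER_DAY = 24 * 60 // 5  # 288 five-minute intervals per day


def split_by_day(series: Sequence[float], train_days: int = 16,
                 val_days: int = 4,
                 test_days: int = 11) -> Dict[str, List[float]]:
    """Split one link's time series chronologically into three blocks of days."""
    expected = (train_days + val_days + test_days) * INTERVALS_PER_DAY
    if len(series) != expected:
        raise ValueError(f"expected {expected} intervals, got {len(series)}")
    train_end = train_days * INTERVALS_PER_DAY
    val_end = train_end + val_days * INTERVALS_PER_DAY
    return {
        "train": list(series[:train_end]),
        "validation": list(series[train_end:val_end]),
        "test": list(series[val_end:]),
    }


# Placeholder series: the interval index stands in for a real speed value.
series = [float(i) for i in range(31 * INTERVALS_PER_DAY)]
parts = split_by_day(series)
```

Keeping the blocks in time order, rather than shuffling, means the model is always evaluated on days that come after the days it was trained on.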
The RNN, initially developed by James and Rumelhart [41], is a special type of feed-forward model owing to its ability to remember the past for sequence processing. In this study, the stacked LSTM model was optimized to contain three hidden LSTM layers with 512 hidden units in each layer. Furthermore, the ConvLSTM [35] is like the LSTM model, except that the internal matrix multiplications are replaced by convolution operations. In this work, the ConvLSTM model included two convolutional hidden layers, two average pooling layers, and one LSTM layer.

Data description
The Gangnam district has the highest congestion level in Seoul, and the urban network in Gangnam carries large traffic volumes. Hence, the Gangnam district was chosen as the study area, with 24 main links, as shown in Figure 3. In the field, a large amount of input data must be obtained. However, in this study, we limited the traffic data to one month. The input data were collected using GPS probe vehicles for a period of one month (1 July 2015-31 July 2015). The research area has about 5500 probe vehicles per day. The total number of speed records is about 7.2 million. A GPS receiver was associated with

where $y_i$ denotes the true value, $\hat{y}_i$ denotes the predicted value, and $n$ denotes the number of observations. R² measures the percentage of the variance for a dependent variable that is explained by independent variables. In addition, it indicates the strength of the relationship between the regression model and the dependent variable. R² ranges from 0 to 1, and a model with a higher value is better, as shown below.

$$R^2 = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}} = 1 - \frac{SS_{regression}}{SS_{Total}}$$

where $SS_{regression}$ denotes the sum squared regression error, and $SS_{Total}$ denotes the sum squared total error.
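Both evaluation metrics can be computed with a few lines of code. The sketch below uses the standard definitions of RMSE and R² (one minus the ratio of the residual sum of squares to the total sum of squares); the function names and the toy values are illustrative assumptions and are not results from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean squared difference between truth and prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """One minus the ratio of unexplained variation to total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_residual / ss_total


# Toy values (made up): every prediction is off by 0.05.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
error = rmse(y_true, y_pred)
score = r_squared(y_true, y_pred)
```

In this toy case, the RMSE equals the constant error of 0.05, and R² is 0.95 because the residual sum of squares (0.01) is 5% of the total sum of squares (0.2).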

Improving accuracy
To enhance the accuracy of the CNN model, we considered the effects of the gradient descent optimizations (ADAM, SGD, and AdaDelta) and pooling operations (maximum and average pooling) for a hybrid CNN model. Tables 6 and 7 and Figure 4 show the results of the hybrid CNN model for the training and validation datasets. The CNN model with the average pooling and ADAM algorithm (proposed model) achieved the highest prediction performance with the training and validation average R² of 0.985 and 0.976, respectively, followed by the CNN model with the maximum pooling and the ADAM algorithm (0.982 and 0.974). This indicates that the proposed model explained 98.5% and 97.6% of the variance in the dependent variable for the training and validation datasets, respectively. Similarly, the proposed model achieved the lowest prediction error with the training and validation average RMSE values of 0.0345 and 0.0433, respectively, followed by the CNN model with the maximum pooling and ADAM algorithm (0.0367 and 0.0453, respectively). Figure 4 indicates that the fluctuation range of the proposed model was smaller than that of the others, corresponding to the lowest standard deviation (STD) of the average R² and RMSE values of 0.0068 and 0.0106 (training datasets) and 0.0076 and 0.0102 (validation datasets), respectively. The experiments confirm that the proposed model operates more stably and efficiently than the other models in all circumstances studied. Therefore, the proposed model is a suitable method for predicting the traffic congestion index.

Parameter settings and evaluation metrics
Hyperparameter optimization plays an important role in building an effective model architecture. However, there is no specific rule to choose the number of hidden layers and the number of neurons in each hidden layer. The trial-and-error method has become the most reliable approach to decide hyperparameters for a specific problem through systematic experimentation. In this work, the trial-and-error method was applied to configure the set of hyperparameters in the traffic congestion context according to RMSE and R². In addition, the cross-validation method was used to evaluate the accuracy of the testing dataset. The set of hyperparameters of the CNN model is shown in Table 5. Based on our experiment, the prediction accuracy is good with two hidden layers.
The prediction accuracy was evaluated using RMSE and R². The RMSE results in negatively oriented scores, which means that the model with a lower value is better. The RMSE measures the square root of the average squared difference between the predicted and actual observation values, as shown below.

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Performance evaluation
To verify the superiority of the proposed model, other algorithms were chosen for comparison, namely the OLS, KNN, RF, ANN, RNN, and ConvLSTM algorithms. Figure 5 and Table 8 show the results of the different models for all links in terms of R², RMSE, and STD. In all circumstances, the proposed model outperformed the other models in the testing datasets, implying that the proposed model can better predict the traffic congestion index.
As seen in Table 8, the proposed model achieved the highest prediction performance with an average R² of 0.972, followed by the ANN (0.862), ConvLSTM (0.824), and RNN (0.737). This means that the proposed model explained 97.2% of the variance in the dependent variable. Similarly, the proposed model

Testing dataset performance
The testing dataset is applied to confirm the performance of the proposed model. Figure 6 shows a comparison between the true and predicted values for a specific link in one day. As seen in Figure 6a, the predicted values were close to the true values with a small RMSE of 0.0528. In Figure 6b, the scatter plot is almost linear with an R² of 0.98104. This means that the proposed model explained 98.104% of the variance in the predicted value. Therefore, the prediction model fit the testing datasets well. Thus, the CNN model with the average pooling and ADAM algorithm is suitable for predicting the traffic congestion index.

Time-series changes in the traffic congestion index
Considering the urban network, Figure 7 and Table 9 show the average TCI values for the study area in one day at different time periods. A single day was divided into three time periods: morning peak hours (07:00-09:00), afternoon peak hours (12:00-14:00), and evening peak hours (18:00-20:00). The average TCI of the morning peak hours (0.41) was higher than that of the afternoon (0.30) and evening (0.24) peak hours. According to Table 1, the traffic congestion status during the morning peak hours (mild congestion) was less severe than that during the afternoon peak hours (moderate congestion) and evening peak hours (heavy congestion). Thus, the traffic congestion status during the morning peak hours was better than that of the afternoon and evening peak hours for the entire research area.
To visualize the time-series changes in the traffic congestion status for the entire research area at 5-minute intervals, Figure 8 shows the relationships between all links, time, and the TCI, where the points are color-coded based on the TCI. The points closer to the top denote low traffic congestion, whereas those closer to the bottom denote heavy traffic congestion. During the morning peak hours, almost all the points were closer to the top than in the afternoon and evening peak hours. Therefore, the traffic congestion status during the morning peak hours is better than that during the afternoon and evening peak hours.
Consequently, the experimental results show that the hybrid CNN model that uses the average pooling and ADAM algorithm significantly improves the prediction performance of the traffic congestion index. In addition, the proposed model achieved the highest prediction accuracy and the lowest prediction error in comparison with the other models, such as the OLS, KNN, RF, ANN, RNN, and ConvLSTM models. In the best-case scenario, the R² value of the proposed model was 98.7%. Furthermore, based on the heat map of the time-series changes in the traffic congestion status, the traffic congestion status for all links during the morning peak hours was better than that during the afternoon and evening peak hours. In addition, unexpected events, such as accidents and lane blockages, can create new traffic congestion in the off-peak hours and extend the delay time for traffic congestion. In this study, we did not analyze the impact of unexpected events. However, the month of data included unexpected events as well. In our future work, we will consider the impact of unexpected events to improve the accuracy of traffic congestion prediction.
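The peak-period averages of the TCI discussed in this section can be computed from a day of 5-minute values along the following lines. This is a sketch under stated assumptions: the day is assumed to contain exactly 288 intervals starting at 00:00, the end time of each period is treated as exclusive, and the TCI values in the example are made up for illustration. The names `PEAK_PERIODS`, `to_index`, and `average_tci_by_period` are hypothetical.

```python
from typing import Dict, Sequence

INTERVAL_MINUTES = 5
PEAK_PERIODS = {
    "morning": ("07:00", "09:00"),
    "afternoon": ("12:00", "14:00"),
    "evening": ("18:00", "20:00"),
}


def to_index(hhmm: str) -> int:
    """Convert a clock time such as '07:00' to a 5-minute interval index."""
    hours, minutes = hhmm.split(":")
    return (int(hours) * 60 + int(minutes)) // INTERVAL_MINUTES


def average_tci_by_period(daily_tci: Sequence[float]) -> Dict[str, float]:
    """Average the TCI over each peak period of a single day."""
    if len(daily_tci) != 24 * 60 // INTERVAL_MINUTES:
        raise ValueError("expected 288 five-minute values for one day")
    averages = {}
    for name, (start, end) in PEAK_PERIODS.items():
        window = daily_tci[to_index(start):to_index(end)]
        averages[name] = sum(window) / len(window)
    return averages


# Made-up day: a constant background with different levels in each peak period.
daily = [0.5] * 288
for level, (start, end) in zip((0.4, 0.3, 0.2), PEAK_PERIODS.values()):
    for k in range(to_index(start), to_index(end)):
        daily[k] = level
period_means = average_tci_by_period(daily)
```

Each two-hour period covers 24 intervals, so the averages are taken over equally sized windows and can be compared directly.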

CONCLUSION
In summary, the proposed hybrid CNN model that uses the average pooling and ADAM algorithm showed superior performance compared to other algorithms, namely the OLS, KNN, RF, ANN, RNN, and ConvLSTM algorithms, in terms of the R² and RMSE values. The main contributions of this work are twofold: (a) the proposed hybrid approach can predict the traffic congestion index in urban networks with high accuracy using GPS probe vehicles; and (b) the time-series changes in the traffic congestion status can be reliably predicted by adopting the hybrid CNN model that uses the average pooling and ADAM algorithm. Furthermore, the results of this study can support the choice of optimal routes for vehicles. In future studies, we will consider comprehensive input data (e.g., weather, accidents, lane blockage, traffic volume, and road geometry) and involve more realistic predictions by expanding the research area.

Figure 2 - CNN architecture in the context of the traffic congestion status

Figure 4 - Results of the hybrid CNN model

Figure 5 - Results of the different models

Figure 6 - Comparison between the true and predicted values for one specific link

Figure 8 - Heat map of the traffic congestion index pattern for the entire links in one day

Table 1 - Traffic congestion index (Source: HCM2010)

Figure 2 presents the CNN architecture in the context of the traffic congestion status, including the input data, feature extraction, prediction, and regression output.

Table 2 - Raw data obtained using GPS probe vehicles

Table 3 - Raw data based on the QGIS

Table 4 - Time-series sequence of the input dataset

Table 5 - Set of hyperparameters of the hybrid CNN model

Table 6 - Results of the hybrid CNN model for the training and validation datasets

Table 7 - Results of the hybrid CNN model for the training and validation datasets (* indicates the best result)

a) R² vs. road links b) RMSE vs. road links

Table 8 - Results of the different models for the testing datasets (* indicates the best result)