TRAFFIC FLOW PREDICTION USING MI ALGORITHM AND CONSIDERING NOISY AND DATA LOSS CONDITIONS: AN APPLICATION TO MINNESOTA TRAFFIC FLOW PREDICTION

Traffic flow forecasting is useful for controlling traffic flow, traffic lights, and travel times. This study uses a multilayer perceptron neural network and the mutual information (MI) technique to forecast traffic flow and compares the prediction results with conventional traffic flow forecasting methods. The MI method is used to calculate the interdependency of historical traffic data and future traffic flow. In numerical case studies, the proposed traffic flow forecasting method was tested against data loss, changes in weather conditions, traffic congestion, and accidents. The outcomes were highly acceptable for all cases and showed the robustness of the proposed flow forecasting method.


INTRODUCTION

Statement of the problem
In many countries, transportation problems and traffic jams are social challenges that increasingly demand national resources. Slow traffic on urban and rural highways increases air pollution and fuel consumption. An intelligent transportation system (ITS) is an advanced application that provides innovative services for different modes of traffic management. The term ITS was first introduced as an umbrella term to cover all technologies in information technology, communications, and control [1]. A solution to prevent traffic congestion using ITS is one that predicts traffic parameters such as traffic flow, speed, and density.

Literature review
Traffic flow is a macroscopic feature of traffic and is a real-time, completely non-linear, high-dimensional and non-stationary stochastic process [2]. Traffic flow forecasting is mainly classified into long-term and short-term prediction. In short-term prediction, traffic flow is forecast in the immediate future (typically 5 to 30 min) on the basis of real-time online or historical data. Traffic conditions may vary from one moment to another in response to changing weather conditions, road accidents, cultural or political occasions and events, the types of vehicles, and driver characteristics [3]. In the present study, if traffic flow is not seriously hampered by these conditions and congestion has not occurred, traffic conditions are considered to be normal. Traffic flow on a normal day is similar to that on other days without sudden congestion. On an abnormal day, sudden congestion occurs as the result of an accident, heavy precipitation, or storms. In the past few years, simple and complex methods have been proposed for predicting traffic flow under different traffic conditions. Experimental and data-driven prediction methods of traffic flow can be divided into parametric, non-parametric, and hybrid methods, each with its own advantages and disadvantages. Most research on traffic flow prediction has been implemented under normal traffic conditions and research has seldom addressed special traffic conditions such as climate, the existence of noise in the data, and disturbances on the highways [5].
Parametric prediction methods: Predictive parametric methods are modelled using time series of recorded data that predict the immediate future in steps. One advantage of these methods is their low prediction error. The major disadvantage is their frequently poor performance in the presence of noise and disturbance. Linear regression is a parametric prediction method that can predict the next variable online using real data. Sun et al. [5] have shown that this method depends on the rate at which data are recorded to predict the speed of traffic. Another parametric method is the historical mean average model. It shows poor performance in unpredictable traffic circumstances because of its high dependence on recorded data [6]. The maximum likelihood (ML) model is robust to sensor failures and rapid changes in conditions [7]. Despite the benefits of the exponential smoothing method for predicting traffic flow, it is very difficult to determine constant convergence for the model during major changes in traffic flow [8]. The simplicity and strong potential of time series models for online operation make these models popular for most traffic predictions. One disadvantage of the time series model is the need to determine the degrees of the auto-regressive (AR) and moving average (MA) models when designing an accurate forecasting model. The second disadvantage is their high dependence on input data. Incomplete or inaccurate input data produce an inaccurate time series model, resulting in an incorrect prediction [8]. Time series models used to predict traffic flow include the auto-regressive integrated moving average (ARIMA) [9], seasonal ARIMA [10], vector ARIMA [11], and ARIMA with exogenous variables (ARIMAX) [12].
Non-parametric prediction methods: Non-parametric prediction methods predict traffic flow in proportion to road conditions and are enhanced using modern models rather than classic models. The complexity of these models and their strong dependence on large volumes of data are major disadvantages of these methods. The most popular non-parametric methods are neural networks, such as the multilayer perceptron (MLP), radial basis function (RBF) and time delay neural network (TDNN) [13]. The biggest flaw in neural networks is the type of training required and their need to handle large volumes of data to train the network weights. Fuzzy [14], k-nearest neighbour (KNN) [15], support vector machine [16][17], Bayesian networks [18] and wavelet [19] are other non-parametric methods used to predict traffic flow.
Hybrid prediction methods: Each of the above models responds well individually only under specific traffic conditions; when the conditions change, their efficiency decreases for forecasting traffic flow. In recent years, researchers have used linear and non-linear hybrids of parametric and non-parametric methods to increase traffic flow prediction accuracy. The accuracy of hybrid methods depends on the type of parametric and non-parametric methods used; however, the computational complexity and developmental costs of hybrid methods exceed those of the individual parametric and non-parametric methods [8]. Some of these methods are combinations of a neural network model and models such as the genetic algorithm (GA) [20], fuzzy [3], wavelet [4] and ARIMA [21] models. Although the hybrid methods have high accuracy for predicting traffic flow, their dependency on the type and volume of recorded data is a disadvantage. Their use requires the selection of good data from input data.

Contribution and structure of the paper
A general well-defined, robust, highly efficient and comprehensive method has yet to be developed to forecast traffic flow and deliver an accurate response for all aspects of traffic [8]. The high volume of data combined with inaccurate and noisy data is a major disadvantage of these three prediction methods. The present study proposes a new forecasting method that tolerates changes in traffic and road congestion. The proposed method can detect inaccurate, noisy and faulty input traffic flow data and uses a newly-developed data selection method to extract data so that only the most informative data are used.
Most studies compare the quality of a new prediction method with previous models such as ARIMA and a neural network model as parametric and non-parametric methods, respectively [1]. The historical mean average model is a time series model with fixed and equal weights, ARIMA is an accurate and useful time series model, and MLP is a popular neural network model; these models were selected for comparison with the proposed model. All of these models have been widely used in previous studies because of their accurate results.
Section two presents the MI algorithm and the best procedures for input feature selection. Section three discusses the data and the methods used for forecasting. Section four presents the results of the simulations and analyses of traffic data. Section five presents concluding remarks.

INPUT DATA SELECTION
An appropriate method for decreasing prediction error covariance is to decrease the dependence on observation error. Mutual information (MI) theory quantifies dependence on such error. It is similar to the cross-correlation method, which calculates the linear correlation between two quantities [22,23]; MI calculates the non-linear correlation between two quantities. Input feature selection is an important aspect of data classification. The mutual information feature selection (MIFS) algorithm calculates the MI between the input data and the best data selected for prediction to decrease the computation load and increase computation accuracy. This makes the MI a very reasonable method for forecasting the traffic flow [24].

Estimation of MI using KNN method
In control systems, MI is used to measure the nonlinear interdependence of two random variables. The amount of MI determines the amount of information on random variable X that was obtained from random variable Y and is denoted as $I(X;Y)$. The calculation of MI helps to decrease the uncertainty of X while Y exists. MI is defined in Equation 1 in discrete form [25].

$$I(X;Y) = \sum_{x}\sum_{y} P_{XY}(x,y)\,\log\frac{P_{XY}(x,y)}{P_X(x)\,P_Y(y)} \qquad (1)$$

where $P_{XY}(x,y)$ represents the joint probability density function for the random variables X and Y. To calculate $I(X;Y)$, the function $P_{XY}(x,y)$ must be known; however, this function is unknown here and therefore must be estimated. Methods used to estimate $P_{XY}(x,y)$ include the Bayesian, ML, wavelet and KNN methods. Kraskov et al. [25] selected the KNN to design an estimator for the probability density function and estimate the amount of MI on the basis of the observed random data. MI estimation using the KNN method is based on Kraskov et al. [25], where $\Gamma(x)$ and $\psi(x)$ represent the gamma function and the digamma function, respectively; K is a fixed positive integer used to calculate the distance of the K-th nearest neighbour; n is the maximum number of pieces of data; and $n_x^i$ and $n_y^i$ are the numbers of sample points located in the neighbourhood of $X_i$ and $Y_i$, respectively.
The most important aspect of the KNN estimator is determination of the value of K, because different values of K give different estimates of $I(X;Y)$. When K is a small number, the estimate of $I(X;Y)$ will have low bias but large variance. When K is a large number, it will have large bias but low variance [26]. For selecting the best value of K, a good rule of thumb is $K = \sqrt{n}$ [27]. As explained later for forecast Method E, the maximum amount of data for simulations is about $n = 38$. This means that the best value for K is six.
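The KNN-based estimate described above can be illustrated with a minimal sketch. It assumes the first estimator published by Kraskov et al., because the text does not make clear which variant was used in the study; the function names `digamma_int`, `rule_of_thumb_k` and `ksg_mutual_information` are hypothetical, and the brute-force neighbour search is only practical for small samples.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function psi(n) for a positive integer n (harmonic-number form)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def rule_of_thumb_k(n):
    """K = sqrt(n), rounded to the nearest integer (at least 1)."""
    return max(1, round(math.sqrt(n)))


def ksg_mutual_information(x, y, k=6):
    """Estimate I(X;Y) in nats from paired samples x and y.

    Assumption: first Kraskov et al. estimator with the maximum norm,
    I = psi(k) + psi(n) - mean(psi(n_x + 1) + psi(n_y + 1)).
    """
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("need paired samples with more than k points")
    total = 0.0
    for i in range(n):
        # Distance from point i to its k-th nearest neighbour in the joint space.
        joint = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]
        # Count neighbours strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n
```

With `rule_of_thumb_k(38)` the sketch returns six, matching the value of K stated above. Strongly dependent samples should give a clearly larger estimate than independent ones, while estimates for independent samples scatter around zero and may be slightly negative.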

MIFS algorithm
In real systems, a significant degree of uncertainty occurs in identification system output from the inadequacy of the initial data or suboptimal conditions in the system. In this study, inadequate initial data in the traffic systems were ruled out; thus, uncertainty can be attributed to the sensing system or noise pollution. Selecting appropriate data from a large pool of data is a working solution for this problem that can be accomplished using MI.
An algorithm must be used to select the best input data as an optimal subset of initial candidate input data. Battiti's MIFS algorithm is notable for the selection of efficient inputs. In the MIFS algorithm, the aim is to obtain a relationship between the inputs and the output to decrease the existing redundancy in the input data and at the same time select the data with the best relationship for the output. The goal is to select a subset of m classes from an input set with n data classes ($m < n$) having the highest level of relationship with the output [28]. For a large value of m, computational complexity increases; for a small value of m, the accuracy decreases. A good rule of thumb for the number of m outcomes is $m = n/4$. Here, the maximum amount of data in the simulations was $n = 38$, so $m = 10$ was considered [29].
Suppose that T is the output set, S is the empty set, and $l_i$ is a distinct input class that belongs to the n-member set L. The aim is to obtain the MI amount $I(l_i;T)$ for each input $l_i$ in L and the output T. The input $l_j$ ($1 \le j \le n$) that maximizes $I(l_i;T)$ is selected, separated from set L and added to set S as the first input selection ($s_1 = l_j$). Then $I(l_i;s_1)$ is computed for all pairs of variables $(l_i, s_1)$ with $l_i$ a member of L and $i \ne j$. The input $l_i$ that maximizes the following term is selected, then separated from set L and added to set S [28]:

$$I(l_i;T) - \beta \sum_{s \in S} I(l_i;s)$$

This is repeated, computing $I(l_i;s_m)$ for each remaining $l_i$ in L and each $s_m$ in S and maximizing the same term, until all m variables of S are selected; here, $m = 10$.
The important parameter in the MIFS algorithm is $\beta$, which controls how strongly redundancy between the inputs is penalized. If $\beta = 0$, the algorithm looks only for inputs with the greatest correlation with the output and ignores redundancy between the inputs. As $\beta$ increases, the redundancy between the selected inputs decreases gradually, but the correlation between them and the output gradually loses effect. Neither a large nor a small value is appropriate for $\beta$. The main challenge in using the MIFS algorithm is finding the value of $\beta$, which is different for different non-linear systems. For data classification purposes, $0.5 \le \beta \le 1$ was the most appropriate. A series of numbers from 0.5 to 10, in increments of 0.5, was generated for $\beta$. The training and testing data (80% and 20% of input data, respectively) and the MIFS algorithm were used to predict the traffic flow using the averaging method. The best value ($\beta = 0.6$) was obtained for the minimum prediction error using the training and testing data.
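The greedy selection described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: it assumes Battiti's usual selection score (relevance to the output minus $\beta$ times the summed MI with the inputs already selected), and it uses a simple plug-in MI estimate for discrete data in place of the KNN estimator. The names `discrete_mi` and `mifs_select` are hypothetical.

```python
import math
from collections import Counter


def discrete_mi(a, b):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        c / n * math.log(c * n / (count_a[u] * count_b[v]))
        for (u, v), c in count_ab.items()
    )


def mifs_select(candidates, target, m, beta=0.6, mi=discrete_mi):
    """Greedily pick m input names from `candidates` (name -> samples).

    Each step picks the input maximizing
    I(input; target) - beta * sum of I(input; s) over already selected s.
    """
    remaining = dict(candidates)
    relevance = {name: mi(col, target) for name, col in remaining.items()}
    redundancy = {name: 0.0 for name in remaining}
    selected = []
    while remaining and len(selected) < m:
        best = max(remaining, key=lambda name: relevance[name] - beta * redundancy[name])
        best_col = remaining.pop(best)
        selected.append(best)
        # Update the running redundancy of every input still in the pool.
        for name, col in remaining.items():
            redundancy[name] += mi(col, best_col)
    return selected
```

For example, if two candidate inputs are exact copies of each other and a third carries independent information about the output, a call with `beta=0.6` and `m=2` picks one of the copies and the independent input, because the second copy is penalized for its redundancy.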

TRAFFIC FLOW FORECASTING DATA AND METHODS
The aim of forecasting is to survey the performance of the MIFS algorithm in the presence of different data types, especially normal data, incomplete data (detector failures) and noisy data (occurrence of accidents, rain or snow and heavy traffic). Since current data are inexact, using the MIFS algorithm allows the detection or omission of false data and will increase the forecasting accuracy. Simulations using MATLAB software were carried out to determine the performance of the MIFS algorithm when choosing pre-eminent data to decrease the amount of data and increase the accuracy of predictions.

Data source
The data were collected from sensors installed on the highway network of the metro area in Minnesota, USA, and were provided by the Transportation Data Research Laboratory [30]. The traffic data were taken from three detectors at station 286 located on Highway I-394, as shown in Figure 1. The traffic flow data used for forecasting were collected during the first six months of 2012 at 15 min intervals in real time. About 96 pieces of data were collected every day. Data for holidays and working days were separated and data for non-holiday days were used for simulations and forecasting.
No data were saved on 2 January (Monday) and on 19 April (Thursday) because of technical failure and malfunctioning detectors; all data from those days equalled zero. On 29 February (Wednesday) heavy snow resulted in heavy traffic. On 1 June (Friday), 3 April (Tuesday), 7 March (Wednesday) and 16 March (Friday) heavy traffic and congestion occurred; congestion on the last two days was likely the result of an accident on the road. This analysis is based on the data specified in Table 1.

Prediction models
Mean average model: The first model is simply the average of all input data used to predict the next step ahead (the subsequent 15 min interval).
ARIMA model: This is a common regression model intended for obtaining the relationship between past and future data. Traffic flow has an erratic variance and is a non-stationary process that can be modelled using the ARIMA time series. The mathematical model of ARIMA(p,d,q) consists of polynomials AR and MA as shown in Equation 6 [32]. ARIMA(2,2,0) uses data from the past and the time series of traffic, as shown in Equation 8, where B is the delay factor and is defined as $B X_t = X_{t-1}$; d is the degree of differencing; and $\varepsilon_t$ is the error at time t and is considered to be white noise. Also, $\phi(B)$ is the polynomial of AR; $\theta(B)$ is the polynomial of MA; and p and q are the degrees of these polynomials, respectively.
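For reference, the symbols defined above fit the standard Box-Jenkins form of the ARIMA(p,d,q) model, written out below. The coefficient names $\phi_i$ and $\theta_j$ and the sign convention are assumptions taken from the common textbook form and may differ from the notation of the study's own equations.

```latex
% General ARIMA(p,d,q) model with delay operator B, where B X_t = X_{t-1}
\phi(B)\,(1 - B)^{d} X_t = \theta(B)\,\varepsilon_t ,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i} ,
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}

% ARIMA(2,2,0): p = 2, d = 2, q = 0, so \theta(B) = 1
(1 - \phi_1 B - \phi_2 B^{2})\,(1 - B)^{2} X_t = \varepsilon_t
```

In words, ARIMA(2,2,0) fits a second-order auto-regression to the twice-differenced flow series $X_t - 2X_{t-1} + X_{t-2}$ and has no moving average part.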
MLP model: This neural network predictive model is composed of input, intermediate and output layers. The size of the data set selected by the MIFS algorithm was ten; thus, the number of input layer neurons is assumed to be ten. The number of hidden layer neurons of the neural network equals the number in the input layer (ten), and the number of output layer neurons is set at one for simplicity. The training algorithm used for the neural network is the Levenberg-Marquardt method. W and Z are the input and intermediate weighting matrices, respectively, and $f(\cdot)$ is the neuron transfer function.
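A forward pass through a network of this shape (ten inputs, ten hidden neurons, one output) can be sketched as follows. The sketch makes several assumptions that the text does not confirm: a hyperbolic tangent transfer function in the hidden layer, a linear output neuron, and no bias terms. The function name `mlp_forward` is hypothetical, and training of W and Z is not shown.

```python
import math


def mlp_forward(x, W, Z, f=math.tanh):
    """One forward pass of a three-layer MLP.

    x: list of inputs (ten values selected by the data selection step)
    W: input weighting matrix, one row of input weights per hidden neuron
    Z: intermediate weights, one value per hidden neuron (single output)
    f: neuron transfer function (tanh here, as an assumption)
    """
    hidden = [f(sum(w * v for w, v in zip(row, x))) for row in W]
    # Linear output neuron: weighted sum of the hidden activations.
    return sum(z * h for z, h in zip(Z, hidden))
```

In practice the inputs would normally be scaled to a range suited to the transfer function before training, which is a design choice left open here.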

Forecasting methods
Five forecasting methods were employed in each simulation to compare the predictive power of the models and demonstrate the performance of the MIFS algorithm.
Method A: Mean average model and all data from previous days: All data and the average of all data from previous days are used to predict traffic flow of the last day of each simulation. For example, average data traffic flow at about 6 p.m. for the last 22 non-holiday days is used to predict traffic flow at 6 p.m. of the last day of the month. There are usually 22×96 pieces of data, but this may vary in different simulations.
Method B: ARIMA model and data from the previous four hours: Only the data from the previous four hours are used to predict traffic flow for the last day in each simulation. About 80% of data is used to identify the regression relations using ARIMA(2,2,0) and 20% is used to predict traffic flow. To predict traffic flow on the last day of the month, at least four days of data are required. Using four pieces of data for each hour means that only the last 16 pieces of data (one per 15-minute data interval) are used to predict traffic flow at t. This totals about 16×96 pieces of data for all simulations. For example, to predict traffic flow at 6 p.m. on the last day of the month using ARIMA(2,2,0), all traffic flow data related to the previous four hours (2 p.m. to 6 p.m.) are required.
Method C: Mean average model and data from the same days of the week: This method is used to forecast traffic flow on the last day of each simulation with the selected data. It uses the mean of all data from the same day of each week. For example, traffic flow is predicted for 6 p.m. of the last Wednesday of the month by averaging traffic flow data at 6 p.m. of the four previous Wednesdays. There are 4×96 pieces of data used in this model.
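The two averaging schemes of Methods A and C can be illustrated with a short sketch. The data layout (a list of `(weekday, flows)` pairs with 96 flow values per day) and the function name `mean_forecast` are assumptions made for illustration only.

```python
def mean_forecast(days, slot, weekday=None, last_n=None):
    """Average the flow at one 15-minute slot over previous days.

    days: list of (weekday, flows) pairs in chronological order, where
          flows holds 96 values (slot 72 corresponds to 6 p.m.)
    weekday: if given, only days with this weekday label are used (Method C)
    last_n: if given, only the most recent last_n matching days are used
    """
    rows = [flows for label, flows in days if weekday is None or label == weekday]
    if last_n is not None:
        rows = rows[-last_n:]
    if not rows:
        raise ValueError("no matching days to average")
    return sum(flows[slot] for flows in rows) / len(rows)
```

Calling `mean_forecast(days, 72)` corresponds to the Method A example and `mean_forecast(days, 72, weekday="Wed", last_n=4)` to the Method C example. A day of all-zero readings from a failed detector lowers the average whenever it is included, which is the kind of bias the scenarios below examine.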

Figure 2 - Three-layer MLP predictive model

Method D: Mean average model using MIFS algorithm and all data from previous days with the previous four hours: Data from Methods A and B are used to forecast traffic flow on the last day of each simulation. In offline mode using the MIFS algorithm, a set of ten optimal pieces of data from among all input data is averaged. For example, to predict traffic flow at 6 p.m. on the last day of the month, the input traffic flow data for the previous four hours plus traffic flow data at 6 p.m. for the previous 22 days are used. Using the MIFS algorithm, the mean of ten optimized pieces of data extracted from all input data is calculated. There are about 38×96 pieces of data (22×96 for Method A and 16×96 for Method B), and these are different for each simulation.
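The pool of 38 candidate inputs described for Method D can be assembled as in the sketch below. It assumes that the flow data are stored as one flat chronological list with 96 values per day, so that lags reaching back past midnight fall on the previous day; the function name `candidate_inputs` is hypothetical.

```python
def candidate_inputs(series, idx, n_days=22, n_lags=16, per_day=96):
    """Collect the candidate inputs for predicting series[idx].

    Returns n_days values from the same time slot on previous days,
    followed by n_lags values from the preceding four hours.
    """
    if idx < n_days * per_day or idx >= len(series):
        raise ValueError("not enough history before idx")
    same_slot = [series[idx - d * per_day] for d in range(1, n_days + 1)]
    lags = [series[idx - j] for j in range(1, n_lags + 1)]
    return same_slot + lags
```

A data selection step would then pick ten of these 38 candidates, and Method D would average the selected values.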
Method E: MLP using MIFS algorithm and all data from previous days with the previous four hours: This method is similar to Method D but uses the MLP neural network instead of a mean averaging model. MIFS algorithm outputs require that a set of ten optimal pieces of data be selected for the input layer of the MLP model and traffic flow is predicted in the output layer. The volume of training and testing data is about 85% and 15% of the 10×96 pieces of data selected.

Numerical comparison
To evaluate the efficiency and accuracy of the forecasting methods and to numerically evaluate the MIFS algorithm, the prediction error is obtained using mean absolute error (MAE), mean absolute percentage error (MAPE) and variance absolute percentage error (VAPE). Mathematical models for all three error criteria are shown below.
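The three error criteria can be computed as in the following sketch. It assumes the usual definitions: MAE is the mean of the absolute errors, MAPE is the mean of the absolute errors expressed as a percentage of the observed flow, and VAPE is the variance of those percentage errors. The use of the population variance (division by n) and the function name `forecast_errors` are assumptions.

```python
def forecast_errors(actual, predicted):
    """Return (MAE, MAPE, VAPE) for paired observed and predicted flows.

    MAPE is in percent and VAPE in squared percent. Observed values of
    zero are rejected because the percentage error is undefined for them.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("observed flow of zero makes MAPE undefined")
    n = len(actual)
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    pct_err = [100.0 * e / abs(a) for e, a in zip(abs_err, actual)]
    mae = sum(abs_err) / n
    mape = sum(pct_err) / n
    vape = sum((v - mape) ** 2 for v in pct_err) / n
    return mae, mape, vape
```

A low MAPE combined with a low VAPE indicates that the percentage error is both small on average and stable across the 96 intervals of the forecast day.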

SIMULATIONS AND RESULTS
Five simulations were used to show the effect of the MIFS algorithm in decreasing computational complexity and increasing the accuracy of ITS and traffic flow forecasting. The proposed method with noise and lost sensor data was also analysed under different traffic conditions.

Scenario 1: Forecasting a normal day using incomplete data
Traffic flow was predicted for a normal day using a data set containing incomplete data (failure of detectors) using the five methods and MIFS algorithm. The simulated data for January 2012 are used to predict traffic flow on 31 January (Tuesday) using data from the most recent non-holiday days of the month (21 days). On 2 January, detector failure occurred and no data were available (zero information). The simulation results are shown in Table 2 and Figure 3(a).
Method A contains incomplete data. When compared with Method C, Method A shows obvious bias and the impact of detector failure in its prediction of traffic flow. Method C adequately removed this effect; however, if the detector failure had occurred on the day of the week that provides data for Method C, it would have worsened the bias instead of improving it. The ARIMA model did not experience incomplete data, but used lag data for prediction, which means it did not perform well. Generally, when data are selected in Methods C, D and E using whole data, the results are good. Methods D and E produce more accurate predictions than Method C because the MIFS algorithm selects data with stronger logic. As expected, the MIFS algorithm decreased the amount of input data and increased the accuracy of forecasting. Method E most accurately tracked and decreased the prediction error, which indicates the high power of the neural network in forecasting.

Scenario 2: Forecasting a normal day using incomplete and noisy data
Traffic flow for a normal day was predicted using an incomplete data set with noisy data (congestion because of an accident). The data were for April 2012 and the goal was to predict the traffic flow on 26 April (Thursday) using data from 18 non-holiday days from the month of April. On 19 April, detector failure occurred and no data were available (zero information). Heavy traffic congestion (accident) occurred on 3 April. The simulation results are shown in Table 3 and Figure 3(b).
As seen, the results of Scenario 2 are similar to those of Scenario 1; however, incomplete data were included in Method C, so its forecast was worse than that for Method A. Unlike Methods A, B and C, Methods E and D easily detected noisy data and set them aside with the use of the MIFS algorithm and even selected the ten best pieces of data. The MIFS algorithm clearly identified noisy, incomplete and lag data (four previous hours). For this simulation, the best results were achieved by Method E.

Scenario 3: Forecasting a normal day using a variety of noisy data
Forecasting normal traffic flow for one day was done using a data set containing a variety of noisy data to test the performance of the five methods and the MIFS algorithm under emergency conditions. Data collection occurred during the month from 28 February (Tuesday) to 28 March (Wednesday). The goal was to predict traffic flow on 28 March (Wednesday) using data from the previous month (21 non-holiday days). On 29 February (Wednesday), traffic was very heavy as a result of heavy precipitation and the data were similar to those for a holiday. Severe congestion was also recorded on 7 March (Wednesday) and 16 March (Friday). The simulation results are shown in Table 4 and Figure 4(a).
The results of this scenario are similar to those of Scenario 2. Methods A and C were affected by the noisy data, were biased and did not provide accurate predictions. Method B used lag data (four previous hours) and should have performed well under normal conditions; however, under abnormal conditions, ARIMA could not predict traffic flow. MIFS-based Methods D and E identified and removed the erroneous delay-ridden and noisy information. This simulation shows that the MIFS algorithm clearly identified and tracked the real data, despite noisy, defective and abnormal data. In this simulation, the best prediction was again made by Method E.

Scenario 4: Forecasting a normal day using a large volume of different data types
The aim of this scenario was to predict traffic flow for a normal day using a large volume of different data types that included noisy and incomplete data caused by precipitation, accidents, and detector failure. The difference between this scenario and the previous scenarios is the large volume of data and ... As seen, the results are similar to the results of Scenario 3. The large amount of input data should have been sufficient for most methods to predict the traffic flow. The amount of data was 128×96 pieces for Method A, 16×96 for Method B, 25×96 for Method C, and 10×96 for Methods D and E. The most accurate prediction should have been that of Method A, which had the largest amount of input data; however, the results show that Methods C, D and E selected data more accurately. Methods D and E used the MIFS algorithm with ten pieces of data and succeeded in increasing forecasting speed and decreasing computational complexity with considerable accuracy. This simulation demonstrates that the MIFS algorithm is sensitive to and can identify noisy, incomplete and lag data. The MLP neural network performed better than the mean average and ARIMA models. Method E again provided the most accurate results.

Scenario 5: Forecasting an abnormal day using a large volume of different data types
This scenario forecast traffic flow for an abnormal day using a variety of data. The aim was to test the forecasting performance of the models, forecasting methods and the MIFS algorithm under adverse conditions. The traffic flow was forecast for 1 June (Friday) using normal, incomplete and noisy data. The initial forecast was based on data from the previous month (1 May to 31 May; 22 days). The simulation was then repeated using data for the three previous months (1 March to 31 May; 79 days). Further evaluation was made using data from the five previous months (1 January to 31 May; 108 days). The results of these simulations are shown in Table 6 and Figure 5.
The increase in volume of input data allowed all methods to predict traffic flow well. The 16×96 lag data used to forecast traffic flow for 1 June (Friday) in Method B produced the same result for all three versions of the scenario. It can be seen in Table 6 that Method E, which used the MIFS algorithm and MLP model, achieved the best performance in all three versions. Increasing the amount of input data had no effect on prediction. It was expected that increasing the accurate data would increase the system precision; however, Method E with the MIFS algorithm and ten pieces of data produced the same result as did Method A using a greater amount of data. This confirms that the MIFS algorithm manifests excellent accuracy with decreased complexity and can identify noisy, incomplete and different data. Method E provided the best predictions for all three versions of this scenario using the power of the combined MIFS-MLP.

CONCLUSION
The present study describes the calculation of the non-linear dependence of recorded traffic flow data to determine inaccurate and missing data. The best data were then selected from the large amount of data provided to decrease the volume of calculations and improve the forecasting accuracy.
Numerical case studies from traffic flow data for Minnesota interstate Highway I-394 were used. The data contained incomplete data (failure of detectors), noisy data (rain, snow, congestion or accidents) and lag data (from the previous four hours) to test the forecasting methods under different traffic conditions.
The simulation results showed good robustness for the combined MIFS-MLP neural network model. The numerical results also indicated that the proposed MIFS-MLP flow predictor decreased forecasting error over the results of the mean average and ARIMA models that used traffic data from previous time periods to

S. Hadi Hosseini et al.: Traffic Flow Prediction Using MI Algorithm and Considering Noisy and Data Loss Conditions...

The Levenberg-Marquardt method is the fastest back propagation algorithm in the MATLAB NN-toolbox. The MLP neural network is first modelled separately using the training data and then forecasts using the test data. About 85% of data selected by the MIFS algorithm are used to train the network and the remaining 15% are used to test each simulation. The MLP predictive model is shown in Figure 2.

Table 1 - Traffic congestion on specific days in the first six months of 2012 [30, 31]

Table 2 - Traffic flow prediction error for Scenario 1

Table 3 - Traffic flow prediction error for Scenario 2

Table 4 - Traffic flow prediction error for Scenario 3