SHORT-TERM TRAFFIC FLOW PREDICTION METHOD IN BAYESIAN NETWORKS BASED ON QUANTILE REGRESSION

ABSTRACT
With the popularization of intelligent transportation systems and the Internet of Vehicles, traffic flow data on the urban road network can be obtained more easily and in large quantities. This provides data support for short-term traffic flow prediction based on real-time data. Of all the challenges faced in research on short-term traffic flow prediction, this paper addresses two: one is the difficulty caused by the spatiotemporal correlation of traffic flow changes between upstream and downstream intersections; the other is the influence on prediction of traffic flow deviations caused by abnormal conditions. This paper proposes a Bayesian network short-term traffic flow prediction method based on quantile regression. The method handles the spatiotemporal correlation of traffic flow effectively and efficiently and, at the same time, predicts traffic flow changes under abnormal conditions with higher accuracy.


INTRODUCTION
Traffic congestion is a common problem of urban traffic. An advanced and efficient traffic signal control strategy is a low-cost and efficacious way to alleviate it. Short-term prediction of traffic flow is the key and foundation of traffic signal timing optimization: it makes traffic signal control more proactive and helps to ease traffic congestion. Therefore, the study of short-term traffic flow prediction has important practical significance and application prospects.
Short-term traffic flow prediction is a real-time online prediction method. It collects real-time traffic flow data in the urban road network to infer the changes of traffic flow over a future period of time. When the prediction horizon is less than 10 minutes [1], the prediction accuracy is higher.
At present, there are three major classes of short-term traffic flow prediction methods: parametric methods, non-parametric methods, and mixed methods. The frequently used parametric methods include the Autoregressive Integrated Moving Average (ARIMA) model [2], the Vector Auto-Regression (VAR) model [3], the Kalman Filtering Technique (KFT) [4], Extended Kalman Filtering (EKF) [5] and local linear regression [6]. The most commonly used non-parametric methods include non-parametric regression [7], the Enhanced K-Nearest Neighbour (K-NN) algorithm [8], Least Squares Support Vector Machine (RL-LSSVM) regression [9], support vector regression [10], the Artificial Neural Network (ANN) [11], Artificial Particle Swarm Optimization (APSO) [12], artificial neural networks under heterogeneous conditions [13], and the K-nearest neighbour method [14]. A mixed prediction method combines two different methods to improve the accuracy of traffic flow prediction, for example, a hybrid of exponential smoothing and Kalman filtering [15], a hybrid of neural networks and extended Kalman filtering [16], the linear combination method [17], a hybrid of ARIMA and the Multilayer Artificial Neural Network (MLANN) [18], and a hybrid of ARIMA and the Support Vector Machine (SVM) [19].
The advantage of parameter prediction method is low prediction error, but the disadvantage is that it focuses on the normal data, ignores the extreme data, and does not have good adaptability to the traffic flow prediction under abnormal conditions. The advantages of non-parametric prediction method are its stability and portability in prediction [20], while et al. [25], used the mixed wavelet packet method to remove the noise in the traffic flow. However, this is not a good solution because once the traffic flow changes, caused by abnormal conditions which occur in practical application, the existing traffic flow prediction system will impair the accuracy, which will indirectly lead to the failure of the traffic control system.
In dealing with the above two difficulties, a new prediction method is proposed to predict the short-term traffic flow through a combination of quantile regression and Bayesian network calculation and reasoning methods. Based on the probabilistic inference and the characteristics of traffic flow spatiotemporal correlation, the method builds a mathematical model of spatiotemporal correlation of traffic flow by analysing the traffic flow spatiotemporal correlation of all concerned intersections and constructing the whole traffic control area into a Bayesian network. The difficulty of spatiotemporal correlation of traffic flow prediction is expected to be solved in this way. The probability space distribution of traffic flow under abnormal conditions is also analysed by quantile regression method at the same time. Then, the quantile regression method is used to determine the parameters in the Bayesian network, which enables the prediction model to have better adaptability to the prediction of traffic flow changes caused by abnormal traffic conditions.
The method proposed in this paper not only considers the characteristics of traffic flow spatiotemporal correlation, but also has good prediction accuracy when the traffic flow in the road network changes abnormally. Meanwhile, this prediction method is not sensitive to partial data loss. If traffic flow prediction is carried out in the case of partial loss of real-time traffic flow data, although the accuracy of prediction will be compromised to some extent, a relatively accurate prediction result can still be obtained.

Traffic flow spatiotemporal correlation analysis
This section takes two adjacent intersections as an example to analyse the spatiotemporal correlation of the traffic flow. Vehicles on the urban road network make a choice of road when they pass a its main disadvantage is that the modern models it uses are complex and strongly dependent on large amounts of data. The hybrid prediction method has higher computational efficiency than either single method and has the same advantages as the traditional method with its disadvantage that it is highly dependent on the type and capacity of the recorded data [21].
Recently, scholars have studied how to improve the accuracy of traffic flow prediction with a mixed prediction method. According to the current studies, the factors that affect the accuracy of short-term traffic flow prediction mainly focus on two aspects: 1) The real-time change of traffic flow is affected by the changes in time and space, and the characteristics of traffic flow change in time and space are non-linear. Therefore, the spatiotemporal correlation of traffic flow should be taken into consideration in the prediction, which undoubtedly increases the difficulty of traffic flow prediction. 2) In most cases, traffic flow presents the characteristics of periodic change in time, while a few abnormal conditions (such as traffic accidents, road maintenance, holidays, etc.) will make the change of traffic flow deviate from its periodic change. Such deviation of traffic flow changes caused by abnormal conditions will interfere with the accuracy of most traffic flow prediction methods, which, in turn, makes the prediction methods based on periodic changes of traffic flow inaccurate. At present, almost all short-term traffic flow prediction methods are implemented on the basis of periodic changes of the traffic flow. For these two difficulties, scholars in this field also have a lot of targeted studies.
For the first difficulty, Zhu Z. et al. [22], in order to reflect better the spatiotemporal correlation of traffic flow changes in short-term traffic flow prediction, incorporated the information of vehicle travel speed into the process of traffic flow prediction. Xu YY et al. [23] proposed a Support Vector Regression (vs-SVR) model based on the selection of spatiotemporal variables.
For the second difficulty, most scholars believe that the deviation of traffic flow periodic change caused by abnormal conditions is noise data, which should be deleted or filtered by a filtering algorithm. Xie YH et al. [24] used discrete wavelet decomposition to remove noise in the traffic flow. Jiang XM intersection. At the same time, there is a time gap between the upstream intersection and the downstream intersection in the urban road network. In the urban road network, the traffic flow at the target outlet at any intersection has the following relationship to its upstream intersection (note: the road outlet where the traffic flow needs to be predicted in this paper is called "target outlet" ): , , In Formula 3, X t+Δt is the traffic flow prediction value at the target outlet of the road at time t+Δt; t is the current moment; Δt is the travel time between the target outlet and the upstream outlet; Y i,t is the real-time traffic flow at upstream outlet i at time t. i=1,2,…,n. It is worth noting that all parameters in Equation 3 are affected by spatiotemporal changes, that is, on the same day their values change dynamically at different intersections and at different times.
According to Equation 3, the traffic flow predicted value of any target outlet can be calculated by its real-time traffic flow of the upstream outlet. Therefore, the traffic flow change at any single outlet is related to the traffic flow change at all outlets in the whole road network, which reflects a strong spatiotemporal traffic flow change correlation.
In Formula 3, when the real-time traffic flow at the upstream outlets is used as the input value to predict the traffic flow at the target outlet, the predicted time interval Δt is set, that is, the travel time from the upstream outlet to the target outlet is fixed.
As the length of each road in the urban road network varies, the predicted time interval Δt determined based on the travel time will lead to multiple different Δt values in the whole network. Therefore, value Δt set in this paper is the travel time of the shortest distance between two adjacent road outlets in the whole traffic control area. If the travel time is longer than the predicted time interval Δt in the traffic control area, the predicted time interval Δt must be adjusted to be the same. As shown in In the formula, the difference compensation of traffic flow collection period at each intersection is calculated, which is represented by C l . In the shortest traffic time section C l =0 in the formula l can be any traffic section; t l is the average travel time of section l; Δt is the predicted time interval of the traffic control area. certain intersection. The road chosen by these vehicles will directly affect the change of traffic flow at the crossroads downstream ( Figure 1).
In Equations 1 and 2, a 1 -a 4 and b 1 -b 4 are the traffic flows at each outlet of A and B, respectively; β n is the turning probability of vehicles leaving from outlets A 1 , A 2 , A 4 and turning to the next intersection, also known as the correlation coefficient of traffic flow between upstream and downstream intersections, n=1,2,3, respectively, indicating that vehicles choose the directions of straight, left and right turns; γ is the constant revision value, which is the sum of the number of vehicles at the end of this intersection, the number of vehicles at the beginning of this intersection and the number of vehicles queuing to pass intersection B in the intersection of A and B connected.

Short-term traffic flow prediction on spatiotemporal correlation
In the urban road network, the vehicles departing from the adjacent upstream intersection directly affect the changes of the traffic flow at the downstream residuals to estimate the parameters [26], with the features as follows: first, multiple regression curves of the selected quantile points can be simultaneously fitted, which enables us to have a more comprehensive understanding of the distribution of explained variables in the probability space. Second, the fitting results are more stable for the random disturbance of a few specific data in the sample data. Third, quantile regression model has good elasticity and better asymptotic property in big data environment.
Bayesian network is a probabilistic network. It is one of the most effective theoretical models in the field of uncertain knowledge expression and inference. Bayesian networks are based on a probabilistic graph model of directed acyclic graphs [27]. Each node in the Bayesian network is an independent event, and the probability relationship between each independent event is transmitted in the network along with the directed acyclic graph. Bayesian networks provide a natural way to express causal information in each independent event [28].
In this paper, the short-time prediction of the traffic flow at the target outlet is implemented mainly through the Bayesian network, which is realized based on the Bayesian formula, which is a method to calculate the posterior probability based on the prior probability and conditional probability. Therefore, the prior probability and conditional probability are the key parameters of the short-time traffic flow prediction of the Bayesian network. The prior probability is calculated by the quantile regression, while the conditional probability by calculating the spatiotemporal correlation between the intersections. At the same time, the method also uses quantile regression method to estimate the maximum likelihood of the probability of different traffic flow conditions, so that the change of traffic flow under If the traffic flow value of the target outlet at time t+Δt-C l is to be predicted, then the traffic flow value Y i,t at time t-C l needs to be input. If the traffic flow value at target outlet at time t+n·(Δt-C l ) is to be predicted, then the traffic flow value of the upstream outlet i of the target outlet at time t+(n-1)·(Δt-C l ) needs to be predicted and the value result of the prediction needs to be input into Formula 3. Therefore, the traffic flow volume at target outlet at time t+n·(Δt-C l ) can be predicted by Formula 3. Through such iteration, the prediction results of traffic flow at each upstream outlet are input into the prediction model for the prediction, and the predicted value of traffic flow changes in the far future of the current target outlet can be obtained.
In the traffic control area, some target outlets at the road network edge nodes have upstream outlets, while some other target outlets do not have upstream outlets. For a target outlet without an upstream outlet, it is impossible to predict its traffic flow only through Formula 3. The solution in this case is to predict the single target outlet traffic flow if there is no traffic flow from upstream outlet to the target outlet. The predicted reference data are the historical traffic flow data of the target outlet itself. The prediction method is to use the quantile regression method to analyse the probability distribution of the historical traffic flow data of the target outlet, and use Equations 5 and 6 in Section 3 to predict the current traffic flow at the target outlet. Meanwhile, the prediction value of single outlet traffic flow based on quantile regression is also the prior probability required to establish the Bayesian network. The calculation method is the same as the one expounded in Section 3.

METHODOLOGY
Quantile regression is to estimate the whole model by using several quantile functions. It uses the method of weighted sum of absolute values of predicted value at the outlet i in the No. τ quantile; ρ τ is the loss function of the quantile point; y i is the historical traffic flow of the outlet i; ӯ i stands for the average of traffic outlet i in each time period; ρ τ (y i -ӯ i ) is the weighted error of traffic flow historical data; u=y i -ӯ i ; i is the indicator function for u (u<0), representing a logical relation. It shows that when u≥0,ρ τ (u)=τu; when u<0, then ρ τ (u)=(τ-1)u. When τ=0.5, y(τ) can be approximate to historical average traffic flow. According to the mathematical significance of the quantile regression, y(τ) can be interpreted as this; in the historical traffic flow data of outlet i, if we take any value y i , then: when y i >y(τ), its probability is P(y i >y(τ))=1-τ; when y i <y(τ), the probability is P(y i <y(τ))=τ. By using this property, multiple quantile curves can be used to describe the distribution of the traffic flow variation in the probability space under different conditions. Figure 3 shows the probability space of the traffic flow distribution in the sector area. The dividing line in the left figure is the quantile (median) curve at the point of digit τ=0.5. The whole traffic flow probability space is divided into two parts in the probability density by the quantile (median) curve τ=0.5. In the sample data, the probability of any traffic flow sample value y i larger or smaller than y(0.5) is 50%. If we continue to divide the probability space with more multiple quantile curves, we can know the probability y i between each two quantile curves more accurately. Therefore, using multiple quantile curves to divide the probability space can describe the probability of traffic flow changes caused by small probability events more clearly in the probability space.
In Figure 3 on the right, the whole traffic flow probability space was divided into eight equal parts in the probability density by τ=(0.125; 0.25; 0.375; 0.5; 0.625; 0.75; 0.875), the seven quantile curves. If each of these two equal parts is taken as a region in the probability space, then the probability space abnormal conditions can be preserved as a small probability event in the probability space, so as to improve the accuracy of the prediction.

Analysis of traffic flow under abnormal conditions
The essence of any prediction is actually the maximum likelihood estimate, and so is the traffic flow prediction. The change of traffic flow under abnormal conditions refers to the change of traffic flow caused by accidental or temporary special events, which include natural occurrences as bad weather such as storms and hails, and man-made situations, subjectively or objectively, such social gatherings as demonstrations, parades and marathons or traffic accidents and so on. The change of traffic flow under abnormal conditions is a small probability event. In order to make the traffic flow prediction under abnormal conditions more accurate, the results of small probability events must be reserved in the prediction process. In this paper, the quantile regression method is applied to analyse the change of traffic flow with diverse probabilities. The findings show the method can well observe the probability density distribution space of the whole traffic flow change, and can make the traffic flow change under abnormal conditions become a small probability event and be reserved in the probability space.
Next, quantile regression method is used to analyse the traffic flow changes at a junction. According to the basic idea of quantile, the sum of the absolute value of weighted error reaches the minimum. Hence: min y y y  Figure 3 -Segmentation of traffic flow probability space by median curve and quantile curve network. But for the traffic flow prediction of a single intersection, the network structure of the Bayesian network is fixed. Figure 4 is a Bayesian network structure of a standard crossroad:

Figure 4 -Structure diagram of Bayesian network at a single crossroad
In Figure 4, X is the predicted traffic flow value of the target outlet after future time Δt, and Y 1 , Y 2 and Y 3 are the relevant historical or real-time traffic flow data of the upstream of the target outlet. For each target outlet, the upstream outlet to which it is connected has a network relationship structure similar to that shown in Figure 4. However, for the traffic control area composed of multiple intersections, its structure will be more complex, which is caused by the superposition of the upstream outlets of multiple intersections. As shown in Figure 5.
(3) (4) Figure 5 -Traffic control area in the Bayesian network structure diagram Figure 5 illustrates the structure diagram of Bayesian of several traffic control areas. Due to the actual complex and changeable traffic network structure, it is difficult to describe the relationship between intersections with the unified Bayesian network structure. Although the structure of multi-intersection Bayesian networks is complex and changeable, its essence is still composed of multiple single-intersection has been equally divided into four regions by three quantile curves τ=(0.25; 0.5; 0.75). τ=(0.125; 0.375; 0.625; 0.875), the four quantile curves can just represent the local mean curves of these four regions, respectively. That is, of all the sample data, any arbitrary y i value in τ=(0.125; 0.375; 0.625; 0.875), the four quantile curves, the probability is 25%, i.e. P(y(0.25))=P(y(0.5))=P(y(0.75))=25%. Similarly, the whole traffic flow probability space can figure out n equi-difference quantile curves; n is odd.
Therefore, in n equal difference quantile curves, the whole traffic flow probability space is divided into (n-1)/2 partitions by the curve with even sequence number. Then, also in these n equal difference quantile curves, the curves with odd sequence number are taken as the regional mean value after the probability space of the traffic flow is partitioned.
In this way, by using multiple quantile curves, we can clearly observe the probability density distribution of the probability space of the entire traffic flow. When the number of partitions is large enough, the proportion of the probability area of local traffic flow to the probability space of the whole traffic flow can be used as the probability of sample ontogenesis in the local area. Therefore, we can infer the probability of the value of any point in the probability space of the whole traffic flow appearing in reality. In this way, the traffic flow changes under abnormal conditions can be reserved in the form of small probability events. When n is large enough, the regional mean probability values in the probability space of the traffic flow can be used to approximately show the occurrence probability of regional values in the probability space of traffic flow, namely: The value of P(y(τ)) is the proportion of the regional area with quantile τ to the whole traffic flow probability space; therefore, probability P(y i ) of occurrence of the traffic flow at outlet i can be calculated from Equation 7 when the traffic flow y i is any value.

Determine the network structure of the Bayesian network
The network structure of the Bayesian network in the traffic control area of urban road network depends on the network topology of the traffic According to Formula 8, the expression of a single-intersection Bayesian network shown in Figure 4 can be written as: In Formula 9, prior probability P(X i,t+Δt ) is obtained through quantile regression method to analyse the distribution of probability density space of the historical sample data of the target outlet, and then the probability region corresponding to value X i,t+Δt is found. Finally, it can be obtained through The value of conditional probability P(X i,t+Δt |Y 1,t-c1 ,Y 2,t-c2 ,Y 3,t-c3 ) needs to be determined by Equation 3. It can be seen from Equation 3 that value X i,t+Δt of the traffic flow X i,t+Δt at the target outlet is related to the traffic flow Y 1,t-c1 ,Y 2,t-c2 ,Y 3,t-c3 at the upstream intersection and the value γ t+Δt , β 1t , β 2t , β 3t , of dynamic parameters. When all the relevant parameters are determined, the traffic flow X i,t+Δt at the target outlet has a unique definite solution. In practice, value Y 1,t-c1 ,Y 2,t-c2 ,Y 3,t-c3 of traffic flow at the upstream outlets can be obtained in two ways: first, it can be obtained according to the real-time data collection of the upstream outlets; second, the prior probability distribution of the value of traffic flow at the upstream outlets can be calculated based on the historical traffic flow data of the upstream outlets. In addition, it can be seen from Equation 3 that when X i,t+Δt ,Y 1,t-c1 ,Y 2,t-c2 ,Y 3,t-c3 all are known, only a certain amount of historical traffic flow data of the whole intersection all outlets and their upstream outlets are needed to solve the multiple quantile regression equation of Equation 3, and value γ t+Δt , β 1t , β 2t , β 3t of dynamic parameters can be figured out. 
At present, the algorithms to solve the multiple quantile regression mainly include the simplex method, interior point method, smoothing algorithm, etc. [28]. According to the characteristics of traffic flow data, this paper chooses smoothing algorithm to solve the dynamic parameters.
During the verification of the actual data, it is found that values γ t+Δt , β 1t , β 2t , β 3t of dynamic parameters at the same time and at the same intersection will show a Gaussian distribution through the calculation results of different historical data. In the process of traffic flow prediction, dynamic parameters γ t+Δt , β 1t , β 2t , β 3t have practical significance such as the turning probability, vehicle arrival rate and Bayesian networks. Therefore, as long as multiple single-intersection Bayesian networks as shown in Figure 4 are established and combined with the actual road network situation, the Bayesian network structure diagram of any traffic area can be established. Figure 6 is a multi-intersection Bayesian network structure diagram composed of two single-intersection Bayesian networks.

Determine the parameters of Bayesian network
The Bayesian network is a probabilistic network, and the Bayesian formula is the basis of this probabilistic network. The Bayesian formula is the inverse derivation of the full probability formula. It is a method to obtain the posterior probability of the occurrence of an event in view of the prior probability and conditional probability of the occurrence of the event known in an event. Its expression is: To establish the traffic flow prediction model based on the Bayesian network, in addition to constructing the network structure, two parameters of prior probability P(B i ) and conditional probability P(A|B i ) must be determined first.
In the traffic flow prediction, the prior probability is based on the historical traffic flow data of the target outlet, and the estimated traffic flow probability distribution value of the target outlet in a certain period is calculated. Conditional probability is the probability distribution when the traffic flow of the target outlet after time Δt is at a certain value under the condition that the real-time or historical traffic flow data of the upstream outlet are at a certain value. predicted time interval. According to the historical traffic flow data, the quantile curves of each quantile point will affect the prediction accuracy of the prediction result. The more quantile curves used to calculate the prior probability and conditional probability, the higher the prediction accuracy will be. Otherwise, the prediction accuracy will be correspondingly compromised. Meanwhile, according to the spatiotemporal characteristics of the traffic flow variation and the characteristics of the Bayesian network, for example, the traffic flow prediction results of the Bayesian network after time Δt can be re-input into the Bayesian network, and the traffic flow prediction results after time 2Δt can be obtained. The traffic flow prediction results can be obtained by this analogy after time nΔt. But the longer the time interval between predictions, the less accurate they become.

Bayesian network traffic flow prediction process
The traffic flow prediction of the Bayesian network is carried out in the traffic control sub-area, and its workflow is as follows:   Figure 7 is the flowchart of the traffic flow prediction. Its working principle is: first, according to the traffic network topological structure to determine vehicle queue length. Therefore, in theory, its value is influenced by the changes of the traffic flow under abnormal situation. In order to retain the change caused by this influence, the quantile regression is also needed to calculate the prior probability distribution of dynamic parameters γ t+Δt , β 1t , β 2t , β 3t In this way, the value of conditional probability can be obtained as shown in Formula 10.
The condition for the establishment of Equation 10 is that all the parameters in the equation must satisfy the establishment of Equation 3. In the actual prediction, value Y 1,t-c1 ,Y 2,t-c2 ,Y 3,t-c3 of the traffic flow at the upstream outlets is the known real-time data under normal conditions, then P(Y i,t )=1.
After the parameters of the Bayesian network are determined, the single-intersection Bayesian network as shown in Figure 4 can be established. Then, a single-intersection Bayesian network is established for each outlet in the traffic control area. Finally, the traffic control area Bayesian network as shown in Figure 5 is established through the combination of multiple single-junction Bayesian networks.

Factors affecting traffic flow prediction accuracy of Bayesian network
Bayesian network prediction model can make inferential prediction of traffic flow changes after the future time Δt according to real-time traffic flow data in the traffic control area. According to the characteristics of the Bayesian network, not all real-time traffic flow data in the traffic control area must be known when traffic flow prediction is carried out. What only needs to be known is the real-time traffic flow data of some road outlets to predict the traffic flow changes of each road outlet in the whole traffic control area. However, given the number of road outlets in real-time traffic flow data, the accuracy of traffic flow prediction will be affected. The more known road outlets of real-time traffic flow data, the higher the accuracy of prediction, otherwise the accuracy of prediction will be correspondingly compromised.
The accuracy of the Bayesian network prediction is related not just to the number of known outlets of real-time traffic flow data, but also to the number of quantile curves and the length of the days from 7 th to 20 th May 2018. Each 10-minute traffic volume from 7:30 a.m. to 9:20 a.m. for all vehicle classes was manually extracted using the collected video data, whose unit is veh/h.

Empirical verification
In the empirical verification, the change of morning peak traffic flow on workdays is regarded as the change rule of traffic flow under normal conditions, and the change of morning peak traffic flow on resting days is regarded as the small probability deviation event of traffic flow change. The traffic flow interval is the number of vehicles passing in the past 10 minutes. Only one direction of the traffic flow proceeding from Sangong Road towards Jianshe Road No.1 was considered for the analysis.
The prediction of the traffic flow at outlet B 1 of Jianshe Road 1 is taken as an example to verify the prediction accuracy of the above methods. Target outlets: Jianshe Road 1 outlet B 1 ; its upstream three outlets: Sangong Road outlet A 1 , outlet A 2 , outlet A 4 .
the Bayesian network structure. Then, based on the historical traffic flow data, quantile regression method is adopted to calculate the required parameters of the Bayesian network, and the traffic control area Bayesian network model is constructed accordingly. Then, according to the real-time traffic flow data, the Bayesian network is used to predict the traffic flow changes after the future time Δt. At the same time, the parameters required by the Bayesian network are calculated and updated according to the real-time traffic flow data.

CASE ANALYSIS
In this paper, traffic flow data of 14 consecutive days at two adjacent intersections where Heping Avenue intersects the Sangong Road and Jianshe Road 1 in Qingshan district of Wuhan city during the morning peak period (7:30 a.m.~9:30 a.m.) are taken as the basic training data of the Bayesian network. Then take the traffic flow data of the adjacent intersection at the 4-day morning peak as the verification data to verify the above traffic flow prediction method. In the 4-day validation data, the validation data on day 1 and day 2 are the morning peak traffic flow data on workdays, while the validation data on day 3 and day 4 are morning peak traffic flow data on Saturday and Sunday, respectively. The positions of the two adjacent intersections are the same as shown in Figure 8, outlet A is the intersection of Sangong road, and outlet B is the intersection of Jianshe Road 1.
The traffic flow data to be used as verification were collected at two intersections using video survey from 7:
First, the prior probability distribution of traffic flow change at outlet B1 is calculated from two weeks of historical traffic flow data at this outlet using the quantile regression algorithm. The result is shown in Figure 9. Figure 9 shows that the trend of Curve (1) differs significantly from that of Curves (2), (3) and (4) from 7:30 a.m. to 8:50 a.m., whereas between 8:50 a.m. and 9:20 a.m. the four curves move closer together. Judging from the collection times of the actual sample data, this difference is caused by the different morning rush hour traffic patterns on workdays and weekends: the sample data near Curve (1) are mostly morning peak traffic flow data from rest days, while the sample data near Curves (2), (3) and (4) are mostly morning peak traffic flow data from workdays. This shows that, if the morning peak traffic flow data on rest days are regarded as traffic flow data under abnormal conditions, these abnormal changes can be better recognized when quantile regression is used to calculate the prior probability.
In the same way, the prior probabilities of traffic flow changes at the upstream outlets A1, A2 and A4 can be calculated.
Then, according to Formula 10, the conditional probability of traffic flow variation between outlet B1 and the three upstream outlets is calculated, and the Bayesian network is established accordingly. The resulting inference network is used to predict the real-time traffic flow at outlet B1.

Analysis and contrast of verification results
The verification method is as follows: the known 4-day real-time traffic flow data of the three upstream outlets A1, A2 and A4 are input into the Bayesian network to obtain the predicted traffic flow at outlet B1 after time Δt.
The predicted data are compared with the measured data at outlet B1, and Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) are used as the standards for measuring the prediction accuracy of this method. The formulas for calculating RMSE and MAPE are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{11}$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\% \tag{12}$$
In Formulas 11 and 12, ŷ = {ŷ_1, ŷ_2, …, ŷ_n} are the predicted values and y = {y_1, y_2, …, y_n} are the measured values. The verification results are presented below; the data from day 1 and day 2 are morning rush hour traffic flow data on workdays, and the data from day 3 and day 4 are morning rush hour traffic flow data on rest days.
1) When the real-time traffic flow data of the three upstream outlets A1, A2 and A4 of target outlet B1 are all unknown, the traffic flow predicted by the Bayesian network after time Δt is compared with the actual traffic flow, as shown in Table 1.
2) When the real-time traffic flow data of only one of the three upstream outlets A1, A2 and A4 of target outlet B1 are known, the traffic flow predicted by the Bayesian network after time Δt is compared with the actual traffic flow, as shown in Table 2.
3) When the real-time traffic flow data of the three upstream outlets A1, A2 and A4 of target outlet B1 are all known, the traffic flow predicted by the Bayesian network after time Δt is compared with the actual traffic flow, as shown in Table 3.
As can be seen from Tables 1-3, when the traffic flow data of the three upstream outlets A1, A2 and A4 are all unknown, the MAPE values for the two workdays are 9.12% and 8.48%, while those for the two rest days are 22.65% and 41.46%. When the traffic flow data of only one upstream outlet (A1, A2 or A4) are known, the MAPE values are 5.14% and 5.04% for the workdays and 4.76% and 9.60% for the rest days. When the data of all three upstream outlets A1, A2 and A4 are known, the MAPE values are 3.86% and 4.55% for the workdays and 3.42% and 3.16% for the rest days.
It can be seen that, as the number of upstream outlets with known real-time traffic flow increases, the accuracy of the Bayesian network in predicting traffic flow improves continuously. This improvement is particularly significant for the traffic flow prediction with small probability deviation. At the same time, when the real-time traffic flow of all upstream outlets is known, this method shows high prediction accuracy for the traffic flow of all time periods.

COMPARISON OF THE METHODS
Xu YY et al. [23] applied VS-SVR to predict traffic flow and compared their predicted results with those of five common methods: AR, MARS, SVR, SARIMA and ST-BMARS. In this section, we compare the prediction accuracy of our method with that of the methods reported by Xu YY et al. [23].

Comparison of traffic flow data
Xu YY et al. [23] collected traffic flow data on urban roads with loop detectors, recording the vehicle volume in each 10-minute interval in units of veh/h. To avoid interference between non-workday and workday traffic flow data, their data collection focused only on traffic flow on workdays.
This paper also collected traffic flow data at urban road intersections in 10-minute intervals with the same unit, veh/h, but put the data from workdays and non-workdays into the same data set without extra classification or filtering. The reason is as follows. This paper adopts the Bayesian inference network, which is a probabilistic prediction method. Its prediction principle is to take the real-time traffic flow data of the upstream intersection as the conditioning evidence for predicting the traffic flow at the downstream intersection, so the prediction results at each intersection are affected by the traffic flow conditions at the upstream intersection. Therefore, it is only necessary to put the historical traffic flow data under different conditions into the same training data set to train the parameters required for establishing the Bayesian network, so as to adapt the prediction model to various traffic flow anomalies.
On the basis of considering the spatiotemporal variation relationship of traffic flow, this method combines the characteristics of quantile regression and the Bayesian network and is well able to distinguish and identify specific data; for example, it can better identify the different changes of traffic flow in the morning and evening rush hours on rest days and workdays.
This method is highly flexible: the accuracy of traffic flow prediction can be improved by setting more quantile points in the quantile regression, as long as the quantile points divide the whole sample space equally.
At the same time, this method allows a small number of upstream outlets to lose real-time traffic flow data during the prediction process. When the real-time traffic data of all intersections in the road network are missing, the prediction accuracy of this method on workdays is better than that of traditional traffic flow forecasting methods, while its prediction accuracy on rest days is worse. Each additional piece of known real-time traffic flow data improves the prediction accuracy further, and the improvement becomes obvious as more data are known. When traffic flow data are missing, the mean value of historical traffic flow data can be used to replace the real-time traffic flow data, but the prediction accuracy will be compromised to some extent.

Comparison of predicted results
Table 4, taken from Xu YY et al. [23], lists several prediction results. Table 5 gives the prediction results of the model proposed in this paper.
According to the quality standard of the U.S. Federal Highway Administration (FHWA), the maximum acceptable prediction error is 20%, and an error of 10% is regarded as ideal.
From this perspective, the prediction errors of AR, MARS, SVR and ST-BMARS are slightly below the maximum acceptable error, whereas the error of SARIMA exceeds it, which means that the SARIMA prediction accuracy is unacceptable. In Table 4, VS-SVR gives the most accurate prediction, with an error of 11.4%, slightly higher than the ideal value.
In Table 5, when all the real-time data are unknown, the traffic flow prediction error on workdays is 8.98%, which meets the ideal standard, but the prediction error on rest days is 32.06%, well above the maximum acceptable prediction error, so this prediction accuracy is unacceptable. However, as more real-time traffic flow data become known, the prediction accuracy of the proposed method under both traffic conditions improves greatly. When all the real-time traffic flow data are known, the prediction error of the proposed method is clearly smaller than the errors reported by Xu YY et al. [23] and also smaller than the ideal prediction error.
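The error measures and the FHWA thresholds used in this comparison can be sketched as follows. This is an illustration using the standard definitions of RMSE and MAPE, not the authors' evaluation code; the function name `fhwa_rating` and the treatment of the exact boundary values (10% counted as ideal, 20% counted as acceptable) are assumptions.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured flows."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def mape(predicted, measured):
    """Mean absolute percentage error in percent; measured values must be non-zero."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    relative = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * relative / len(measured)


def fhwa_rating(error_percent, ideal=10.0, maximum=20.0):
    """Classify a percentage error against the 10% / 20% thresholds.

    Assumption: values exactly on a threshold fall into the better class.
    """
    if error_percent <= ideal:
        return "ideal"
    if error_percent <= maximum:
        return "acceptable"
    return "unacceptable"
```

Applied to the errors quoted in the text, `fhwa_rating(8.98)` falls in the ideal class, `fhwa_rating(11.4)` in the acceptable class, and `fhwa_rating(32.06)` in the unacceptable class.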

CONCLUSION
The Bayesian network short-term traffic flow prediction method based on quantile regression is a real-time online prediction method. It increases prediction accuracy by taking real-time traffic flow data as input. At the same time, the real-time traffic flow data are added to the sample space of historical data to update and retrain the Bayesian network in real time, thereby achieving real-time online learning. In general, this prediction method considers not only the spatiotemporal correlation of traffic flow but also the influence of abnormal traffic conditions on traffic flow prediction, and hence addresses two difficult problems in traffic flow prediction.
Owing to limited conditions, this paper establishes only a single-intersection Bayesian network based on the traffic flow variation relationship between the upstream and downstream of two intersections, and verifies the traffic flow prediction with an example. A traffic flow prediction Bayesian network for the whole traffic control area, built by combining multiple single-intersection Bayesian networks according to the road network structure of the area, has not yet been established. The next step is to verify the method step by step at multiple intersections within a Bayesian network covering the traffic control area.
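The idea of online learning described here can be illustrated with a count-based conditional probability table that is updated one observation at a time. This is a simplified sketch, not the paper's Formula 10: it assumes traffic flow has already been discretized into named states, and the class name `OnlineCPT`, the state labels and the Laplace smoothing parameter `alpha` are all hypothetical.

```python
from collections import defaultdict


class OnlineCPT:
    """P(downstream state | upstream states), stored as incrementally updated counts."""

    def __init__(self, child_states, alpha=1.0):
        if alpha <= 0:
            raise ValueError("alpha must be positive")
        self.child_states = list(child_states)
        self.alpha = alpha  # Laplace smoothing (an assumption of this sketch)
        self.counts = defaultdict(lambda: defaultdict(float))

    def update(self, parent_states, child_state):
        """Add one new observation, e.g. the latest 10-minute interval."""
        self.counts[tuple(parent_states)][child_state] += 1.0

    def probability(self, parent_states, child_state):
        """Smoothed conditional probability of child_state given parent_states."""
        row = self.counts.get(tuple(parent_states), {})
        total = sum(row.values()) + self.alpha * len(self.child_states)
        return (row.get(child_state, 0.0) + self.alpha) / total

    def predict(self, parent_states):
        """Most probable downstream state given the upstream states."""
        return max(self.child_states,
                   key=lambda s: self.probability(parent_states, s))


# Hypothetical usage: one upstream outlet, two discretized flow states.
cpt = OnlineCPT(["low", "high"])
for _ in range(3):
    cpt.update(("high",), "high")  # three intervals with high flow at both outlets
```

Because only counts are stored, each new interval costs a single increment, and the probabilities used for prediction always reflect both the historical and the most recent data.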

ACKNOWLEDGEMENT
The author would like to thank Drs. Qingnian Zhang for his guidance and help in this research work. This work was supported by the National Nat-