A HYBRID MODEL BASED ON SUPPORT VECTOR MACHINE FOR BUS TRAVEL-TIME PREDICTION

Effective bus travel-time prediction is essential in transit operation systems. An improved support vector machine (SVM) is applied in this paper to predict bus travel time, and the efficiency of the improved SVM is then checked. The improved SVM combines the traditional SVM, Grubbs' test method and an adaptive algorithm for bus travel-time prediction. Since the collected data contain erroneous records, Grubbs' test method is used to remove outliers from the input data before the traditional SVM model is applied. In addition, to decrease the influence of historical data from different stages on the forecast of the traditional SVM, an adaptive algorithm is adopted to dynamically decrease the forecast error. Finally, the proposed approach is tested with the data of the No. 232 bus route in Shenyang. The results show that the improved SVM has good prediction accuracy and practicality.


INTRODUCTION
With the development of public transportation systems, the adoption of technologies such as bus tracking, location and communication has become popular. These technologies make it possible to predict bus arrival times at stops and help to provide reliable service in public transport. With accurate bus travel-time predictions, passengers can schedule their trips efficiently and avoid waiting times.

S. Zhong et al.: A Hybrid Model based on Support Vector Machine for Bus Travel-Time Prediction
Using accurate bus travel times, transit operators can also apply control strategies (stop-skipping strategy, holding strategy, etc.) to adjust bus operation when it is interrupted.
Usually, variable travel times on links and dwell times at stops make the arrival times of transit vehicles at stops in urban transit networks uncertain. Predicting bus travel time is especially hard when buses are disrupted by stochastic factors such as traffic accidents and traffic jams. The past decades have witnessed growing interest in bus running time prediction. The literature focuses on time series [1], artificial neural networks or support vector machines (SVM) [2-8], Kalman filtering techniques [9,10], etc.
Time series models rely mainly on the similarity between historical and future data. The prediction deviates noticeably when the historical data vary, and time series predictions show an obvious lag. The artificial neural network is good at dealing with non-linear problems and has been used for bus travel-time prediction in recent decades. However, because of the difficulty of determining the network structure and the problem of local convergence, the artificial neural network is not widely used [11,12]. The Kalman filtering technique can accommodate irregular variation. It is effective in single-step prediction but lacks accuracy in multi-step prediction [13].
Recently, SVM has attracted researchers' attention as a new machine learning method. SVM, developed by Vapnik [14,15], is characterized by a specific type of learning algorithm. It has been successfully applied to problems such as incident detection [16], traffic-pattern recognition [17], passenger head recognition [18], and travel time prediction [7,8,19-21]. SVM finds regular patterns in data and uses them to analyse unknowns. Being based on statistical learning, SVM is applicable to small samples. As SVM has a strong learning capacity, it balances data fitting and generalization well.
Building on the successful application of SVM, this paper proposes a bus travel-time prediction model based on SVM. To predict link travel times accurately and in a timely manner, it is essential to take traffic conditions, e.g. traffic congestion, into consideration. Since traffic congestion is complicated and difficult to measure, the speed of the preceding/current bus on links is used to represent the traffic conditions of links [7]. Three kinds of information (the route link, the velocity, and the arrival time of the preceding bus at a stop) are used as input data to deal with unexpected delays.
As the operation of a bus is sensitive to current traffic conditions on the downstream part of the road network, Grubbs' test method is applied to eliminate abnormal data from the input data and avoid deviations caused by unreasonable data. Many procedures for discarding outliers have been proposed, such as Grubbs' test [22,23], the Tietjen-Moore test, and the generalized extreme Studentized deviate (ESD) test [24]. In a consecutive procedure like Grubbs', only one observation at a time can be tested as an outlier. In the presence of multiple outliers, the procedure must be applied repeatedly until no further outliers are detected. The Tietjen-Moore test generalizes Grubbs' test to the case of more than one outlier, but it has the limitation that the number of outliers must be specified exactly. The ESD procedure is the recursive version of Grubbs' test and requires only an upper bound on the suspected number of outliers. Grubbs' test is usually used to detect outliers in small samples where the number of outliers is unknown.
SVM draws on several computing techniques, such as optimization theory and kernel functions. With these auxiliary tools, problems such as small samples, non-linearity, high dimensionality and local optima can be handled effectively [18,25-28]. However, the standard SVM inevitably has to solve a quadratic programming problem, which is costly with large training sets and online training. To address the resulting heavy computation and poor real-time performance, an adaptive algorithm is put forward to minimize the prediction error.
The adaptive algorithm can improve the training speed of the SVM method to a certain extent while remaining simple. Moreover, the adaptive algorithm overcomes the lack of an appropriate support vector in a standard SVM [8,29]. Yao et al. [30] applied the adaptive algorithm to adjust the performance of the SVM method and confirmed that it is effective in improving SVM. Based on equality constraints, Suykens and Vandewalle [31] proposed the least squares support vector machine (LS-SVM). Using the Sherman-Morrison-Woodbury formula, Chua [32] applied SVM to large samples; however, the results lacked stability. Building on the LS-SVM, Yang et al. [25] proposed an adaptive algorithm that can determine the number of support vectors.
Unlike previous studies, this paper focuses on the bus travel-time prediction problem with an improved support vector machine. To achieve better prediction performance, the proposed approach combines Grubbs' test method, SVM and an adaptive algorithm. However, this study also has some limitations. For example, the travel-time prediction for the target bus is less accurate on a link with increased bus traffic.
The rest of this paper is organized as follows. Section 2 provides a brief introduction to the improved SVM and explains the application of Grubbs' test method in data processing. Section 3 presents the time prediction process with SVM and an adaptive algorithm that adjusts the prediction results. Section 4 reports the results of a numerical test and an analysis, including a performance evaluation of the proposed methodology. Lastly, the conclusions are stated in Section 5.

MODEL DEVELOPMENTS
In this paper, the improved SVM model, called SVMAA, combines a standard SVM model, Grubbs' test method and an adaptive filtering method. The structure of the modelling process is presented in Figure 1. First, the required data are collected with GPS technology and surveys. As not all of the data are valid, the outliers among the samples are removed using Grubbs' test. Then, the valid data set is divided chronologically into three parts: training samples, validation samples and test samples, in the ratio 70%, 20% and 10%, respectively. The training and validation samples are used to optimize the parameters of the SVM model, while the test samples are used to verify the prediction performance of the model. Since not all SVM predictions are appropriate, the adaptive filtering method is adopted to select the predictions with large absolute error. The details are introduced in the following sections.
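The chronological 70/20/10 split described above might be sketched as follows. This is a minimal illustration, not the authors' code: the function name `chronological_split`, the optional `key` accessor and the stand-in trip list are all assumptions.

```python
def chronological_split(records, train=0.7, valid=0.2, key=None):
    """Split records into training, validation and test sets in time order.

    `key` is a hypothetical accessor returning each record's timestamp.
    Records are sorted, never shuffled, so later trips cannot leak into
    the training set.
    """
    ordered = sorted(records, key=key)
    n = len(ordered)
    n_train = round(n * train)                # end of the training block
    n_valid_end = round(n * (train + valid))  # end of the validation block
    return ordered[:n_train], ordered[n_train:n_valid_end], ordered[n_valid_end:]


# Stand-in for time-ordered trip records (integers play the role of timestamps).
trips = list(range(994))
train_set, valid_set, test_set = chronological_split(trips)
print(len(train_set), len(valid_set), len(test_set))
```

Rounding the cumulative boundaries, rather than each share separately, keeps the three parts contiguous and guarantees that every record lands in exactly one of them.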

Outlier detection and removal based on Grubbs' test method
In this paper, bus travel time is predicted under ordinary circumstances. The key point is therefore to extract a general rule for bus running time prediction from the collected data set. However, because of unexpected factors in public transport, such as traffic jams, bad weather and traffic accidents, the data set inevitably contains unexpected observations, called outliers, which lie more than a certain number of standard deviations (SD) from their respective means. As outliers in the data set may degrade prediction accuracy, they must be discarded from the input data of the SVM to avoid a disproportionate influence on the analysis, although such outliers are sometimes important for other purposes, e.g. traffic accident identification.
Generally, the collected data can be divided into two categories: predictable data and stochastic data. The predictable data are generated in ordinary circumstances and serve as the input data of the SVM in bus travel-time prediction. The stochastic data are usually caused by human factors and can be removed by Grubbs' test. Collecting data in the wrong way also produces a small number of abnormal values.
Grubbs' test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying Grubbs' test. Grubbs' test detects one outlier at a time. This outlier is expunged from the data set and the test is iterated until no outliers are detected. However, multiple iterations change the probabilities of detection, and the test should not be used for sample sizes of six or fewer since it frequently tags most of the points as outliers.
Moreover, when there is an accident on the route, the data collected from the affected vehicles will be abnormal. The congestion caused by an accident disrupts the normal operation of the bus involved and of the buses following it. This situation is regarded as another kind of congestion, so the affected data should be removed from the data set. Since Grubbs' test is simple and easy to apply, it is used to remove the outliers from the data set, assuming that the bus speed on the predicted link follows a normal distribution during the studied periods.
Grubbs' test is defined for the hypotheses: $H_0$: there are no outliers in the data set; $H_a$: there is at least one outlier in the data set.
The Grubbs' test statistic is defined in Ram [33]:

$$G = \frac{\max_{i=1,\dots,N} |Y_i - \bar{Y}|}{s} \tag{1}$$

where $\bar{Y}$ and $s$ denote the sample mean and standard deviation, respectively. The Grubbs' test statistic is the largest absolute deviation from the sample mean in units of the sample standard deviation. This is the two-sided version of the test. Grubbs' test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is:

$$G = \frac{\bar{Y} - Y_{\min}}{s} \tag{2}$$

where $Y_{\min}$ denotes the minimum value. To test whether the maximum value is an outlier, the test statistic is:

$$G = \frac{Y_{\max} - \bar{Y}}{s} \tag{3}$$

where $Y_{\max}$ denotes the maximum value. For the two-sided test, the hypothesis of no outliers is rejected at significance level $\alpha$ if:

$$G > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N - 2 + t^2_{\alpha/(2N),\,N-2}}} \tag{4}$$

where $t_{\alpha/(2N),\,N-2}$ denotes the upper critical value of the t-distribution with $N-2$ degrees of freedom and a significance level of $\alpha/(2N)$. For the one-sided tests, replace $\alpha/(2N)$ with $\alpha/N$.
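The iterated two-sided test can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the numerical t-quantile (Simpson's rule plus bisection, used only to stay within the standard library), the `min_n=7` stopping rule (taken from the remark about samples of six or fewer) and the speed values are all hypothetical.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_upper_tail(x, nu, steps=2000):
    """P(T > x) for x >= 0, integrating the density on [0, x] by Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3


def t_critical(p, nu):
    """Upper critical value t with P(T > t) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, nu) > p:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value of G for a sample of size n."""
    t = t_critical(alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def remove_outliers(values, alpha=0.05, min_n=7):
    """Repeat the two-sided Grubbs' test, dropping one outlier per pass."""
    kept, removed = list(values), []
    while len(kept) >= min_n:
        mean = sum(kept) / len(kept)
        s = math.sqrt(sum((v - mean) ** 2 for v in kept) / (len(kept) - 1))
        if s == 0:
            break  # all values identical, nothing to test
        suspect = max(kept, key=lambda v: abs(v - mean))
        if abs(suspect - mean) / s <= grubbs_critical(len(kept), alpha):
            break  # no outlier detected at level alpha
        kept.remove(suspect)
        removed.append(suspect)
    return kept, removed


speeds = [30, 31, 29, 32, 30, 31, 30, 29, 31, 90]  # km/h, made-up values
cleaned, outliers = remove_outliers(speeds)
print(cleaned, outliers)
```

In practice a statistics library would supply the t-quantile; the hand-rolled version here only keeps the sketch self-contained.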

Time prediction process with an adaptive algorithm
After the collected data have been screened, the SVM model is used to predict bus running time on the route links. As not all SVM predictions are appropriate, the adaptive filtering method is adopted to select the predictions with large absolute error.

Support Vector Machines for regression
Given a set of data points $\{(x_i, y_i)\}_{i=1}^{n}$, $x_i$ is the input vector, $x_i \in X \subseteq R^n$; $y_i$ is the desired value, $y_i \in Y \subseteq R$; $n$ is the number of training samples. SVM can map $x$ to a high-dimensional space $H$ with a non-linear mapping function $\phi(x)$:

$$f(x) = \omega \cdot \phi(x) + b \tag{5}$$

where $\phi(x)$ represents the high-dimensional feature space which is non-linearly mapped from the input space $x$. The coefficients $\omega$ and $b$ are estimated by minimizing the regularized risk function:

$$R(\omega) = \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n} L_\varepsilon(y_i, f(x_i)) \tag{6}$$

In formula (6), the first term $\frac{1}{2}\|\omega\|^2$, called the regularized term, is used to make the function as flat as possible, as well as to improve the controlling capacity.
The second term $C\sum_{i=1}^{n} L_\varepsilon(y_i, f(x_i))$ is the empirical risk, which is defined through a loss function. The constant $C > 0$ controls the penalty on samples whose error exceeds $\varepsilon$. Figure 2 presents the behaviour of the $\varepsilon$-insensitive loss function.

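Figure 2 - The parameters for the support vector regression

The $\varepsilon$-insensitive loss function is defined as follows:

$$L_\varepsilon(y, f(x)) = \max(|f(x) - y| - \varepsilon,\ 0) \tag{7}$$

With the introduction of non-negative slack variables $\xi_i$, $\xi_i^*$, formula (6) can be described as:

$$\min_{\omega, b, \xi, \xi^*}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - \omega\cdot\phi(x_i) - b \le \varepsilon + \xi_i \\ \omega\cdot\phi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{8}$$

Function (8) is a convex quadratic optimization problem. With the introduction of the Lagrange function, the following is obtained:

$$L = \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) - \sum_{i=1}^{n}\alpha_i\,(\varepsilon + \xi_i - y_i + \omega\cdot\phi(x_i) + b) - \sum_{i=1}^{n}\alpha_i^*\,(\varepsilon + \xi_i^* + y_i - \omega\cdot\phi(x_i) - b) - \sum_{i=1}^{n}(\eta_i\xi_i + \eta_i^*\xi_i^*) \tag{9}$$

where $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$ are the Lagrange multipliers. Setting the partial derivatives of $L$ with respect to $\omega$, $b$, $\xi_i$ and $\xi_i^*$ to zero gives:

$$\omega = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,\phi(x_i) \tag{10}$$

$$\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0 \tag{11}$$

$$C - \alpha_i - \eta_i = 0, \qquad C - \alpha_i^* - \eta_i^* = 0 \tag{12}$$

Therefore,

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,\phi(x_i)\cdot\phi(x) + b \tag{13}$$

With the introduction of the kernel function $K(x_i, x) = \phi(x_i)\cdot\phi(x)$, formula (13) can be given in the following form:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,K(x_i, x) + b \tag{14}$$

Using the above-mentioned kernel function, all necessary computations can be performed directly in the input space, without having to compute the map $\phi(x)$. There are some popular kernel functions, such as the linear kernel $K(x, y) = x \cdot y$ and the radial basis function (RBF) kernel $K(x, y) = \exp(-\gamma\|x - y\|^2)$, where $\gamma$ is the kernel parameter.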
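To make the ε-insensitive loss and the kernel form of the regression function concrete, the following sketch evaluates both for a toy model. It assumes an RBF kernel; the function names, the two support vectors and their coefficients are made up for illustration and do not come from the fitted model in this paper.

```python
import math


def epsilon_insensitive_loss(y, f_x, eps):
    """Loss that ignores errors inside the eps-tube around the target y."""
    return max(abs(f_x - y) - eps, 0.0)


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """f(x) = sum_i coef_i * K(x_i, x) + b, with coef_i = alpha_i - alpha_i*."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Toy model: two support vectors with hypothetical dual coefficients.
support_vectors = [[0.0], [1.0]]
dual_coefs = [0.5, -0.5]
prediction = svr_predict([0.0], support_vectors, dual_coefs, b=2.0, gamma=1.0)

print(prediction)
print(epsilon_insensitive_loss(3.0, 3.05, eps=0.1))  # inside the tube
print(epsilon_insensitive_loss(3.0, 3.5, eps=0.1))   # outside the tube
```

Only the kernel values between the query point and the support vectors are needed, which is why the mapping into the feature space never has to be computed explicitly.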

Adaptive algorithm
With the SVM model, the underlying pattern of bus travel is learned from historical data. Although the outliers are eliminated by Grubbs' test before the SVM model is applied, the real-time capability of the model is weak because it is trained offline. To solve this problem, an adaptive algorithm is proposed to adjust the prediction results of bus travel time dynamically.
The adaptive algorithm introduces an adaptive factor, denoted $g_h(m)$, which is used to adjust the predicted bus running time by minimizing the covariance of the prediction error. After real-time estimation and iterative updating of $g_h(m)$ and the covariance of the prediction error, the optimal value of $g_h(m)$ is obtained. Let $t_h(m)$ indicate the real running time of bus $m$ at route link $h$, and let $t_h^*(m)$ indicate the predicted running time of bus $m$ at route link $h$ as adjusted by the adaptive algorithm [34]. In the equations of the adaptive algorithm, $g_h(m)$ represents the adaptive factor; the errors of the travel-time prediction of bus $m-1$ at route link $h$ are taken for the SVM model and for the adaptive algorithm, respectively; and $\xi_h(m-1)$ and $\xi_h^*(m-1)$ indicate the covariance of the prediction error of the running time of bus $m-1$ at route link $h$ with the SVM model and the adaptive algorithm, respectively.
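The idea of a gain that is updated after every observed bus and that shrinks the error variance can be illustrated with a scalar Kalman-style correction. This is a swapped-in, generic technique used here as an assumption; it is not the set of update equations of the adaptive algorithm in this paper, and the class name, variance parameters and numbers are all hypothetical.

```python
class LinkErrorFilter:
    """Scalar Kalman-style tracker of the SVM prediction error on one link.

    Assumption: a generic stand-in for the adaptive algorithm, not the
    paper's own update equations.
    """

    def __init__(self, error_var=1.0, process_var=0.1, measurement_var=1.0):
        self.bias = 0.0                   # current estimate of the SVM error (s)
        self.error_var = error_var        # variance of that estimate
        self.process_var = process_var    # how fast the true error may drift
        self.measurement_var = measurement_var

    def adjust(self, svm_prediction):
        """Adjusted running time for the next bus on the link."""
        return svm_prediction + self.bias

    def update(self, svm_prediction, observed_time):
        """Fold in the error observed for the bus that just left the link."""
        gain = self.error_var / (self.error_var + self.measurement_var)
        observed_error = observed_time - svm_prediction
        self.bias += gain * (observed_error - self.bias)
        self.error_var = (1.0 - gain) * self.error_var + self.process_var
        return gain


# Made-up scenario: the SVM keeps predicting 120 s while buses need 130 s.
link_filter = LinkErrorFilter()
for _ in range(20):
    last_gain = link_filter.update(svm_prediction=120.0, observed_time=130.0)
print(link_filter.adjust(120.0))
```

The gain plays a role similar to the adaptive factor: a large gain trusts the latest observed error, a small gain trusts the accumulated estimate.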

The hybrid model of travel-time prediction
As the predictions of the SVM model may include some unexpected results, this paper combines the SVM model and the adaptive algorithm to predict bus travel time. The combination is called the hybrid model of time prediction, denoted as SVMAA. In this model, the traditional SVM model is used to predict the bus deadheading time at certain stops. The arrival time of the current bus at a given stop is calculated from the deadheading time and the arrival time of the preceding bus. Then, the arrival-time outcomes are screened with the adaptive algorithm. Lastly, the bus running time is worked out after the screening.
Among the variables that may contribute to the variation of bus travel time, five input variables and one output variable are used in the SVM model. In this model, $d$ denotes the input variables, which consist of five variables: the bus stop $k$ ($d_1$), the second stop back, $k-2$ ($d_2$), the departure time at stop $k-2$ for bus $m-1$ ($d_3$), the departure time at stop $k-2$ for the target bus $m$ ($d_4$), and the travel times on the current segment $k-2 \to k$ ($d_5$). Let $O$ denote the output vector, the bus travel times on the route segment between two adjacent points. Variables $d_1$ and $d_2$ are self-explanatory. The route segment variable identifies the section between the current stop and the next stop at which the arrival times are to be predicted. Variable $d_5$ refers to the travel times on segment $k-2 \to k$ of each preceding bus (bus $m-1$, bus $m-2$, ..., bus $m-r$) and is expected to reflect the traffic conditions of the current segment. The latest travel times on the predicted segment are updated after a bus finishes its travel on that segment. When vehicle $m$ reaches stop $k-2$, the input values of $d_5$ for segment $k-2 \to k$ are the travel times of the preceding $r$ buses on the segment, denoted by $V_{m-1}, \dots, V_{m-r}$, as shown in Figure 4.
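As a rough illustration of how the five input variables could be assembled into one vector, consider the sketch below. The function name, the choice `r=3`, the time unit (seconds since midnight) and the sample values are assumptions made for the example only.

```python
def build_input_vector(stop_k, dep_prev_bus, dep_target_bus,
                       segment_travel_times, r=3):
    """Assemble d = (d1, d2, d3, d4, d5...) for the target bus m.

    segment_travel_times holds the travel times on segment k-2 -> k of
    earlier buses, oldest first, so its last r entries are V_{m-r}..V_{m-1}.
    """
    if len(segment_travel_times) < r:
        raise ValueError("need travel times of at least r preceding buses")
    recent = list(reversed(segment_travel_times[-r:]))  # V_{m-1}, ..., V_{m-r}
    return [stop_k, stop_k - 2, dep_prev_bus, dep_target_bus] + recent


# Hypothetical values: stop 5, departures at 06:30:00 and 06:38:00,
# and four earlier travel times (in seconds) on segment 3 -> 5.
d = build_input_vector(5, 23400, 23880, [310, 295, 330, 342], r=3)
print(d)
```

Keeping the most recent bus first mirrors the ordering $V_{m-1}, \dots, V_{m-r}$ used in the text; any fixed ordering works as long as training and prediction use the same one.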
Using different kernel functions, one can construct different learning machines with arbitrary types of decision surfaces. After the outcome of $\hat{t}_h(m)$ is obtained, let $T_k^m$ denote the arrival time of bus $m$ at stop $k$. Then, $T_k^m$ can be calculated as follows:

$$T_k^m = T_k^{m-1} + \hat{t}_h(m)$$

Using the adaptive algorithm, the prediction of the arrival time of the bus at stop $k$ is dynamically regulated. The progress of the adaptive algorithm is depicted in

NUMERICAL TEST
The presented model for bus travel-time prediction has been tested with the data of transit route No. 232 in Shenyang City, China. The transit route runs from Ling North Street to the city center, with 19 stops in total and 10.8 km per direction. In the numerical test, part of the eastbound direction of the route is studied. Representative stops and segments of route No. 232, shown in Figure 7, are selected to predict bus running time. The stops from Shenyang Station to Three Taizi are denoted as stop 1, stop 2, ..., stop 19, respectively. The length of each segment of route No. 232 is given in Table 1. First, the data used in the models are described, and then the results are presented.

Data collection and processing
In this research, the bus speed is used to reflect the traffic conditions of links at the current time. However, it is difficult to measure speeds directly. Hence, approximate speeds are calculated from the link lengths and travel times, as a substitute for the real speeds on the corresponding links.
To obtain the link travel times, we conducted an on-board survey of all the No. 232 bus trips from October to November 2012. The collected data consist of the arrival times at the 19 stops covered by the test bed for eastbound trips in the peak period (6:30 A.M. - 7:30 A.M., PP) and the off-peak period (10:00 A.M. - 11:00 A.M., OP) on weekdays. There are 994 valid trips within this period of approximately one month. After the approximate speeds on links are calculated, the data sets are scaled. In the modelling process, the data sets are linearly scaled to the range from 0 to 1 to ensure the validity of all data and avoid numerical difficulties during the calculation. Before model validation, the speed data are grouped chronologically, using the first five weeks as sample data, the 6th and 7th weeks as the test set and the 8th week as the forecasting set.
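The two preprocessing steps above, approximating link speed from length and travel time and then scaling linearly to the range from 0 to 1, can be sketched as follows. The function names, the units (km, seconds, km/h) and the sample values are assumptions for illustration.

```python
def approximate_speed(link_length_km, travel_time_s):
    """Approximate link speed in km/h from link length and travel time."""
    return link_length_km * 3600.0 / travel_time_s


def min_max_scale(values):
    """Linearly scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Made-up link of 0.5 km traversed in 90 s, 60 s and 45 s.
speeds = [approximate_speed(0.5, t) for t in (90, 60, 45)]
scaled = min_max_scale(speeds)
print(speeds, scaled)
```

When new data are scaled at prediction time, the minimum and maximum from the training data would normally be reused so that both sets share one scale.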

Model identification and results
The data can be mapped implicitly into a feature space, and this mapping is handled efficiently by using a kernel function. The radial basis function (RBF) is selected as the kernel function in this study. Three parameters are needed with the RBF kernel: $C$, $\varepsilon$ and $\gamma$, which are calibrated by grid search. In grid search, all combinations of $C$, $\varepsilon$ and $\gamma$ are tried and the one with the best performance is picked. For the bus arrival time prediction problem, the three parameters are selected as ($2^{-2}$, $2^{-5}$ and 1.47). The process of the proposed prediction model is shown in Figure 7. This paper uses the mean absolute percentage error (MAPE) to test the accuracy of the prediction results. MAPE can be described as follows:

$$\mathrm{MAPE} = \frac{1}{J}\sum_{j=1}^{J} \frac{|t_d(j) - \hat{t}_d(j)|}{t_d(j)} \times 100\%$$

where $J$ is the number of test samples, and $t_d(j)$ and $\hat{t}_d(j)$ stand for the actual travel time and the predicted travel time of sample $j$, respectively.
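A compact sketch of the MAPE measure and of an exhaustive grid search over the three parameters is given below. The `evaluate` callback is a hypothetical stand-in for "train an SVM with these parameters and return its validation MAPE"; the toy scoring function and the candidate grids are made up so that the example runs without a trained model.

```python
from itertools import product


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def grid_search(evaluate, c_values, eps_values, gamma_values):
    """Try every (C, epsilon, gamma) combination and keep the lowest score."""
    best_params, best_score = None, float("inf")
    for params in product(c_values, eps_values, gamma_values):
        score = evaluate(*params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scoring function standing in for a validation run of the model.
def toy_score(c, eps, gamma):
    return (c - 1) ** 2 + (eps - 0.1) ** 2 + (gamma - 2) ** 2


best_params, best_score = grid_search(toy_score, [0.5, 1, 2], [0.01, 0.1], [1, 2, 4])
print(best_params, best_score)
print(mape([100, 200], [110, 180]))
```

The search cost grows with the product of the grid sizes, so coarse grids (for example powers of two) are usually tried first and refined around the best combination.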
The predicted arrival times at the six bus stops of bus route No. 232 on a weekday are shown in Figure 8, which also compares the prediction performance in the off-peak and peak periods. Figure 8 shows that the MAPE at most stops is below 10%, which means the proposed approach has good prediction accuracy. Interestingly, the MAPE decreases from the first studied stop (the North Shenyang Station) to the last one (Songling Cultural Palace). The stop with the largest prediction error is the North Shenyang Station. The main reason is that the North Shenyang Station is the second stop of the eastbound bus route, so the prediction there lacks input data on the bus running time on the preceding route link, which results in lower prediction accuracy.
However, Figure 8 shows a larger MAPE for the peak-time prediction. The average MAPE in the peak period is about 10.7%, while the MAPE in the off-peak period is 9.5%. Some bus stops with large passenger interchanges, such as the Second Province Hospital and the Experimental Middle School, have a large MAPE.

[Figure: MAPE (%) by stop - The North Shenyang Station, The Second Province Hospital, The Fourth Hospital, Experimental Middle School, Xin Le Dormitory, Songling Cultural Palace]
This is because the speed of previous buses on the Second Province Hospital segment and the Fourth Hospital segment cannot effectively describe the current traffic condition when there is traffic congestion during the peak time. However, the average MAPE is still around 10%, which means that the proposed model can be used for bus travel-time prediction during peak time.
To demonstrate the advantage of the proposed approach (SVMAA), we compared its predicted bus travel times (in the off-peak period) with the predictions of three other models: a time series model, a neural network model and an SVM model. The methods and parameter settings of the three models are adopted from the literature. After the bus running time at each target stop is computed with the four models, the average prediction errors (MAPE) of each model are shown in Figure 9. Figure 9 shows that the MAPE of the SVMAA model is lower than that of the other three models at five stops; the exception is the North Shenyang Station, owing to the lack of information. Because they rely on historical data without real-time information on a running bus, the time series model and the neural network model have the largest MAPE. Both the SVM model and the SVMAA model have better prediction accuracy, as they are based on structural risk minimization and VC dimension theory, where the VC dimension is a measure of the capacity of a statistical classification algorithm [35]. SVM can decrease the prediction errors because of its good learning capacity on the samples. Compared with SVM, SVMAA performs better thanks to the introduction of Grubbs' test method and an adaptive algorithm. Grubbs' test method eliminates the outliers so as to obtain better input. Finally, the adaptive algorithm adjusts the predicted result with real-time information on the corresponding bus travel time.

CONCLUSIONS
It is hard to predict bus running time accurately because bus operation is full of stochastic events. SVM does not need a specific functional form, since it can capture the non-linear, real-time relationship between the inputs and the output. With its good learning capacity, SVM is suitable for predicting bus travel time. For real-time bus travel-time prediction, Grubbs' test method is used to screen out the outliers. In addition, an adaptive algorithm is applied to adjust the prediction results in real time. Lastly, this paper uses the data of bus route No. 232 in Shenyang to test the proposed approach. The test results show that SVMAA can predict bus travel time efficiently. However, because the bus speeds on the test route are used to represent road traffic conditions, the prediction error will be large for routes with low bus frequency. In future research, the speeds of buses on multiple routes will be adopted to represent road traffic conditions.

[Figure: MAPE (%) by stop - The North Shenyang Station, The Second Province Hospital, The Fourth Hospital, Experimental Middle School, Xin Le Dormitory, Songling Cultural Palace]

Figure 1 - Structure of the hybrid model

$d_3$ and $d_4$ are denoted as $T_{k-2}^{m-1}$ and $T_{k-2}^{m}$, respectively. The deadheading time, which is the departure time interval between bus $m$ and bus $m-1$ from stop $k-2$, varies with different bus stops. Assume that the stops are numbered from the origin terminal through the destination terminal.

Figure 3 describes variables $d_3$ and $d_4$ in more detail.

Let $\hat{t}_h(m)$ express the predicted deadheading time of bus $m$ and bus $m-1$ at stop $k$.

Figure 5 shows the progress of $\hat{t}_h(m)$ prediction with the SVM model.

Figure 7 - Sketch scheme of test route No. 232

Figure 8 - Prediction results of bus arrival time at main stops

Figure 9 - Comparison of forecasting the bus travel time for four models

Table 1 - Configuration of transit route No. 232