FORECASTING THE ALL-WEATHER SHORT-TERM METRO PASSENGER FLOW BASED ON SEASONAL AND NONLINEAR LSSVM

Accurate metro ridership prediction can guide passengers in efficiently selecting their departure time and simultaneously help traffic operators develop a passenger organization strategy. However, short-term passenger flow prediction needs to consider many factors, and the results of the existing models for short-term subway passenger flow forecasting are often unsatisfactory. Along this line, we propose a parallel architecture, called the seasonal and nonlinear least squares support vector machine (SN-LSSVM), to extract the periodicity and nonlinearity characteristics of passenger flow. Various forecasting models, including the auto-regressive integrated moving average, the long short-term memory network, and the support vector machine, are employed to evaluate the performance of the proposed architecture. Moreover, we first applied the method to the Tiyu Xilu station, which is the most crowded station in the Guangzhou metro. The results indicate that the proposed model can effectively make all-weather and year-round passenger flow predictions, thus contributing to the management of the station.


INTRODUCTION
In recent decades, metros, with their large volume, high speed, low pollution, low resource and energy consumption, and convenient and comfortable ride, and being in line with the principles of sustainable development, have become the first choice for major cities to relieve traffic congestion and develop public transportation [1]. Passenger demand pressure is increasing, especially in developing cities. For instance, the Guangzhou Metro carries over 50% of the public transportation passenger flow of Guangzhou, and its passenger flow intensity ranks first in the country [2]. In metro-related research, short-term ridership forecasting plays a crucial role in improving the efficiency of metro systems and has become an important part of intelligent transportation systems; its benefits include alleviating station congestion, informing travelers about traffic conditions, and providing real-time traffic monitoring and management. Thus, an increasing number of studies have addressed the metro ridership pressure using short-term predictions, thereby improving the metro service quality.
With new developments in technology, there has been a resurgence of interest in nonlinear and combined model prediction methods. Table 1 summarizes the previous studies on short-term passenger flow prediction in recent years. The findings can be briefly summarized as follows:
1) LPs include historical average prediction methods, time series prediction methods, and Kalman filter model prediction methods. LPs share a general problem: when the traffic flow is nonlinear and uncertain, the performance of the model deteriorates, so events with strong randomness cannot be predicted [10,12].
2) NLPs include artificial neural networks and the support vector machine (SVM). Artificial neural network prediction methods, such as the back propagation neural network (BPNN) [7], long short-term memory (LSTM) [11], gated recurrent units (GRU) [9], and the convolutional neural network (CNN) [18], are widely used in the field of intelligent transportation. Although the artificial neural network methods have a good predictive effect on temporal data, they have some insurmountable shortcomings.
3) CMs mainly combine two or more prediction methods. If the individual prediction methods are combined in an effective form, the sample information can be utilized more fully, achieving a higher prediction accuracy than a single prediction method, according to the review by Bates et al. [20]. The CMs can be roughly classified into three categories: multi-features [18,19], consider periodicity [14,16,17], and real-time prediction [10].
However, it is unfortunate that there is only one small study which specifically deals with short-term subway passenger flow prediction. Moreover, these models did not fully consider the key factors (Table 1) of short-term metro passenger flow forecasting.
Thus, this paper provides a way for metro companies to better manage the subway station. The remaining parts of this paper are organized as follows. Section 2 introduces the prediction methods proposed in this paper in detail. Section 3 analyses the law of passenger flow at the Guangzhou Metro. Evaluation methods and extensive experiments are presented in Section 4, and the conclusions and future work are discussed in Section 5.

Data description
The experimental dataset was extracted from the automatic fare collection (AFC) system of the Tiyu Xilu Station in Guangzhou, China, for 2016 and 2017. It can be seen from Figure 1 that the subway passenger flow during the Spring Festival decreases significantly. The reason for this phenomenon is that more than half of the permanent residents in Guangzhou are migrants. Since the Spring Festival is the Chinese New Year, 61.36% of people, according to statistics, returned to their hometowns for the festival in 2017 [21]. To eliminate the influence of the Spring Festival, passenger flow data during the Spring Festival are excluded, and 15 weeks of passenger flow data are selected for the experiment, as shown in Table 2. The type "working days" indicates that the target date is a working day, and the types "holidays 1" and "holidays 2" are defined likewise. Moreover, the passenger flow data are recorded at 60-minute intervals.
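To make the 60-minute recording interval concrete, the following minimal sketch counts transactions per hourly slot. The record layout (plain timestamp strings) and the function name `hourly_counts` are assumptions made for illustration only; the paper does not describe the raw AFC format, and the sample records are synthetic.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count transactions per (date, hour) slot from ISO-format timestamps."""
    slots = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        slots[(t.date().isoformat(), t.hour)] += 1
    return dict(slots)


# Hypothetical tap-in records (synthetic, not real AFC data).
records = [
    "2017-02-13 07:05:10",
    "2017-02-13 07:48:02",
    "2017-02-13 08:01:30",
]
counts = hourly_counts(records)
print(counts)
```

Each value in `counts` plays the role of one recording point of the passenger flow time series.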
Here, we utilize a wavelet transform method together with the SVM to strike a balance between solving the problem quickly and solving it precisely.
This article mainly aims to improve the prediction accuracy of metro passenger flow time series prediction models, whether the forecasting target is a working day or a holiday. Along this line, we propose a parallel architecture, called the seasonal and nonlinear least squares support vector machine (SN-LSSVM), which comprises a cycle-based least squares support vector machine (W-LSSVM) and a day-based least squares support vector machine (D-LSSVM), to extract the periodicity and nonlinearity characteristics of passenger flow, respectively. To summarize, the primary contributions of this paper are as follows:
Firstly, a new nonlinear hybrid prediction model called the SN-LSSVM is proposed for short-term passenger flow forecasting. Moreover, the model is not limited to metro passenger flow but can be extended to other forecasting applications.
Secondly, we evaluate the performance of the proposed prediction model through extensive comparative experiments against: (a) the linear prediction model auto-regressive integrated moving average (ARIMA); (b) the supervised learning model SVM; (c) the hybrid model Wavelet-SVM; and (d) the neural network model LSTM.
Thirdly, with an empirical demonstration of the proposed model using the Guangzhou Metro AFC data, we conclude that the proposed method has higher accuracy in predicting the all-weather and year-round passenger flow.
Given a t-size sequence of the observed passenger volume, define the traffic data profile of one day as $d_j = \{x_j^1, x_j^2, \ldots, x_j^t\}^T$, where t is the recording point (if the record interval is 60 minutes, t=18), and $x_j^t$ is the sum of the transaction count during the t-th time slot.

Notation and problem statement
For the convenience of the readers, the critical notations used throughout the paper are summarized in Table 3.
The use of the passenger flow time series collected from only one metro station has two advantages: a) satisfactory prediction results can be obtained even without road network data, weather data, and other factors; b) by extending the proposed method to consider other factors (weather and the passenger flow of nearby stations), a more stable prediction method may be obtained.
The prediction method can be described as follows (as shown in Figure 2):
Stage 1: Traffic data profiling. Extract two sets of passenger flow data: the set W and the set D.
Stage 2: Time series decomposition. W is regarded as a signal sequence, which is decomposed into a group of high-frequency signals (large fluctuations in the time series curve) and a low-frequency signal (small fluctuations in the time series curve) via the wavelet transform (WT).
Stage 3: Predictive modelling. The LS-SVM is embedded in the framework to train on the decomposed W and on D, and then the predicted high- and low-frequency signals and $d_{j+1}$ are obtained.
For a weekly profile, each entry is likewise the sum of the transaction count during the t-th time slot: define the traffic data profile of one week as $w_i = \{x_i^1, x_i^2, \ldots, x_i^t\}^T$ (if the record interval is 60 minutes, t=126). Two sets are extracted from the original time series as $W = \{w_1, w_2, \ldots, w_i\}^T$ (i=1,2,…,m) and $D = \{d_1, d_2, \ldots, d_j\}^T$ (j=1,2,…,n). W is a set of passenger flow profiles that contains the periodic characteristics of the time series. D is a set of passenger flow profiles that contains the nonlinear characteristics of the time series.
In the flow chart of Figure 2, the prediction type is judged first: 0 for a holiday and 1 for a normal working day.
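Stage 1 (traffic data profiling) can be sketched as follows. This is only an illustration under simplifying assumptions: the series is assumed to start at the beginning of a week and to contain whole days, the function name `build_profiles` is hypothetical, and the input values are synthetic. The slot counts 18 and 126 are taken from the text.

```python
SLOTS_PER_DAY = 18                      # recording points per day at 60-minute intervals
SLOTS_PER_WEEK = SLOTS_PER_DAY * 7      # 126 recording points per week


def build_profiles(series):
    """Split a flat hourly series into daily profiles D and weekly profiles W."""
    if len(series) % SLOTS_PER_DAY != 0:
        raise ValueError("series must contain whole days")
    # Each daily profile d_j holds the 18 slot counts of one day.
    D = [series[k:k + SLOTS_PER_DAY]
         for k in range(0, len(series), SLOTS_PER_DAY)]
    # Each weekly profile w_i holds the 126 slot counts of one full week.
    n_weeks = len(series) // SLOTS_PER_WEEK
    W = [series[k * SLOTS_PER_WEEK:(k + 1) * SLOTS_PER_WEEK]
         for k in range(n_weeks)]
    return D, W


# Two weeks of synthetic counts (not real passenger data).
series = [float(v % 50) for v in range(2 * SLOTS_PER_WEEK)]
D, W = build_profiles(series)
print(len(D), len(W), len(W[0]))
```

In this layout the first 18 entries of a weekly profile coincide with the corresponding daily profile, which is why W and D can be extracted from the same original series.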

Figure 2 - SN-LSSVM model flow chart
The Db3 wavelet is one of the WTs and is used to decompose $w_i$, so that a high-frequency signal $wH_i^1$ and a low-frequency signal $wL_i^1$ are obtained (Figure 3a). Considering $wL_i^1$ as a new passenger flow time series to be decomposed, the Db3 wavelet is applied again to obtain a high-frequency signal $wH_i^2$ and a low-frequency signal $wL_i^2$. By analogy, a low-frequency signal $wL_i^q$ and q groups of high-frequency signals ($wH_i^1, \ldots, wH_i^q$) whose frequency is essentially stable are finally obtained.
In this paper, q=7 is used; that is, after 7 steps of decomposition, 8 groups of decomposed sequences ($wH_i^1, \ldots, wH_i^7, wL_i^7$) are obtained. Among these sequences, $wL_i^7$ characterizes the main feature of $w_i$, and $wH_i^1, \ldots, wH_i^7$ show the subtle fluctuations.
The number of samples of $w_i$ is $l_0$, and the number of samples after the j-th decomposition is $l_j$. After each decomposition, the length of the new signal is shortened by half, that is,
$$l_j = \frac{l_{j-1}}{2}.$$
The process of the WT can be expressed by the following formula:
$$wL_i^{j}(n) = \sum_{k} h_0(k-2n)\, wL_i^{j-1}(k), \qquad wH_i^{j}(n) = \sum_{k} h_1(k-2n)\, wL_i^{j-1}(k),$$
with $wL_i^{0} = w_i$, where the coefficients $h_0(k-2n)$ and $h_1(k-2n)$ are two sets of conjugate filter coefficients determined by the wavelet function, called the low-pass filter and the high-pass filter, respectively. Their values are determined by the db3 wavelet and the number of coefficients.
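The halving of the signal length at each step can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the authors' implementation: the filter values are the commonly tabulated db3 low-pass coefficients, the high-pass filter is derived from them by the quadrature-mirror relation, the boundary is handled by periodic extension, and the names `dwt_step` and `decompose` are hypothetical. The input signal is synthetic.

```python
import math

# Commonly tabulated db3 low-pass analysis coefficients (6 taps). Assumed values:
# the paper does not list the coefficients it used.
H0 = [0.035226291882100656, -0.08544127388224149, -0.13501102001039084,
      0.4598775021193313, 0.8068915093133388, 0.3326705529509569]
# High-pass coefficients derived from H0 by the quadrature-mirror relation.
H1 = [(-1) ** k * H0[len(H0) - 1 - k] for k in range(len(H0))]


def dwt_step(signal):
    """One decomposition step with periodic extension; output length is halved."""
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    low = [sum(h * signal[(2 * i + k) % n] for k, h in enumerate(H0))
           for i in range(n // 2)]
    high = [sum(g * signal[(2 * i + k) % n] for k, g in enumerate(H1))
            for i in range(n // 2)]
    return low, high


def decompose(signal, levels):
    """Repeatedly split the low-frequency part, keeping every high-frequency part."""
    low, highs = list(signal), []
    for _ in range(levels):
        low, high = dwt_step(low)
        highs.append(high)
    return low, highs


# Synthetic 16-point signal (not real passenger data), decomposed over 2 levels.
signal = [10.0 + 5.0 * math.sin(0.4 * t) for t in range(16)]
low, highs = decompose(signal, 2)
print(len(low), [len(h) for h in highs])
```

With periodic extension the transform is orthogonal, so the energy of the input equals the total energy of the low- and high-frequency parts; other boundary treatments would give slightly longer outputs.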

Decomposing the original passenger flow
Passenger flow data can be considered a signal sequence, and the WT can handle nonlinear and non-stationary data. In this paper, the WT is used to analyze and extract the characteristics of W, because W is much more complicated than D. This subsection briefly reviews the WT algorithm from Yang [3] and Sun [17], and interested readers can refer to those works for detailed algorithmic descriptions and theoretical property analysis.
The WT can refine the passenger flow (signal) function through scaling and translation by using multiple scales, and finally achieve high- and low-frequency subdivisions. The LS-SVM is used to predict the high- and low-frequency information of passenger flow respectively, which greatly reduces its training and learning scale compared with direct prediction, thus avoiding problems such as slow convergence speed.
3) Repeat step 1 and step 2 to predict $wH_R^1, \ldots, wH_R^7$ and D.
At this stage, the high-frequency signals $wH_R^1, \ldots, wH_R^7$ and the low-frequency signal $wL_Q^7$ are subjected to wavelet reconstruction to obtain a predicted passenger flow time series $w_{i+1}$, as shown in Figure 3b. The reconstruction formula of the passenger flow data is as follows:
$$wL^{j-1} = H^{*}\, wL^{j} + G^{*}\, wH^{j}, \qquad j = 7, 6, \ldots, 1,$$
where $H^{*}$ and $G^{*}$ are the dual operators of $h_0(k-2n)$ and $h_1(k-2n)$, respectively.
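The reconstruction step can be illustrated by checking that one synthesis step inverts one analysis step. As before, this is a sketch under assumptions: the db3 coefficients are the commonly tabulated values, the boundary is periodic, and `analyze` and `synthesize` are hypothetical names. In the paper the inputs to reconstruction are the predicted components; here the exact components of a synthetic series are used, only to show that the synthesis operator recovers the original signal.

```python
# Commonly tabulated db3 low-pass coefficients (assumed values, 6 taps).
H0 = [0.035226291882100656, -0.08544127388224149, -0.13501102001039084,
      0.4598775021193313, 0.8068915093133388, 0.3326705529509569]
# High-pass coefficients from the quadrature-mirror relation.
H1 = [(-1) ** k * H0[len(H0) - 1 - k] for k in range(len(H0))]


def analyze(signal):
    """Split an even-length signal into low- and high-frequency halves."""
    n = len(signal)
    low = [sum(h * signal[(2 * i + k) % n] for k, h in enumerate(H0))
           for i in range(n // 2)]
    high = [sum(g * signal[(2 * i + k) % n] for k, g in enumerate(H1))
            for i in range(n // 2)]
    return low, high


def synthesize(low, high):
    """Apply the transposed (dual) filters to rebuild the longer signal."""
    n = 2 * len(low)
    out = [0.0] * n
    for i in range(len(low)):
        for k in range(len(H0)):
            out[(2 * i + k) % n] += H0[k] * low[i] + H1[k] * high[i]
    return out


# Synthetic hourly counts (not real passenger data).
series = [120.0, 340.0, 560.0, 410.0, 380.0, 450.0, 690.0, 300.0]
low, high = analyze(series)
rebuilt = synthesize(low, high)
print(max(abs(a - b) for a, b in zip(series, rebuilt)))
```

Applying `synthesize` level by level, from the coarsest level upward, mirrors the repeated decomposition and yields a series of the original length.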

The hybrid SN-LSSVM Model
As previously described, the SN-LSSVM model is constructed using a D-LSSVM model and a W-LSSVM model. The passenger flow time series $d^{*}_{j+1}$ of a certain day is obtained from the reconstructed passenger flow time series $w_{i+1}$, and the final predicted passenger flow is calculated as the weighted sum $\alpha_1 d^{*}_{j+1} + \alpha_2 d_{j+1}$, where $\alpha_1$ and $\alpha_2$ are the weights of $d^{*}_{j+1}$ and $d_{j+1}$. To obtain the values of $\alpha_1$ and $\alpha_2$, a function is defined to solve this optimization problem, in which P is the real value, $P_L$ is the predicted value, $\varepsilon_i^2$ is the error variable, N is the mean absolute percentage error (MAPE), m is the number of experiments, and n denotes the number of data points each day.
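One simple way to picture the weight selection is a grid search that minimizes the MAPE of the combined forecast. This is an illustrative stand-in, not the optimization used in the paper: the constraint that the two weights sum to 1, the grid resolution, and the names `mape` and `best_weights` are all assumptions, and the numbers are synthetic.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (as a fraction, actual values nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def best_weights(actual, weekly_pred, daily_pred, steps=100):
    """Grid-search alpha_1 in [0, 1] with alpha_2 = 1 - alpha_1 (assumed constraint)."""
    best = None
    for s in range(steps + 1):
        a1 = s / steps
        a2 = 1.0 - a1
        combined = [a1 * w + a2 * d for w, d in zip(weekly_pred, daily_pred)]
        err = mape(actual, combined)
        if best is None or err < best[2]:
            best = (a1, a2, err)
    return best


# Synthetic example: the weekly model overshoots by 10%, the daily model
# undershoots by 10%, so equal weights cancel the two errors.
actual = [100.0, 200.0, 300.0]
weekly = [110.0, 220.0, 330.0]
daily = [90.0, 180.0, 270.0]
a1, a2, err = best_weights(actual, weekly, daily)
print(a1, a2, err)
```

Running the search separately on working-day and holiday samples would give two sets of weights, one per ridership pattern.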
Furthermore, the filter coefficients are defined such that $(a_0, a_1, \ldots, a_k)$ are the coefficients of the low-pass filter and $(b_0, b_1, \ldots, b_k)$ are the coefficients of the high-pass filter. For a db3 wavelet, the number of coefficients n is 6.

Predicting and reconstructing
The SVM is a technology proposed by Sain and Vapnik [22], whose basic idea is as follows: if the sample data X is nonlinearly separable, the input vector X is mapped to the high-dimensional feature space H by the nonlinear transformation φ(x), and the optimal regression function is obtained in H, so a linear regression in a high dimensional space corresponds to a nonlinear regression in a low dimensional space.
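For the least squares variant used in this paper, kernel regression reduces to solving one linear system. The sketch below follows the standard LS-SVM dual formulation with an RBF kernel; it is not the authors' code. The parameter values `gamma` and `sigma` are fixed by hand here (the paper tunes them by PSO), the helper names are hypothetical, and the training data are synthetic.

```python
import math


def rbf(u, v, sigma):
    """Radial basis function kernel between two input vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]          # offset b, multipliers alpha


def lssvm_predict(x, X, b, alphas, sigma):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, X))


# Synthetic one-dimensional training set (not passenger data).
X = [[float(t)] for t in range(7)]
y = [math.sin(t) for t in range(7)]
gamma, sigma = 10.0, 1.0
b, alphas = lssvm_fit(X, y, gamma, sigma)
fitted = [lssvm_predict(x, X, b, alphas, sigma) for x in X]
print([round(v, 3) for v in fitted])
```

Because all constraints are equalities, training needs no quadratic programming solver; the regularization parameter `gamma` controls how closely the fitted values follow the targets.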
Basically, the LS-SVM uses the same principles as the SVM, but it simplifies the operation by an appropriate transformation, and improves the accuracy and convergence speed of the solution [22,23]. The LS-SVM can be formulated as a minimization problem as follows:
$$\min_{\omega, b, \varepsilon} \; \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{i} \varepsilon_i^{2} \quad \text{s.t.} \quad y_i = \omega^{T}\varphi(x_i) + b + \varepsilon_i,$$
where $y_i$ is the target value and $x_i$ is the input vector, $\omega^{T}\varphi(x_i)$ represents the inner product in the high-dimensional feature space H, $\omega$ and b are the normal vector and the offset, respectively, $\varepsilon_i$ is the slack variable, and $\gamma$ is the adjustable parameter that controls the regression error. The regression of the LS-SVM is:
$$y(x) = \sum_{i} \alpha_i K(x, x_i) + b,$$
where K is the kernel function and $\alpha_i$ are the Lagrange multipliers. In this work, the radial basis function (RBF) kernel is used [24], and the parameters are learned by the particle swarm optimization (PSO) algorithm [25].
The decomposed sequences ($wH_i^1, \ldots, wH_i^7, wL_i^7$) and ($d_1, \ldots, d_j$) are the input of the LS-SVM model. First of all, the low-frequency sequences at the 7th level of m periods, $wL_1^7, wL_2^7, \ldots, wL_m^7$, are used to forecast the 7th level $wL_Q^7$ of the target (m+1) period. The process can be described as follows:
1) Select $wL_1^7, wL_2^7, \ldots, wL_m^7$ as a training data sample to train the LS-SVM model.
The passenger flow of the Tiyu Xilu Station presents three states: 1) the concentrated passenger flow pattern on working days; 2) the loose passenger flow pattern on weekends; 3) an increased occurrence of state 2 during festivals. Figure 5 illustrates two different inbound passenger flow patterns at the Tiyu Xilu Station: weekdays+festivals and weekdays+weekends. Passenger flow at the station changes greatly on the weekends, showing a cycle of one week. That is, from Monday to Friday the traffic is often concentrated, and the passenger flow on the weekends is relatively light. In this case, forecasting the passenger traffic in units of weeks will achieve better prediction results.
However, when there are holidays, large-scale events or sudden emergencies, the passenger flow will no longer show a cycle of one week. At that point, forecasting in units of days can achieve better results. In other words, the former meets the requirement of a sufficient sample size while considering the periodicity of passenger flow, but ignores the recent changes, while the latter takes recent changes into account in the predicted values, but lacks sufficient samples. The SN-LSSVM model combines the two, providing a sufficient sample size, consideration of the periodicity, and recent changes in the predicted value.

Evaluation methods
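In this paper, the metrics for performance evaluation include: (a) the root mean square error (RMSE), for measuring the deviation between the predicted value and the actual value; (b) the correlation coefficient R, for measuring the explanatory power of models by calculating the correlation coefficient between observations and predictions; and (c) the mean arctangent absolute percentage error (MAAPE), for measuring the relative average deviation. They are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}, \qquad R = \frac{\mathrm{Cov}(\hat{y}, y)}{\sqrt{\mathrm{Var}[\hat{y}]\,\mathrm{Var}[y]}}, \qquad \mathrm{MAAPE} = \frac{1}{n}\sum_{i=1}^{n}\arctan\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right).$$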
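The three metrics can be computed with a few lines of standard-library Python. This is a minimal sketch using the usual textbook definitions; the function names and the sample values are illustrative, and the treatment of a zero observation in `maape` (an error of π/2 when the prediction is nonzero, 0 when both are zero) is an assumption of this sketch.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observations and predictions."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(actual, predicted):
    """Correlation coefficient R = Cov(pred, obs) / sqrt(Var[pred] Var[obs])."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted)) / n
    var_a = sum((a - ma) ** 2 for a in actual) / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    return cov / math.sqrt(var_a * var_p)


def maape(actual, predicted):
    """Mean arctangent absolute percentage error, bounded by pi/2."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # Assumed convention for zero observations.
            total += 0.0 if p == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - p) / a))
    return total / len(actual)


# Synthetic hourly counts (not real passenger data).
observed = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]
print(rmse(observed, forecast), pearson_r(observed, forecast), maape(observed, forecast))
```

Because the arctangent is bounded, MAAPE stays finite for very small observed flows, where an ordinary percentage error would blow up.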

ANALYSIS OF THE SUBWAY PASSENGER FLOW
The AFC system achieves real-time data collection of metro passengers entering and exiting the station [26]. By simple statistics, the ridership data can be aggregated for the required time interval, such as 5 min, 15 min, 30 min, and 60 min.
Figure 6 shows the high- and low-frequency signals decomposed from 13 February to 19 February 2017. As seen from the figure, the complex passenger flow time series is decomposed into simple high- and low-frequency curves, which not only reduces the calculation time, but also improves the stability of the prediction results.
In the evaluation metrics, n is the number of samples, $\hat{y}_i$ and $y_i$ are the predicted value and the observed value, respectively, $\mathrm{Cov}(\hat{y}, y)$ is the covariance of $\hat{y}$ and $y$, $\mathrm{Var}[\hat{y}]$ is the variance of $\hat{y}$, and $\mathrm{Var}[y]$ is the variance of $y$.

Forecasting models and weights
In this work, we employ five typical forecasting methods: ARIMA [27], LSTM [11], D-LSSVM, W-LSSVM and SN-LSSVM. The dataset used in the experiments is described in Table 2, and the passenger flows on weekdays and holidays are predicted separately. The prediction result of the SN-LSSVM is obtained by combining the results of the W-LSSVM and D-LSSVM. As discussed in Subsection 2.6, the weights of the W-LSSVM and D-LSSVM are obtained with the original passenger flow data at the Tiyu Xilu Station in 2016.
Table 4 shows the weights ($\alpha_1$, $\alpha_2$) of the two methods, which range from 0 to 1; the larger the value, the higher the accuracy. Since the ridership pattern differs between weekdays and holidays, as previously described, the weights have two patterns.
It can be seen from the figure that the predicted passenger flow time series curve is essentially consistent with the actual passenger flow time series curve. However, as can be seen from Figure 7b and Figure 7c, when there is a holiday (May Day) in the forecast target, the predicted curve is not consistent with the actual curve. The underlying reason could be that the t+7 forecasting is unstable, while a sudden change in the passenger flow occurs when there is a holiday.

Evaluation of forecasting horizon
The forecasting horizon denotes the length of the predicted future. In this section, we evaluate the prediction performance 1 day and 7 days into the future, that is, h=1,7, meaning t+h forecasting. In this paper, the D-LSSVM is a t+1 forecasting model and the W-LSSVM is a t+7 forecasting model.
Generally, we conclude that the nonparametric regression model SVM outperforms some neural networks and the ARIMA, which is consistent with previous studies as well [5,6,26].

Validation results of the novel SN-LSSVM prediction model
In order to facilitate the comparison of the five models, we divide the indicators of the prediction results into hierarchies according to a qualified interval RX. The RXs of the three indicators are then normalized to [0,1]. Specifically, the RX of R is 1-[0,1]. If an indicator exceeds the qualified interval, its value is presented in black (Figure 9). Figure 9 shows the forecasting errors of the different models for working days and holidays, respectively.
Firstly, it can be easily concluded that the predicted result of the ARIMA is unsatisfactory. Most of the indicators of the ARIMA are unqualified, indicating the linear modelling is unsuitable for forecasting nonlinear time series.
As can be seen from Table 4, in the SN-LSSVM model, the average weights of the W-LSSVM model for non-holiday predictions are much larger than those for holidays. This is because of the sudden change in the passenger flow, the nonlinearity of which can be addressed by the D-LSSVM model. In contrast, the working-day ridership is mainly composed of the commuter passenger flow, which is more stable. Therefore, the weights of the W-LSSVM model are similar to those during working days.

Forecasting results
It can be seen from Figure 8b that the prediction trends of the five prediction methods are essentially the same for the prediction of short-term subway passenger flow. However, Figure 8a shows that the prediction trends of the D-LSSVM, W-LSSVM, and SN-LSSVM are essentially the same, while the LSTM and ARIMA fail to fully grasp the real-time variation law of the passenger flow, and their predictions deviate obviously from the actual situation. Similarly, as can be seen from Figure 8c, for the short-term subway passenger flow forecasting during holidays, the prediction trends of the D-LSSVM, SN-LSSVM, and LSTM are essentially the same, while the others fail to fully grasp the real-time variation law of the passenger flow, and their predictions deviate obviously from the actual situation.
Since the D-LSSVM has a smaller forecasting horizon than the W-LSSVM, we can conclude that short-term passenger flow forecasting can provide a fair prediction in the immediate future, which is typically up to one day ahead. Moreover, the W-LSSVM is suitable for prediction during normal working days rather than holidays, due to the instability of passenger flow on holidays. Furthermore, the W-LSSVM provides a satisfactory predicted value for three weeks after May Day, indicating that the model is more robust.
Secondly, RMSE and R of the LSTM generally meet the prediction accuracy requirements, while the MAAPE is not satisfactory. We can conclude that the LSTM also has large errors in predicting small passenger flow samples (in the morning and evening). Moreover, holidays have a larger impact on the prediction accuracy of the LSTM, and the value of the indicator is still affected after three weeks.
Thirdly, the D-LSSVM outperforms the W-LSSVM as a whole (up to 4.1% improvement in MAAPE and up to 5.3% improvement in RMSE).
Figure 9 - Performances of the 5 different models
The model comprises traffic data profiling, passenger flow analysis, and predictive modelling. The proposed model, which can extract the periodicity and nonlinearity characteristics of a time series, has great potential for solving many other relevant prediction problems. Based on the research results, the following conclusions can be drawn: (1) The nonparametric nonlinear regression model LS-SVM is more suitable for capturing the volatility characteristics of the time series data, and is more flexible for nonlinear and high-dimensional AFC data than the LSTM; (2) External characteristics of the time series, such as periodicity or nonlinearity, contribute to short-term passenger flow prediction; (3) A smaller forecasting horizon yields a more stable predictive result than a larger one; (4) The SN-LSSVM model can satisfy the requirements of forecasting year-round metro passenger flow. We first applied the method to the Tiyu Xilu station, which is the most crowded station of the Guangzhou metro, and it contributed to the management of the station.
In future studies, several improvements should be considered: (a) external characteristics of the time series, such as examples from previous years, can be considered; (b) other external influential factors of ridership, such as spatio-temporal information in the metro network, can be added to the method if sufficient data are available [4]; (c) another important issue is forecasting passenger flow under special circumstances, such as abnormal weather, large-scale events, and emergencies.
Fourthly, there are some cases where the predicted value of the D-LSSVM fails, because the model is suitable for a small sample training and the training set of the model is insufficient.
Fifthly, all of the prediction results of SN-LSSVM fall into the qualified interval, which is superior to other models. Moreover, the SN-LSSVM has high indicators of the prediction results, especially for holiday forecasting. Since the SN-LSSVM leverages multivariate features (periodicity) for predictive learning, while the D-LSSVM considers nonlinearity of the time series only, we can conclude that additional time series features can be more effective for prediction.
In summary, under the different metro passenger flow modes of holidays and working days, the SN-LSSVM model can satisfy the forecasting requirements, since its prediction accuracy is high, and it has greater stability and applicability compared to the state-of-the-art time series forecasting methods.

Efficiency analysis
Our approach consists of an offline training phase and an online forecasting phase. In the training phase, the major time cost is the training time of the predictive model. The computational complexity of the LSSVM with the RBF kernel is $O(n^2 \cdot d)$ [28], where n is the training size and d denotes the number of features. To be specific, the average time cost of the SN-LSSVM training is approximately 5.5 minutes on our dataset (20 days × 18 slots + 15 weeks × 126 slots of training samples), which means that the forecasting process can achieve real-time prediction.

CONCLUSION
In recent years, short-term traffic flow prediction has been developing, and it will undoubtedly keep improving. However, it is unfortunate that there is only one small study which specifically deals with short-term subway passenger flow prediction. Moreover, these models did not fully consider the key factors of short-term metro passenger flow forecasting. Thus, this article mainly aims to improve the prediction accuracy of metro passenger flow time series prediction models, whether the forecasting target is a working day or a holiday. This paper establishes a novel SN-LSSVM hybrid model for short-term subway passenger flow prediction, which is composed of three stages: traffic data profiling, passenger flow analysis, and predictive modelling.