Analysed potential of big data and supervised machine learning techniques in effectively forecasting travel times from fused data

Ivana Šemanjski

doi:10.7307/ptt.v27i6.1762

Ivana Šemanjski, Faculty of Transport and Traffic Sciences, University of Zagreb, Vukeliceva 4, 1000 Zagreb, Croatia

Keywords: big data, support vector machines, k-nearest neighbours, boosting trees, random forest, forecasting travel times, data fusion

Abstract

Travel time forecasting is relevant to many intelligent transport system (ITS) services. The growing number of data collection sensors makes more predictor variables available, but it also highlights the processing demands that come with such big data. In this paper, we analyse the potential of big data and supervised machine learning techniques for effectively forecasting travel times. For this purpose, we used fused data from three sources (Global Positioning System vehicle tracks, road network infrastructure data and meteorological data) and four machine learning techniques (k-nearest neighbours, support vector machines, boosting trees and random forest).
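As a rough illustration of the setup described above, the following sketch joins hypothetical records from the three sources on a shared segment and hour, then applies a plain k-nearest-neighbours regressor. All field names, sample values and the choice of k are assumptions made for this example. They are not taken from the paper, whose feature set, scaling and tuning may differ.

```python
import math

# Hypothetical sample records; none of these values come from the paper.
gps_tracks = [  # (segment_id, hour, observed travel time in minutes)
    ("s1", 8, 4.0), ("s1", 12, 2.5), ("s2", 8, 1.5), ("s2", 12, 1.0),
]
road_network = {  # keyed by segment_id
    "s1": {"length_km": 3.0, "speed_limit": 80.0},
    "s2": {"length_km": 0.8, "speed_limit": 50.0},
}
weather = {8: {"rain_mm": 2.0}, 12: {"rain_mm": 0.0}}  # keyed by hour


def fuse(tracks, roads, meteo):
    """Join the three sources into (feature_vector, travel_time) pairs."""
    rows = []
    for segment_id, hour, minutes in tracks:
        road = roads[segment_id]
        features = [road["length_km"], road["speed_limit"],
                    float(hour), meteo[hour]["rain_mm"]]
        rows.append((features, minutes))
    return rows


def knn_predict(rows, query, k=2):
    """Average the travel times of the k nearest rows (Euclidean distance)."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)


fused = fuse(gps_tracks, road_network, weather)
# Hypothetical query: segment s1 at 9:00 with light rain.
forecast = knn_predict(fused, [3.0, 80.0, 9.0, 1.0], k=2)
```

The features are left unscaled here to keep the sketch short. With real data, features on different scales would normally be standardised before computing distances.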

To evaluate the forecasting results, we compared them across road classes in terms of absolute values, measured in minutes, and the mean squared percentage error. For road classes with high average speeds and long road segments, the machine learning techniques forecast travel times with a small relative error, while for road classes with low average speeds and short segments the task was more demanding. All three data sources proved to have a high impact on travel time forecast accuracy, and the best results (taking all road classes into account) were achieved with the k-nearest neighbours and random forest techniques.
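The per-class comparison described above can be sketched as follows. The sample values are invented for illustration. The formula used for the mean squared percentage error (the mean of squared relative errors, scaled by 100) is one common definition and is an assumption here, since the abstract does not state the exact form.

```python
from collections import defaultdict


def mean_abs_error(pairs):
    """Mean absolute error in minutes over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def mspe(pairs):
    """Mean squared percentage error: 100 * mean(((a - p) / a) ** 2).

    Assumed definition; the paper may normalise differently.
    """
    return 100.0 * sum(((a - p) / a) ** 2 for a, p in pairs) / len(pairs)


# Hypothetical (road_class, actual_minutes, predicted_minutes) results.
results = [
    ("motorway", 10.0, 9.0),
    ("motorway", 20.0, 22.0),
    ("local", 1.0, 1.5),
    ("local", 2.0, 1.0),
]

# Group the forecasts by road class before computing the two measures.
by_class = defaultdict(list)
for road_class, actual, predicted in results:
    by_class[road_class].append((actual, predicted))

summary = {
    road_class: (mean_abs_error(pairs), mspe(pairs))
    for road_class, pairs in by_class.items()
}
```

In this invented sample the "local" class has the smaller absolute error in minutes but the larger relative error, which is why the two measures are reported side by side.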

Author Biography

Ivana Šemanjski, Faculty of Transport and Traffic Sciences, University of Zagreb, Vukeliceva 4, 1000 Zagreb, Croatia

Department of Intelligent Transportation Systems

Assistant Professor
