Prediction of Fatalities in Vehicle Collisions in Canada

  • Liza Babaoglu University of Toronto
  • Ceni Babaoglu Ryerson University
Keywords: fatality, collision, prediction, classification, data mining, road safety


Traffic collisions affect millions of people around the world and are the leading cause of death for children and young adults. Canada’s road safety plan therefore aims to reduce collision injuries and fatalities, with a vision of having the safest roads in the world. We aim to predict fatalities in collisions on Canadian roads and to identify the factors that lead to them, using exploratory data analysis and machine learning techniques. We analyse vehicle collisions recorded in Canada’s National Collision Database (1999–2017). Through data mining methodologies, we investigate association rules and the key contributing factors that lead to fatalities. We then propose two supervised learning classification models, Lasso Regression and XGBoost, to predict fatalities. Our analysis shows the deadliness of head-on collisions, especially in non-intersection areas that lack traffic control systems. We also find that most collision fatalities occur in non-extreme weather and road conditions. Of the two prediction models, XGBoost is the better classifier of fatalities, with 83% accuracy. Its most important features are the “collision configuration” and “used safety devices” elements, which outrank attributes such as vehicle year, collision time, and the age or sex of the individual. Our exploratory and predictive analyses reveal the importance of road design and traffic safety education.
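
As a rough illustration of the association-rule step mentioned above, the sketch below computes support, confidence, and lift for a single rule. It is only a minimal example under stated assumptions: the records are invented toy data, not drawn from the National Collision Database, and the item labels (`head_on`, `no_traffic_control`, `fatal`) and the helper names `support` and `rule_metrics` are hypothetical. It does not reproduce the authors' analysis.

```python
from fractions import Fraction

# Toy collision records, invented for illustration only.
# They are NOT taken from the National Collision Database.
records = [
    {"head_on", "no_traffic_control", "fatal"},
    {"head_on", "no_traffic_control", "fatal"},
    {"head_on", "traffic_signal"},
    {"rear_end", "traffic_signal"},
    {"rear_end", "no_traffic_control"},
    {"side_swipe", "traffic_signal"},
    {"rear_end", "traffic_signal"},
    {"head_on", "no_traffic_control"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= row for row in rows), len(rows))


def rule_metrics(antecedent, consequent, rows):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    sup_rule = support(antecedent | consequent, rows)
    sup_antecedent = support(antecedent, rows)
    sup_consequent = support(consequent, rows)
    if sup_antecedent == 0 or sup_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = sup_rule / sup_antecedent  # estimate of P(consequent | antecedent)
    lift = confidence / sup_consequent      # > 1 means a positive association
    return sup_rule, confidence, lift


if __name__ == "__main__":
    sup, conf, lift = rule_metrics({"head_on", "no_traffic_control"}, {"fatal"}, records)
    print(f"support={float(sup):.3f} confidence={float(conf):.3f} lift={float(lift):.3f}")
```

A lift above 1 indicates that the antecedent and the consequent occur together more often than they would if they were independent. In practice such rules would be mined over the full dataset with minimum support and confidence thresholds rather than evaluated one at a time.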


How to Cite
Babaoglu L, Babaoglu C. Prediction of Fatalities in Vehicle Collisions in Canada. Promet [Internet]. 2021 Oct 8 [cited 2022 Jul 6];33(5):661-9. Available from: