Prediction of Fatalities in Vehicle Collisions in Canada
Traffic collisions affect millions of people around the world and are the leading cause of death for children and young adults. Canada’s road safety plan therefore aims to reduce collision injuries and fatalities, with a vision of having the safest roads in the world. We aim to predict fatalities in collisions on Canadian roads and to discover the causes of fatalities through exploratory data analysis and machine learning techniques. We analyse vehicle collisions recorded in Canada’s National Collision Database (1999–2017). Using data mining methodologies, we investigate association rules and the key contributing factors that lead to fatalities. We then propose two supervised classification models, Lasso regression and XGBoost, to predict fatalities. Our analysis shows the deadliness of head-on collisions, especially in non-intersection areas that lack traffic control systems. We also find that most collision fatalities occur in non-extreme weather and road conditions. Of the two prediction models, XGBoost is the better classifier of fatalities, with 83% accuracy. Its most important features are “collision configuration” and “used safety devices”, which outrank attributes such as vehicle year, collision time, and the age or sex of the individual. Our exploratory and predictive analyses reveal the importance of road design and traffic safety education.
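To make the association-rule step mentioned in the abstract more concrete, the following is a minimal sketch of how the support, confidence, and lift of a single rule can be computed. It is not the authors' implementation: the records, the item labels (head_on, no_traffic_control, fatal), and the helper names support and rule_metrics are all hypothetical and chosen only for illustration.

```python
# Toy collision records. Every value here is invented for illustration
# and is NOT drawn from the National Collision Database.
records = [
    {"head_on", "no_traffic_control", "fatal"},
    {"head_on", "no_traffic_control", "fatal"},
    {"head_on", "traffic_signal"},
    {"rear_end", "traffic_signal"},
    {"rear_end", "no_traffic_control"},
    {"rear_end", "traffic_signal"},
    {"head_on", "no_traffic_control"},
    {"rear_end", "traffic_signal"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    supp_rule = support(a | c, transactions)
    supp_a = support(a, transactions)
    supp_c = support(c, transactions)
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_rule / supp_a  # estimate of P(consequent | antecedent)
    lift = confidence / supp_c       # values above 1 indicate positive association
    return supp_rule, confidence, lift


if __name__ == "__main__":
    s, conf, lift = rule_metrics({"head_on", "no_traffic_control"}, {"fatal"}, records)
    print(f"support={s:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

A lift above 1 means the antecedent and a fatal outcome co-occur more often than they would if they were independent. Association on its own does not establish causation, so a rule of this kind is best read as a pointer for further analysis.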
Copyright (c) 2021 Liza Babaoglu, Ceni Babaoglu
This work is licensed under a Creative Commons Attribution 4.0 International License.