The Effect of Drivers' Demographic Characteristics on Road Accidents in Different Seasons Using Data Mining
Abstract
According to the World Health Organization, over 1.2 million people die on the roads each year, and between 20 and 50 million suffer non-fatal injuries. According to international reports, Iran has a high death rate from road accidents. The objective of this study was to extract implicit knowledge from road accident data sets for the roads of Iran through data mining. To this end, three data mining techniques were combined: clustering, classification and rule extraction. Following the preparation stage, the data were segmented with three clustering algorithms: Kohonen, K-Means and TwoStep. TwoStep cluster analysis is a single-pass approach that generates a fairly large number of pre-clusters. Next, the optimal algorithm and cluster were identified. At the classification stage, the drivers' demographic features were added and the C5.0 classification algorithm was used to build a decision tree. Finally, the effects of these demographic features on road accidents were investigated. Age, job, driving license duration and gender proved to be the more important factors in accident analysis. Accident rules were then extracted for each season of the year.
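The cluster-then-classify idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It uses a hand-written K-Means and a plain entropy-based information-gain score as a simplified stand-in for the C5.0 decision tree. All records, feature values, attribute names (`age_group`, `gender`) and starting centers below are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def kmeans(points, centers, iters=50):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy reduction in `labels` after splitting on a categorical attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical accident records: two made-up numeric features per accident.
accidents = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 10), (10, 9), (10, 10)]
# Hypothetical driver demographics, one value per accident record.
demographics = {
    "age_group": ["young"] * 4 + ["older"] * 4,
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
}

# Step 1: segment the accidents (starting centers chosen by hand).
labels = kmeans(accidents, centers=[accidents[0], accidents[-1]])

# Step 2: score each demographic attribute against the cluster labels.
gains = {name: information_gain(vals, labels) for name, vals in demographics.items()}

for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f} bits")
```

In this toy data, an attribute that lines up with the clusters scores high and one that is unrelated scores near zero. That is the sense in which a tree-based classifier can rank demographic features by importance. A real analysis would use a full decision-tree learner and validated data rather than this scoring shortcut.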
This work is licensed under a Creative Commons Attribution 4.0 International License.