On the Performance of Machine Learning Based Flight Delay Prediction – Investigating the Impact of Short-Term Features
Abstract
People and companies around the world are increasingly connected, which has made the aviation industry more important. Flight delays are a major challenge in aviation, and machine learning algorithms can be used to forecast them. This paper investigates the prediction of the occurrence of flight arrival delays with three prominent machine learning algorithms on a data set of domestic flights in the USA. The task is treated as a classification problem. The focus lies on the influence of short-term features on the quality of the results. To this end, three scenarios are created that are characterised by different input feature sets. When short-term information is omitted in order to shift the prediction to an early point in time, an accuracy of 69.5% with a recall of 68.2% is achieved. Including information on the delay that the aircraft had on its previous flight increases the prediction quality slightly. This second scenario is therefore a compromise between the early prediction timing of the first model and the good prediction quality of the third model, in which the departure delay of the aircraft is added as an input feature. In this third case, an accuracy of 89.9% with a recall of 83.4% is obtained. Because short-term features significantly improve the prediction quality, the desired timing of the prediction determines which features to use as inputs.
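As a reading aid for the two metrics reported above, the following minimal sketch shows how accuracy and recall could be computed for a binary delayed/on-time classification. The function name, the label encoding (1 = delayed, 0 = on time), and the toy labels are assumptions made for illustration; they are not taken from the paper's data or code.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels; 1 = delayed is an assumed encoding."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # delayed, predicted delayed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # on time, predicted on time
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # delayed, but missed

    accuracy = (tp + tn) / len(pairs)                # share of all flights classified correctly
    recall = tp / (tp + fn) if (tp + fn) else 0.0    # share of delayed flights that were caught
    return accuracy, recall


# Toy labels (hypothetical, not from the paper's data set).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f}, recall={rec:.3f}")
```

Reporting recall alongside accuracy matters here because recall measures how many of the actually delayed flights a model identifies, which accuracy alone does not reveal.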
Copyright (c) 2022 Delia Schösser, Jörn Schönberger
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.