A Functional Data Approach to Outlier Detection and Imputation for Traffic Density Data on Urban Arterial Roads
Abstract
In the analysis of traffic monitoring data, the magnitude of traffic density plays an important role in determining the level of traffic congestion. This study proposes a data imputation method based on spatio-functional principal component analysis (s-FPCA) that unifies anomalous curve detection, outlier confirmation and imputation of traffic density at target intersections. First, anomalous curves are detected from the bivariate principal component scores obtained through functional data analysis, and the presence of outliers is then confirmed with a threshold method. Second, an improved method is proposed for estimating missing traffic data from upstream and downstream observations. Finally, a numerical study on actual traffic density data shows that, at daily missing rates of 5%, 10% and 20%, s-FPCA improves imputation accuracy over functional principal component analysis (FPCA) by 8.28%, 8.91% and 7.48%, respectively, demonstrating the advantage of the method. The method can also be applied to outlier detection and imputation for traffic flow, and to other longitudinal data with periodic fluctuations.
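The pipeline outlined in the abstract (score-based curve screening followed by imputation from principal components) can be illustrated with a minimal sketch. It is not the authors' s-FPCA: it uses plain discretized FPCA on a single site, ignores upstream and downstream data, and swaps in a robust z-score rule on the first two scores for the paper's threshold method. The function names, the 96-point daily grid, the threshold of 3 and the synthetic sinusoidal curves are all assumptions made for illustration.

```python
import numpy as np

def fit_fpca(curves, n_components=2):
    """Discretized FPCA: mean curve plus leading principal directions."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Rows of vt are orthonormal directions (discretized eigenfunctions).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def impute_curve(curve, mean, components):
    """Fill NaN entries of one curve using scores fitted on its observed points."""
    observed = ~np.isnan(curve)
    # Least-squares estimate of the scores from the observed grid points only.
    scores, *_ = np.linalg.lstsq(
        components[:, observed].T, curve[observed] - mean[observed], rcond=None
    )
    filled = curve.copy()
    filled[~observed] = mean[~observed] + scores @ components[:, ~observed]
    return filled, scores

def flag_outlier_curves(scores, threshold=3.0):
    """Flag curves whose scores lie far from the bulk (robust z-score rule, an assumption)."""
    median = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - median), axis=0)
    return (np.abs(scores - median) / mad > threshold).any(axis=1)

# Synthetic daily "density" curves on an assumed grid of 96 time points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 96)
amp = rng.normal(1.0, 0.1, size=(30, 2))
amp[0, 0] = 5.0  # one deliberately anomalous day
curves = 10.0 + amp[:, [0]] * np.sin(2 * np.pi * t) + amp[:, [1]] * np.cos(2 * np.pi * t)

mean, components = fit_fpca(curves, n_components=2)
scores = (curves - mean) @ components.T
flags = flag_outlier_curves(scores)

# A new day with roughly 10% of its points missing.
truth = 10.0 + 1.2 * np.sin(2 * np.pi * t) + 0.9 * np.cos(2 * np.pi * t)
target = truth.copy()
target[rng.choice(t.size, size=10, replace=False)] = np.nan
filled, _ = impute_curve(target, mean, components)
print(flags[0], np.max(np.abs(filled - truth)))
```

Because the synthetic curves lie exactly in a two-dimensional space, the imputation here is essentially exact; real density curves would leave residual error, which is where a comparison of imputation accuracy such as the one reported in the abstract becomes meaningful.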
Copyright (c) 2022 Bin TANG, Yao HU, Huan CHEN
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.