EFFECTS OF INDIVIDUAL DIFFERENCES ON MEASUREMENTS’ DROWSINESS-DETECTION PERFORMANCE

Individual differences (IDs) may reduce the detection accuracy of drowsiness driving by influencing measurements’ drowsiness-detection performance (MDDP). The purpose of this paper is to propose a model that quantifies the effects of IDs on MDDP and identifies measurements that are less affected by IDs for building drowsiness-detection models. Through field experiments, drivers’ naturalistic driving data and subjective-drowsiness levels were collected, and drowsiness-related measurements were calculated using a double-layer sliding time window. In the model, MDDP was represented by the |Z-statistics| of the Wilcoxon-test. First, each individual driver’s measurements were analysed by the Wilcoxon-test. Next, drivers were combined in pairs, the measurements of the paired-driver combinations were analysed by the Wilcoxon-test, and the measurement’s IDs of each paired-driver combination were calculated. Finally, linear regression was used to fit the measurement’s IDs against the change in MDDP, defined as the individual driver’s |Z-statistics| minus the paired-driver combination’s |Z-statistics|; the absolute value of the slope (|k|) indicates the effect of IDs on the MDDP. The |k| of the mean of the percentage of eyelid closure (MPECL) was the lowest (4.95), which shows that MPECL is the least affected by IDs. The results contribute to measurement selection for drowsiness-detection models that consider IDs.


INTRODUCTION
Drowsiness driving is still a widespread traffic problem and often leads to serious injuries [1]. Studies have pointed out that about 15-30% of all crashes are attributable to drowsiness driving [2]. According to a government report from the United States, drowsiness-related crashes cause approximately 1550 deaths and 12.5 billion dollars of losses each year [3]. Unlike drunk driving, drowsiness lacks objective criteria for establishing its occurrence, so many drowsiness-related accidents go unreported and the actual harm of drowsiness driving might be more serious. Thus, drowsiness driving has long been a hotspot in traffic safety research.
Anti-drowsiness driving assistance systems can warn drivers when they are drowsy, which is considered an effective countermeasure for drowsiness driving [4]. These systems rely on accurate and reliable drowsiness-detection models, which have been widely studied [5,6]. According to the intrusiveness of data collection, drowsiness-detection models can be divided into two types: intrusive and non-intrusive [7]. Although intrusive models using data such as electroencephalograms [8] and electromyography [4] have good accuracy, they are only available in laboratory conditions [7]. In contrast, non-intrusive models based on driving behaviour data [5,9] or eye movement data [10] are more practical because the data collection process causes little disturbance to drivers and imposes fewer restrictions [11]. Thus, non-intrusive drowsiness-detection models have become a focus of traffic safety research.
However, the accuracy and reliability of non-intrusive drowsiness-detection models are significantly affected by individual differences (IDs) [6]. IDs generally refer to differences among drivers in behavioural performance, physiological characteristics, cognitive abilities, etc., which are crucial issues in current driving behaviour research [12][13][14]. To date, most non-intrusive drowsiness-detection models are generalised models that use the mixed measurement data of all drivers for training and apply the same drowsiness threshold to detect the drowsiness of each driver [1]. Many scholars have studied the mechanism by which IDs affect the drowsiness identification of generalised models. Inger [15] pointed out that the drowsiness thresholds of generalised models are the average value over all drivers rather than the individual driver's threshold, and that using the average threshold makes drowsiness detection at the individual driver level suffer from systematic error. Similarly, Philip et al. [16] reported that there were major IDs in driving performance under the drowsy state, and that IDs might reduce the accuracy of drowsiness identification based on the average damage of drowsiness to driving performance. Yan [13] explored IDs in drowsiness detection and found that IDs in measurement distribution among drivers could exceed the differences caused by drowsiness, which decreased the correlation between measurements and drowsiness. These studies illustrate that it is necessary to consider IDs when establishing drowsiness-detection models.
Currently, drowsiness-detection models considering IDs fall into two categories. One is the driver-specific drowsiness-detection model without IDs, which is trained using the driving behaviour data of the individual driver [7]. The other is the generalised drowsiness-detection model considering IDs, in which measurements that contain fewer IDs or are less affected by IDs are preferentially selected to build models [17]. For driver-specific models, Wang [7] used 23 non-intrusive measurements of individual drivers to establish a driver-specific drowsiness-detection model based on the multilevel logit model. The results showed that the drowsiness-detection accuracy of the personalised model considering IDs was higher. Chu et al. [18] recorded driving behaviour data through field experiments and, using individual drivers' data, established driver-specific drowsiness-detection models based on the RBF neural network and the support vector machine. The results showed that driver-specific models could improve drowsiness-detection accuracy by eliminating IDs. You [19] concentrated on IDs at the driver's eye level and used eye landmarks of individual drivers to train a driver-specific drowsiness-detection model based on support vector machines. The accuracy reached 94.8%, which outperformed the models neglecting IDs. The above studies demonstrate that driver-specific drowsiness-detection models are more accurate at the individual driver level, which is attributed to their use of the individual driver's unique drowsiness-detection criteria. However, driver-specific models must be trained on the individual driver's data to obtain that driver's unique drowsiness threshold, so they cannot accurately detect a new driver's drowsiness. Consequently, the practicality and generalisation of driver-specific models are weak and their drowsiness-detection efficiency is low.
To address the shortcomings of driver-specific drowsiness-detection models, some researchers have explored ways to deal with IDs in generalised drowsiness-detection models. Generalised models considering IDs strike a balance between reducing the impact of IDs and improving the practicality of drowsiness detection. On the one hand, IDs in measurement distribution are analysed, and measurements with small IDs are chosen to build models so as to reduce the impact of IDs on drowsiness-detection accuracy. On the other hand, the generalised model only needs to be trained once using all drivers' data, so there is no need to train multiple driver-specific models on each driver's data separately. All drivers can use the generalised model to detect drowsiness. Therefore, the practicality and efficiency of drowsiness-identification models are improved.
With regard to generalised drowsiness-detection models considering IDs, the effects of IDs on the measurements' drowsiness-detection performance (MDDP) are the major factors influencing the accuracy of drowsiness detection [13]. Thus, some scholars have explored IDs in the drowsiness measurements of generalised models [17,20,21]. Obvious IDs in the distribution of measurements such as the standard deviation of steering wheel movement [12], percentage of eyelid closure (PERCLOS), gaze time ratio [19], steering reversal rate (SRR), and standard deviation of lane position (SDLP) [21] have been frequently mentioned in previous studies. Scholars have also compared the magnitude of IDs in various measurements. Xu et al. [17] calculated 23 non-intrusive measurements using simulated driving experimental data and calculated the IDs of measurements based on the Kruskal-Wallis test. It was found that the IDs of PERCLOS were higher and those of measurements derived from the steering wheel angle were lower. Furthermore, researchers have analysed the impact of IDs on the MDDP. Niu [20] used the F-statistics of variance analysis to represent the MDDP and studied changes of the MDDP with IDs by comparing the individual driver's F-statistics with the F-statistics of paired-driver combinations. The results indicated that IDs weakened the MDDP. Zhang [21] calculated the Pearson correlation coefficients of SDLP and SRR with drowsiness for each participant and pointed out that IDs reduced the correlation between measurements and drowsiness levels. The above results provide a significant basis for developing generalised drowsiness detection considering IDs.
Previous studies proved that IDs of measurements exist and found that the IDs of various measurements differ. However, a quantitative analysis of the effect of IDs on the MDDP, which is crucial for generalised drowsiness-detection models to choose measurements less affected by IDs, has not been reported in current studies. In previous generalised drowsiness-detection models, which did not consider the effect of IDs on the MDDP, drowsiness measurements whose drowsiness-detection abilities were greatly affected by IDs were chosen to train models, which caused the IDs to sharply reduce the drowsiness-detection accuracy. The prerequisite for improving the drowsiness-detection accuracy of the models is to select measurements that can distinguish the awake state from the fatigue state effectively, that is, measurements whose MDDP is ideal [22]. Therefore, it is significant to analyse the effects of IDs on the MDDP when choosing measurements to build generalised drowsiness-detection models.
In the present paper, a new model is established to quantify the effect of IDs on the MDDP in order to find measurements that are less affected by IDs. The significance of the results is to provide a basis for generalised drowsiness-detection models to select measurements that are less affected by IDs. As a result, the effects of IDs on the drowsiness-detection accuracy of generalised models can be weakened and the accuracy and efficiency of drowsiness detection ensured. First, the drivers' subjective-drowsiness levels and multi-source sensor data were collected through naturalistic driving experiments on the motorway. Then, a double-layer sliding time window was used to compute nine non-intrusive measurements. Next, the MDDP was evaluated by the |Z-statistics| of the Wilcoxon-test and the IDs of measurements were calculated. Then, changes of the MDDP with respect to IDs were analysed by linear regression, and the model to quantitatively analyse the effect of IDs on the MDDP was built. Finally, the effect of IDs on the MDDP was quantified and verified. According to the results, measurements whose drowsiness-detection performance is less affected by IDs can be selected to establish generalised drowsiness-detection models, which could ensure the average drowsiness-detection accuracy and improve the practicality of generalised drowsiness-detection models.
The remainder of this paper is organised as follows. The experimental details of data collection are presented in Section 2. The formulas of the selected measurements, data pre-processing, and the model to quantify the effects of IDs on the MDDP are introduced in Section 3. In Section 4, the quantitative effects of IDs on the MDDP, their verification analysis, and other related results are presented. Finally, the discussion and conclusions follow in Sections 5 and 6, respectively.

Participants
Forty professional drivers (including 5 female drivers) with proficient driving skills were recruited and numbered. The ages of the drivers ranged from 34 to 57 years (mean=46.83, SD=5.62). Their driving experience ranged from 3 to 32 years (mean=16.53, SD=6.10). They had good driving skills and stable driving habits. The participants were in a good psychological and physiological state, with normal work and rest, and did not take any drugs for 3 weeks before the experiments. Every participant signed the informed consent and was told the possible experimental risks and details before the experiments. Each experiment was accompanied by a safety officer who was an instructor with 30 years of driving experience. After training, the safety officer was responsible for inquiring about and recording the subjective-drowsiness level of participants using the Karolinska Sleepiness Scale (KSS) and for taking emergency measures to ensure safety when necessary.

Experimental procedure
Before the experiments, the participants spent at least 10 minutes familiarising themselves with the experimental vehicle. During the experiments, the participants departed from the Fuhe Toll Station, entered the Suizhou Service Area after 2 hours, and spent one hour having a meal and resting. Then, participants drove to the Xiangyang North Toll Station and turned back to the starting point. One experiment took about 6 hours, not including the rest time, and the total distance was about 600 kilometres. Participants were required to drive at approximately 100 km/h. To avoid interference with driving drowsiness, participants were not allowed to listen to songs or to communicate. Experiments were conducted during the non-peak traffic period. During the experiments, the traffic environment was

Apparatus
Participants drove a real vehicle that was integrated with on-board sensors including the three-way HD camera, the Mobileye, the Driver State Sensor (sampling frequency of 60 Hz), the inertial navigation equipment, and the steering wheel angle sensor (sampling frequency of more than 20 Hz), etc.
because turning behaviour caused by corners and traffic congestion can interfere with the study of the correlation between drowsiness and steering measurements. In order to avoid the influence of road alignment and other factors on driving behaviour, the Hanshi Motorway, which has straight road alignment, a better road surface, and less traffic volume, was chosen as the experimental road. The experimental routes and sensors are shown in Figure 1. The high-definition cameras recorded the outside driving environment, the driver's facial features, and driving manipulation. The Mobileye gathered the distances from the vehicle to the lane lines. The driver state sensor gathered eye movement data. The inertial navigation equipment recorded velocity, acceleration, and GPS. In addition, materials such as the participant's demographic information scale and the KSS were used.
where N_thr is the number of sampling points exceeding the threshold and N_all is the total number of angle sampling points in the time window.
④VSA eliminates the effect of road curvature on the steering wheel angle [23] and reflects the relationship between drowsiness and steering wheel angle.
where MSWA and SDSWA are the mean value and standard deviation of the steering wheel angle in the time window, respectively.
PERCLOS is the standard measure for drowsiness-detection [10]. ⑨SDPE describes the variation of PERCLOS in the time window, which can reflect the blink frequency of drivers.
where PE_i is the PERCLOS of the driver at sampling point i, and n_pe is the total number of PERCLOS sampling points in the time window.
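As an illustration of the definitions above, the following minimal Python sketch computes SRR and SDPE for one time window. The formulas themselves are not reproduced in this text, so two points are assumptions: SRR is taken as the ratio N_thr / N_all with the threshold applied to the absolute steering wheel angle, and SDPE is taken as the population standard deviation of the n_pe PERCLOS samples. The function names are hypothetical.

```python
from statistics import pstdev


def srr(angles_deg, threshold_deg=6.0):
    """Assumed form of SRR: N_thr / N_all for one time window.

    N_thr counts samples whose absolute angle exceeds the threshold;
    N_all is the number of angle samples in the window.
    """
    if not angles_deg:
        raise ValueError("empty time window")
    n_thr = sum(1 for a in angles_deg if abs(a) > threshold_deg)
    return n_thr / len(angles_deg)


def sdpe(perclos):
    """Assumed form of SDPE: population standard deviation of PE_i, i = 1..n_pe."""
    return pstdev(perclos)


# Illustrative values only, not experimental data.
window_angles = [1.0, -7.5, 3.2, 6.1, -2.0, 8.4, 0.5, -6.0]
print(srr(window_angles))              # 3 of the 8 samples exceed 6 degrees
print(sdpe([0.10, 0.12, 0.30, 0.08]))
```

If the original definition counts angle changes rather than angles, or uses the sample standard deviation, only the marked lines need to change.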

Data pre-processing
The time window used to calculate the measurement is considered sensitive for drowsiness-detection [21]. Based on reference [24], a double-layer sliding time window was proposed to compute the measurements. Owing to the monotonous environment, long driving duration, and high speed, drowsiness driving in the continuous driving scenario on the motorway is more dangerous [7]. Therefore, we focused on the continuous driving
relatively simple and there was little traffic congestion due to the low traffic volume. In addition, the traffic conditions outside the vehicle were recorded by the camera in real time, so data under abnormal traffic scenes could be deleted based on the videos outside the car. The safety officer recorded the participant's self-reported KSS every 5 minutes, and after the experiments, the final KSS was determined by considering the videos of the participant's face and the experts' opinions. Participants received certain compensation after finishing the whole experiment.
Due to equipment failure and other reasons, the experimental data of 5 participants (No. 10, 13, 17, 25, 32) were not collected. As a result, valid original experimental data were obtained for 35 participants. The participants with invalid experimental data were excluded from the following analysis.

Drowsiness measurements
According to references [1,10,11,21], nine non-intrusive drowsiness-driving measurements were chosen. These measurements were computed using original data from various sensors and reflect the impact of drowsiness on the driver's physiological characteristics, driving operations, and vehicle movement. The information on the non-intrusive measurements is presented in Table 1.
The calculation formulas of some complicated measurements are as follows.
②SRR reflects the stability of steering wheel control. Referring to research on drowsiness driving in a real vehicle environment [21], 6° was chosen as the threshold.
Choose measurement samples of consecutive driving data on the motorway. According to the external videos captured by the HD camera, the driving durations of the continuous driving scenario on motorways were extracted. Through statistical analysis of the vehicle speed in these driving durations, and according to reference [21], we chose T_1 with a minimum speed above 80 km/h as the samples of consecutive driving data on the motorway.

Quantitative model building
As shown in Figure 3, this paper proposed a model based on the Wilcoxon-test (W-test) and simple linear regression to quantitatively analyse the effect of IDs on the drowsiness-detection performance of measurements. Nine non-intrusive drowsiness-driving measurements were input into the model one by one to calculate the effect of IDs on their drowsiness-detection performance.
In Part A, based on the result of the Wilcoxon-test for the individual driver, paired-driver combinations having various IDs between two drivers' measurements were formed. We performed the Wilcoxon-test
scene on the motorway. The pre-processing flow of setting the time window and selecting continuous driving data is shown in Figure 2.
First of all, multi-source data were synchronised according to the timestamp. Because the driver's self-reported drowsiness level was recorded every 5 minutes, the original data were divided into samples within 5 minutes to avoid a sample crossing two KSS ratings. Any data sample in which the captured videos showed the driver's behaviour being interrupted by a traffic incident was excluded.
Set the double-layer sliding time window. It has been found that, in a real vehicle environment, the drowsiness-driving state generally lasts 15-75 seconds, while typical drowsy operation characteristics generally last 5-20 seconds [24]. To prevent averaging over a long window from masking the drowsy operation characteristics, the double-layer sliding time window was proposed to calculate measurements. Firstly, the first-layer sliding step (S_1) and time window (T_1) were set within every 5 minutes. The data in T_1 formed a sample, the fundamental unit for obtaining measurements and detecting drowsiness driving. Secondly, within each T_1, the second-layer sliding step (S_2) and time window (T_2) were set. The data in T_2 were used to calculate measurements, and the maximum measurement over all T_2 in each T_1 was chosen as the final measurement value of the sample (T_1). For instance, when calculating
R_i is the rank of the measurement sample S_i of the sober driving state in the mixed samples, and R_j is the rank of the measurement sample F_j of the drowsy driving state in the mixed samples.
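The double-layer sliding time window described in this subsection can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: window lengths and steps are expressed in numbers of samples instead of seconds, the function names are hypothetical, and the measurement function (here the mean) is only a placeholder for any of the nine measurements.

```python
from statistics import fmean


def sliding_windows(n, width, step):
    """Return (start, end) index pairs of windows of `width` samples moved by `step`."""
    return [(s, s + width) for s in range(0, n - width + 1, step)]


def double_layer_measure(signal, t1, s1, t2, s2, measure):
    """One value per first-layer window T_1.

    Inside each T_1, `measure` is evaluated on every second-layer window T_2
    and the maximum is kept as the value of the sample.
    """
    if t2 > t1:
        raise ValueError("T_2 must not be longer than T_1")
    values = []
    for a, b in sliding_windows(len(signal), t1, s1):
        sample = signal[a:b]
        inner = [measure(sample[c:d]) for c, d in sliding_windows(len(sample), t2, s2)]
        values.append(max(inner))
    return values


# Illustrative signal only: ten samples, T_1 = 6, S_1 = 2, T_2 = 3, S_2 = 1.
signal = list(range(10))
print(double_layer_measure(signal, t1=6, s1=2, t2=3, s2=1, measure=fmean))
```

Taking the maximum over the short T_2 windows keeps a brief drowsy episode visible inside a longer T_1 sample, which is the purpose stated in the text.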

Calculation of IDs of measurement
We proposed the following formulas to calculate the comprehensive IDs between the drivers' measurements.
on the measurement samples of individual drivers and obtained the P-value (P_o) and Z-statistics (Z_o) of the Wilcoxon-test. Then, drivers were arranged in descending order of |Z_o|, and each of the first qv drivers was chosen in turn as the benchmark driver (qv needed to be optimised, as introduced in Formula 7). Next, the benchmark driver was combined with each other driver whose |Z_o| was smaller than that of the benchmark driver to form two-driver combinations. Finally, mixed measurement samples of the paired-driver combinations were obtained by mixing the measurement samples of the two drivers. In Part B, firstly, the Wilcoxon-test was performed on the paired-driver combination's mixed measurement samples and the combination's Z-statistics (Z_t) was obtained. ∆|Z| of each paired-driver combination equalled |Z_ob| of the benchmark driver minus |Z_t| of the paired-driver combination, and represented the change of the MDDP. Secondly, the measurement's comprehensive IDs (D) between the two drivers in the combination were computed using Formula 5. Finally, simple linear regression was used to fit D (independent variable) and ∆|Z| (dependent variable), and the absolute value of the slope of the fitted line indicated the degree to which IDs affect the MDDP.
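The pairing procedure of Part A and the ∆|Z| calculation of Part B can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Wilcoxon-test is taken to be the rank-sum test with a normal approximation and average ranks for ties (no tie or continuity correction), the function names and the data layout are hypothetical, and the computation of D (Formula 5) is not included.

```python
import math


def rank_sum_z(x, y):
    """Z-statistic of the rank-sum test for group x against group y (assumed form)."""
    pooled = sorted(list(x) + list(y))
    # Average rank of every distinct value in the pooled sample.
    rank_of = {v: pooled.index(v) + 1 + (pooled.count(v) - 1) / 2 for v in set(pooled)}
    n1, n2 = len(x), len(y)
    w = sum(rank_of[v] for v in x)
    return (w - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


def delta_abs_z(drivers, qv=3):
    """drivers maps an id to (sober_samples, drowsy_samples).

    Returns (benchmark, other, delta) with delta = |Z_ob| - |Z_t|.
    """
    z_o = {d: abs(rank_sum_z(s, f)) for d, (s, f) in drivers.items()}
    ordered = sorted(z_o, key=z_o.get, reverse=True)
    results = []
    for b in ordered[:qv]:                      # the first qv drivers act as benchmarks
        for r in ordered:
            if z_o[r] < z_o[b]:                 # pair only with drivers of smaller |Z_o|
                sober = list(drivers[b][0]) + list(drivers[r][0])
                drowsy = list(drivers[b][1]) + list(drivers[r][1])
                results.append((b, r, z_o[b] - abs(rank_sum_z(sober, drowsy))))
    return results


# Illustrative values only, not experimental data.
drivers = {
    "A": ([1, 2, 3, 4], [5, 6, 7, 8]),
    "B": ([4, 6, 5, 7], [5.5, 6.5, 4.5, 7.5]),
}
for benchmark, other, dz in delta_abs_z(drivers, qv=1):
    print(benchmark, other, round(dz, 3))
```

A positive value means that mixing the second driver's samples into the benchmark driver's samples lowered |Z|, that is, weakened the MDDP.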

Wilcoxon-test
The Wilcoxon-test is usually used to analyse the differences between unpaired samples from two groups, and the data need not follow a normal distribution [25]. The bigger the |Z-statistics|, the greater the difference between the measurements of the sober state and the drowsy state. P-value<0.05 means
Through iteration, we obtained the optimal parameter values that maximise h, namely λ=0.6, η=0.1, α=0.1, β=0.2, qv=3.

The Wilcoxon-test for individual participants
For individual participants, according to KSS, the measurement samples were divided into two groups: the sober state and the drowsy state. According to the literature [7], measurement samples with KSS≤3 belonged to the sober driving state, and measurement samples with KSS≥6 belonged to the drowsy driving state. The Wilcoxon-test was used to analyse the differences between the measurement samples in the sober state and the drowsy state.
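The grouping by KSS and the test for one participant can be sketched in Python as follows. This is a minimal sketch that assumes the Wilcoxon-test here is the rank-sum test for unpaired samples with a normal approximation, average ranks for ties, and no tie or continuity correction; the function names and the (value, KSS) data layout are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def split_by_kss(samples):
    """samples: (measurement value, KSS) pairs. KSS<=3 is sober, KSS>=6 is drowsy."""
    sober = [v for v, kss in samples if kss <= 3]
    drowsy = [v for v, kss in samples if kss >= 6]
    return sober, drowsy


def rank_sum_test(sober, drowsy):
    """Return (Z, two-sided P) from the rank sum of the sober group."""
    n1, n2 = len(sober), len(drowsy)
    ranks = average_ranks(list(sober) + list(drowsy))
    w = sum(ranks[:n1])
    z = (w - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Illustrative values only: samples with KSS 4 or 5 are left out of both groups.
samples = [(0.10, 2), (0.12, 3), (0.11, 1), (0.20, 5), (0.31, 7), (0.28, 6), (0.35, 8)]
sober, drowsy = split_by_kss(samples)
print(rank_sum_test(sober, drowsy))
```

The sign of Z depends on which group is ranked first, which is why the text works with |Z|.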
The design of the time window has an important effect on the drowsiness-detection performance of the measurement [20,24]. This paper aims to guide the selection of measurements for the generalised drowsiness-detection model, so the optimal time window should enable as many drivers as possible to use the measurement to detect drowsiness. Thus, the Wilcoxon-test was performed on each participant's measurement, and the time windows were optimised to maximise the number of participants whose P_o<0.05. We designed the range of time windows [20,24], and the best time windows for each measurement were acquired by iteration. The optimal time window settings and Wilcoxon-test outcomes are shown in
In the paired-driver combinations, each driver's measurement is a group sample. ICC_s and ICC_f are the intra-class correlation coefficients of the sober state and the drowsy state, respectively. SSA_s and SSA_f are the between-group variances of rank in the sober state and the drowsy state, respectively. SST_s and SST_f are the total variances of rank in the sober state and the drowsy state, respectively. M_bs and M_bf are the measurement medians of the benchmark driver in the sober state and the drowsy state, respectively. M_rs and M_rf are the measurement medians of the other driver of a paired-driver combination in the sober state and the drowsy state, respectively. λ, η, α, β are the weights of the four ID source factors (ICC, |Z_ob - Z_or|, |M_r - M_b|, (|M_2s|+|M_2f|)/2).

Simple linear regression
The simple linear regression model was used to fit ∆|Z| and D, and the model was as follows:
Formula 6 reflects the relationship between the change in the drowsiness-detection performance of the measurement and the IDs. |k| indicates the degree to which IDs affect the drowsiness-detection performance of the measurement: the smaller |k|, the smaller the effect of IDs on the MDDP. The coefficient of determination (R²) evaluates the goodness of fit of the linear model [26]. In this paper, R² is no less than 0.4, which means that the fitted function reflects the relationship between ∆|Z| and D.
In the model shown in Figure 3, the number of benchmark drivers (qv) and λ, η, α, β of Formula 5 affect R². To improve the fit, an objective function was established to optimise λ, η, α, β, and qv. R_i² was the R² of the i-th measurement, which is determined by λ, η, α, β, and qv.
was more drastic in the drowsy state. This indicated that drowsiness could decrease the stability and safety of steering operations for most participants, which was consistent with previous studies [5,6].
Comparing the data distribution of SDSWM among participants, visible IDs of SDSWM were found in both the sober state and the drowsy state. In the sober state, the median value of participant No. 12 was the highest at 0.55, and that of participant No. 29 was the lowest at 0.27. Furthermore, for participants No. 7, No. 8, No. 9, No. 12, No. 30, and No. 31, the median of SDSWM in the drowsy state was lower than that in the sober state. This differed from the general relationship between SDSWM and drowsiness. The mechanism of this phenomenon is not clear; it might be related to the driver's personality, operating habits, alertness, and other attributes. This phenomenon showed that there were also obvious differences between participants in how measurements change from sober driving to drowsy driving.
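The regression step can be illustrated with an ordinary least-squares fit in Python. The sketch assumes that Formula 6 has the usual form of a straight line, ∆|Z| = k·D + b, where the intercept name b is hypothetical; the function name and the sample values are illustrative as well.

```python
def fit_line(d_values, dz_values):
    """Least-squares fit of dz = k * d + b; returns (k, b, r_squared)."""
    n = len(d_values)
    mean_d = sum(d_values) / n
    mean_dz = sum(dz_values) / n
    sxx = sum((d - mean_d) ** 2 for d in d_values)
    sxy = sum((d - mean_d) * (z - mean_dz) for d, z in zip(d_values, dz_values))
    k = sxy / sxx
    b = mean_dz - k * mean_d
    ss_res = sum((z - (k * d + b)) ** 2 for d, z in zip(d_values, dz_values))
    ss_tot = sum((z - mean_dz) ** 2 for z in dz_values)
    return k, b, 1 - ss_res / ss_tot


# Illustrative values only: one (D, delta |Z|) point per paired-driver combination.
d = [0.2, 0.8, 1.1, 1.9, 2.4]
dz = [-0.9, 2.1, 4.0, 8.2, 10.5]
k, b, r2 = fit_line(d, dz)
print(abs(k), r2)  # |k| is the quantity compared between measurements
```

With a negative intercept, the fitted ∆|Z| is below zero for small D and crosses zero at D = -b/k, which matches the sign change of ∆|Z| described in the results.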

Effects of IDs on the measurements' drowsiness-detection performance
For each measurement, according to the model in Figure 3, ∆|Z| and D for each paired-participant combination were calculated. Then a linear function was used to fit D (independent variable) and ∆|Z| (dependent variable) of all paired-participant combinations. We chose SDLP, SDSWM as
In Figure 4, we took some measurements as examples and listed the |Z_o| of all participants. The hollow points are the |Z_o| of the participants whose P_o≥0.05. There were obvious differences among the participants' |Z_o|. Taking MPECL as an example, the |Z_o| of participant No. 31 was the highest at 29.54, which indicated that No. 31 was the participant for whom MPECL detected drowsiness most effectively, whereas the |Z_o| of participant No. 6 was the lowest at only 1.57. Thus, for different participants, the performance of the same measurement in detecting drowsiness driving differed significantly. It could be speculated that if the MPECL samples of participants No. 31 and No. 6 were mixed, the performance of MPECL in detecting drowsiness driving on the mixed data might be reduced compared with that for participant No. 31 alone. Besides, the |Z_o| of No. 36, No. 37, and No. 39 was similar, which indicated that the drowsiness-detection performance of MPECL was similar for these participants.

IDs in measurement distribution
To analyse the IDs in the distribution of the measurements between participants, we took SDSWM as an example and drew box plots of SDSWM in the sober state and the drowsy state. In Figure 5, only participants whose P_o<0.05 are shown. For most participants, the median of SDSWM in the drowsy state was higher than that in the sober state and the change in steering wheel movement
Besides, it was found that ∆|Z|<0 when the IDs were low, which meant that the MDDP for paired-participant combinations increased compared to that for the benchmark participant. The linear fitting results of ∆|Z| and D for all measurements are summarised in Table 3. The measurements are arranged in descending order of |k|.
In Table 3, the |k| of MATV was the highest at 6.98, which indicated that its drowsiness-detection performance decreased the most rapidly under the influence of IDs, whereas the |k| of MPECL was the lowest at 4.95, indicating that its drowsiness-detection performance was the least influenced by IDs. Furthermore, it was
examples to show the fitted results. In Figure 6, each point represents one paired-participant combination consisting of two participants, and the scattered points are approximately distributed on both sides of a straight line. Therefore, a linear function was used to fit D and ∆|Z|.
In Figure 6, ∆|Z|>0 means that the MDDP for the paired-participant combination decreased compared to the MDDP for the benchmark participant. ∆|Z| and D were positively correlated, and as D increased, ∆|Z| rose above zero. The result illustrated that IDs weaken the MDDP, and the greater the IDs, the greater the weakening of the MDDP. |k| was different, which illustrated that the effect degrees of IDs on the drowsiness-detection
that of MATV (7.37%), which verified that the effect of IDs on the drowsiness-detection performance of MPECL was smaller than that on MATV. When the IDs were 2.5, the drowsiness-detection accuracy of MPECL was 11.1 percentage points higher than that of MATV. This illustrated that using measurements that are less affected by IDs could improve the drowsiness-detection accuracy and adaptability of generalised drowsiness-detection models.

DISCUSSIONS
According to the results and analyses above, some meaningful insights and contributions to the development of drowsiness-detection methods considering IDs were obtained. In Sections 4.1 and 4.2, there are obvious IDs in the distributions of measurements (see Figure 5), which could largely explain why the drowsiness-detection performance of the same measurement is different for different drivers
found that compared to other categories of measurements, the effects of IDs on the drowsiness-detection performance of measurements derived from eye movement were lower.

The verification of results
The measurements whose drowsiness-detection performance was affected less by IDs were chosen to build models, the decrease in drowsiness-detection accuracy was small when the IDs increased. For verifying the outcomes in Table 3, the single measurement was used to build drowsiness-detection models based on the Fisher discriminant function. We took MPECL and MATV as examples, whose drowsiness-detection performance was the lowest and the most affected by IDs, respectively. Measurement data with different IDs were used to train models. By comparing the change in the drowsiness-detection accuracy with the increase of IDs, effects of IDs on the drowsiness-detection performance of measurements were verified. The drowsiness-detection accuracy of models based on MPECL and MATV is shown in Figure 7.
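A single-measurement discriminant of the kind used for this verification can be sketched in Python as follows. With one measurement and two classes, a Fisher discriminant reduces to a cut-off on the measurement axis; the sketch assumes equal priors and equal class variances, so the cut-off is the midpoint of the two class means. The paper's exact discriminant function is not given in this text, and all names and values below are hypothetical.

```python
from statistics import fmean


def train_threshold(sober, drowsy):
    """Return (cut, drowsy_is_high) for a one-measurement discriminant.

    Assumption: equal priors and variances, so the cut is the midpoint of the class means.
    """
    mean_sober, mean_drowsy = fmean(sober), fmean(drowsy)
    return (mean_sober + mean_drowsy) / 2, mean_drowsy >= mean_sober


def predict_drowsy(model, value):
    """True when the value falls on the drowsy side of the cut."""
    cut, drowsy_is_high = model
    return (value > cut) == drowsy_is_high


def accuracy(model, sober, drowsy):
    """Share of correctly classified samples over both states."""
    correct = sum(not predict_drowsy(model, v) for v in sober)
    correct += sum(predict_drowsy(model, v) for v in drowsy)
    return correct / (len(sober) + len(drowsy))


# Illustrative values only: a driver-specific model, then a model on two drivers' mixed data.
sober_a, drowsy_a = [0.10, 0.12, 0.15], [0.30, 0.35, 0.40]
sober_b, drowsy_b = [0.25, 0.28, 0.30], [0.33, 0.45, 0.50]
specific = train_threshold(sober_a, drowsy_a)
mixed = train_threshold(sober_a + sober_b, drowsy_a + drowsy_b)
print(accuracy(specific, sober_a, drowsy_a))
print(accuracy(mixed, sober_a + sober_b, drowsy_a + drowsy_b))
```

Training on mixed data from drivers whose distributions differ moves the cut-off away from each driver's own optimum, which is the accuracy loss that the comparison of MPECL and MATV quantifies.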
In Figure 7, when the IDs were 0, the data of the driver with the maximum |Z_o| were used to train driver-specific drowsiness-detection models. For both MPECL and MATV, these driver-specific models achieved the highest accuracies (MPECL = 86.47%, MATV = 79.81%). When data of other drivers were added, the IDs increased from 0 to 2.5, the models became generalised drowsiness-detection models, and the accuracies of the models based on MPECL and MATV both decreased. However, the decrease in drowsiness-detection accuracy of MPECL (2.93%) was smaller than that of MATV.
For different participants, the measurements with the strongest drowsiness-detection performance are different (see Figure 4). Thus, for a specific participant, it is more reasonable to choose the measurements that perform best for that participant to establish a drowsiness-detection model. Besides, it can be inferred that when the measurement data of two drivers with obvious individual differences in the measurement distribution are mixed, the differences in the measurements between the sober state and the drowsy state may become less distinct, which weakens the drowsiness-detection performance of the measurements.
In section 4.3, the fitting lines in Figure 6 illustrate that IDs have obvious influences on the drowsiness-detection performance of the measurements. IDs in the mixed data of two drivers can weaken the drowsiness-detection performance of measurements when the IDs are large enough, which is consistent with the previous study [20]. However, it is found that the drowsiness-detection performance of measurements increases rather than decreases when the IDs are small, which is not mentioned in the previous literature [20]. The reason may be that when the IDs are small, their effects on the drowsiness-detection performance of the measurement are limited, while the larger sample size provides more information and improves the MDDP. Therefore, drivers with small IDs in the drowsiness-driving measurements should be grouped, and the mixed data of the group should be used to train a generalised drowsiness-detection model, which can reduce the effect of IDs and improve the accuracy and utilisation rate of the model.
The most significant achievement of this paper is to propose a quantitative analysis model for the effect of IDs on the MDDP. Using the model, the effects of IDs on the drowsiness-detection performance of non-intrusive measurements were quantified (see Table 3). Compared to other categories of measurements, the drowsiness-detection performance of measurements derived from eye movement data is less affected by IDs. Previous studies have only pointed out that the IDs of PERCLOS during drowsiness are low [17], but they have not quantified the effect of IDs on the drowsiness-detection performance of PERCLOS. The IDs of the measurements derived from eye movement data are low, and the correlations between PERCLOS and drowsiness are stronger, which may lead to the drowsiness-detection performances of MPECL and SDPE being relatively less affected by IDs.
In section 4.4, the accuracies of driver-specific models trained on an individual driver's data are higher than those of generalised models trained on two drivers' data (see Figure 7), as also reported in previous studies [6]. However, the driver-specific drowsiness-detection model has weak generalisation. Thus, to improve the utilisation rate of models, it is still necessary to study generalised drowsiness-detection models applicable to groups of drivers with low IDs. In order to decrease the negative influences of IDs on generalised models, this paper builds a model to find measurements less affected by IDs for training generalised models. In Figure 7, when the IDs are the same, the accuracies of generalised drowsiness-detection models are higher for measurements whose drowsiness-detection performance is less affected by IDs. Therefore, to effectively weaken the negative effects of IDs and improve the accuracy and practicality of drowsiness detection, it is recommended to preferentially choose the measurements in Table 3 that are less affected by IDs, such as MPECL and SDPE, to train generalised drowsiness-detection models.
These results can be applied to the development of commercial anti-drowsiness systems considering IDs. For instance, in freight corporations, managers can first calculate the IDs in the measurements between drivers and group the drivers with smaller IDs. They can then use the measurements that are less affected by IDs (see Table 3) to establish generalised drowsiness-detection models suitable for all drivers in the group. Furthermore, the generalised drowsiness-detection models considering IDs can be embedded in anti-drowsiness systems. In this way, while reducing the effect of IDs and ensuring a certain drowsiness-detection accuracy, the utilisation rate of a model can be improved as much as possible to reduce model training costs. The accuracy of the generalised drowsiness-detection model using MPECL is 83.54% (see Figure 7), which is higher than that of the generalised drowsiness-detection models (78.01%) in the previous research [5], which did not consider the effects of IDs on the MDDP. Moreover, the generalised drowsiness-detection models in this paper only use individual measurements. Consequently, integrating multiple measurements that are less affected by IDs to establish drowsiness-detection models may further improve the drowsiness-detection accuracy.
China (2019YFB1600800); National Natural Science Foundation of China - Collaborative Foundation (U1764262); Hubei Innovation Group Project (2017CFA008).

CONCLUSIONS
In the present work, various naturalistic driving data were acquired non-intrusively through field experiments on motorways. Double-layer sliding time windows were used to compute non-intrusive measurements of drowsiness driving. The time windows for each measurement were optimised iteratively, and nine non-intrusive measurements were computed. We then proposed a method for calculating the comprehensive IDs in measurements and built a model to quantify the effect of IDs on MDDP.
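The core of the quantification model can be sketched as follows. The sketch assumes that the Wilcoxon test is the rank-sum form applied to the sober and drowsy samples of one measurement, that |Z| comes from the normal approximation without a tie correction of the variance, and that the IDs of each paired-driver combination are already available as numbers. The function names and all values are hypothetical and are not results of this study.

```python
import math


def rank_sum_abs_z(sober, drowsy):
    """|Z| of the Wilcoxon rank-sum test (normal approximation).

    Ties receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted([(v, 0) for v in sober] + [(v, 1) for v in drowsy])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    n1, n2 = len(sober), len(drowsy)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return abs((w - mu) / sigma)


def mddp_change(ref_sober, ref_drowsy, other_sober, other_drowsy):
    """Individual driver's |Z| minus the paired-driver combination's |Z|."""
    z_individual = rank_sum_abs_z(ref_sober, ref_drowsy)
    z_pair = rank_sum_abs_z(ref_sober + other_sober, ref_drowsy + other_drowsy)
    return z_individual - z_pair


def ols_slope(xs, ys):
    """Slope of the least-squares line fitted to the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Toy usage with made-up numbers (not data from the study).
ref_sober, ref_drowsy = [10.0, 12.0, 14.0, 16.0], [30.0, 32.0, 34.0, 36.0]
other_sober, other_drowsy = [27.0, 29.0, 31.0, 33.0], [47.0, 49.0, 51.0, 53.0]
change = mddp_change(ref_sober, ref_drowsy, other_sober, other_drowsy)

# Hypothetical (ID, change of MDDP) pairs for several driver combinations.
ids = [0.5, 1.0, 1.5, 2.0]
changes = [0.1, 0.6, 1.1, 1.6]
k_abs = abs(ols_slope(ids, changes))  # |k|: larger means more affected by IDs
```

A measurement with a smaller `k_abs` would be interpreted as less affected by IDs, in the same way as the |k| values reported for the measurements.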
The effects of IDs on the drowsiness-detection performance of measurements were quantified. The results illustrate that large IDs can weaken the drowsiness-detection performance of non-intrusive measurements. As the IDs become larger, the drowsiness-detection performance of a measurement decreases sharply. However, when the IDs are very small, the drowsiness-detection performance of a measurement on mixed data may increase because of the larger sample size. Besides, compared with other types of non-intrusive measurements, the drowsiness-detection performance of measurements derived from eye movement is less affected by IDs.
The methods and conclusions can provide guidelines for the calculation of drowsiness measurements using naturalistic driving data, quantitative research on IDs, and the measurement selection of drowsiness-detection models considering IDs. This paper also has some limitations. For instance, the optimisation method and the range of the weight coefficient used in the IDs calculation are simple, which leads to a low goodness of linear fit for some measurements; the combination of participants is relatively simple; and the drowsiness assessment is relatively subjective. In the future, the calculation coefficients of the IDs values will be further optimised using advanced optimisation algorithms. Besides, a more objective ground truth of drowsiness, such as electroencephalograms, will be adopted to improve the rationality of the research.