ESTIMATING SIGNAL TIMING OF ACTUATED SIGNAL CONTROL USING PATTERN RECOGNITION UNDER CONNECTED VEHICLE ENVIRONMENT

The Signal Phase and Timing (SPaT) message is an important input for research on and applications of Connected Vehicles (CVs). However, actuated signal controllers cannot directly provide SPaT information, since their timing depends on both the signal control logic and the real-time traffic demand. This study presents an estimation method based on the idea that an actuated signal controller provides similar signal timing for similar traffic states. A quantitative description of traffic states is therefore essential. The traffic flow on each approaching lane is compared to a fluid. The state of a fluid can be indicated by a state parameter, e.g. speed or height, and by its energy, which includes kinetic and potential energy. By analogy, this paper proposes an energy model for traffic flow and adds the queue length as an additional state parameter. On this basis, the traffic state of an intersection can be described. A pattern recognition algorithm was then developed to identify the most similar historical states and their corresponding SPaTs, whose average is taken as the estimated SPaT for the current second. The results show that the average error is 3.1 seconds.


INTRODUCTION
Intersections are the main type of bottleneck in traffic networks. Such bottlenecks can cause high emissions, low efficiency, and even safety problems. New technologies, such as Connected Vehicles (CVs) and intelligent traffic systems, provide opportunities to improve the efficiency of intersections. Previous studies of CVs found that these technologies could improve the performance of intersections by decreasing delay [1] and reducing fuel consumption as well as emissions [2]. Additionally, these new technologies can be employed to reduce delay by optimizing the signal timing or other traffic management strategies [3][4][5][6][7]. One of the necessary inputs of the above studies is the Signal Phase and Timing (SPaT) message, which includes all phases of the current cycle, the length of these phases, and the gap between them. SPaT can be easily obtained from fixed-timing signal controllers. However, obtaining SPaT from actuated signal controllers is difficult, since their signal timing depends not only on the control logic but also on the real-time traffic conditions. On the other hand, CVs can provide vehicle speed and location information in real time. For a given signal controller, if the control logic is not changed, the controller will give similar signal timing under similar traffic states, and these states are easier to obtain from CV data. Consequently, how CV data can be used to estimate the signal timing of actuated signal controllers has become an important topic.
There have been some studies on obtaining SPaT; however, almost all of them focused on the fixed-timing signal control [8][9][10][11][12][13]. Based on data sources, these studies can be divided into two categories. In the first category, data are collected from invasive detectors, such as loop detectors [8,11]. Firstly, the delay patterns are measured by travel time. Key vehicles are identified by utilizing the traffic state such as queue length, volume, and individual state information [17,18]. CVs could be a possible solution to collect data of the traffic state for the actuated signals in real time.
The aim of this study is to estimate the length of the current phase of an actuated signal controller. Unlike existing studies, which use critical vehicles or historical signal timings to estimate the signal timing, the algorithm proposed in this paper is based on real-time traffic states obtained from CV information. The signal timings of adaptive and actuated control change with the traffic state. This paper therefore moves the focus from signal timing to traffic state, which may be a meaningful exploration for further studies on estimating the signal timings of actuated or adaptive control. The traffic state is used to estimate the signal timing without estimating the control logic. The main idea is that an actuated signal should provide a similar timing schedule if the traffic condition is the same. To test this hypothesis, historical traffic state data and the corresponding signal timings were collected. The most similar historical traffic states are then selected, and the current phase is estimated from the signal timing of the selected historical states. The proposed algorithm runs in the cloud and sends its result every second to road-side units or target vehicles.
The remainder of this paper is organized as follows. The second section introduces the definition of the traffic state and the traffic state extraction algorithm. The third section describes the signal estimation algorithm. The fourth section discusses the simulation evaluation. Finally, the fifth section summarizes the conclusions.

STATE EXTRACTION
The main idea of this study is to estimate the current phase duration in real time by recognizing similar historical traffic states. The algorithm flowchart is shown in Figure 1. The traffic state is extracted at each time step and stored in the database. The algorithm then finds similar historical states in the database that have the same phase and a similar passed green time. The phase time of the current state is calculated from the phase times of these similar historical states. The estimation must run as an online algorithm, which requires a light computational burden. As there is a huge amount of historical states, the traffic state

delay patterns, and then they are used to estimate the signal timing information [11]. Key vehicles include the first and the last vehicle that pass the stop line in one signal control cycle. The main problem of this method is that there might be more than one cycle between two key vehicles. Although some studies have applied mathematical iteration to correct the cycle length, these studies relied on engineering experience [8].
The second group of research uses trajectory data to estimate the signal timing of fixed-time signal controllers [9,10,12,13]. Some researchers identify the key vehicles by stop events and delay, and the cycle is corrected by using the greatest common divisor of the gaps between critical vehicles. The GPS data of mobile phones can also be used to estimate the signal timing. By identifying the stop and start times of vehicles, the cycle and the length of each phase are identified according to traffic wave theory [9]. Since the penetration of trajectory data may be low, the laws of kinematics were used to identify the starting times of the green and red lights [10]. The green time and cycle length estimation errors of the above procedures are between three and six seconds.
Actuated signal control is a common control type for intersections. However, studies have seldom focused on estimating the signal timing of actuated signal controllers. The reason is that the signal timing of an actuated signal is strongly related to the real-time traffic state [14,15], while the real-time traffic state is hard to perceive. The existing study mainly estimates the length of the current phase with a probabilistic method based on historical phase durations instead of historical traffic states [16]. This estimation method is only useful for intersections whose signal controller has simple control logic and changes the signal timing only slightly. It might not be appropriate for intersections whose traffic demand changes significantly and thus causes substantial changes in signal timing. Actuated signal timing estimation based on the real-time traffic state has not been developed because the traffic state is hard to extract in detail and in time.
Loop detectors can only capture the state when a vehicle is passing over them. Real-time queue length is hard to obtain, not to mention the real-time state of every vehicle. Traffic cameras can be used to get the real-time queue length, but at a high cost. Trajectory data provided by CVs have been used to estimate

energy of the fluids, which is the sum of potential and kinetic energy. The total energy, however, often corresponds to multiple possible states, as shown in Figure 2. State 1 has high kinetic energy and low potential energy, while State 2 has low kinetic energy and high potential energy. There is no doubt that State 1 and State 2 are two different states, but they have the same total energy. In order to determine the unique state of fluids, a state parameter is needed, such as the current height or the flow speed. As for the traffic flow, the queue length is selected as the

of an intersection represented by several vectors is helpful to achieve this target. Furthermore, the number of vectors that indicate the traffic state of an intersection should be as small as possible. The state of an intersection is the set of the states of its approaching lanes. The vehicles in one approaching lane can be seen as a traffic flow. Traffic flow has been studied by analogy with fluids. The state of fluids can be described by energy and a state parameter. The energy includes the potential energy and the kinetic energy, while the state parameter can be the speed or the height of the fluids. Similarly, the state of the traffic flow can also be represented by energy and a state parameter. This section defines the energy by analogy with the energy of fluids, so that a specific state can be determined through a state parameter. Two vectors, composed of these two values, are used to represent the traffic state of the intersection. The relevant notations are shown in Table 1.
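The non-uniqueness of total energy described above can be illustrated numerically. The following Python sketch uses made-up energy values in arbitrary units and a hypothetical helper, same_state; neither is taken from the study. It only shows that two states with equal total energy can be separated once a state parameter (here the number of stopped vehicles) is compared as well.

```python
def total_energy(kinetic, potential):
    """Total energy is the sum of kinetic and potential energy."""
    return kinetic + potential


# Illustrative values only (arbitrary units), not data from the study.
state_1 = {"kinetic": 8.0, "potential": 2.0, "stopped": 0}  # fast, far from stop line
state_2 = {"kinetic": 2.0, "potential": 8.0, "stopped": 3}  # slow, queued


def same_state(a, b, tol=1e-9):
    """Treat two states as equal only if total energy AND state parameter agree."""
    ea = total_energy(a["kinetic"], a["potential"])
    eb = total_energy(b["kinetic"], b["potential"])
    return abs(ea - eb) <= tol and a["stopped"] == b["stopped"]
```

Comparing total energy alone would merge state_1 and state_2; adding the state parameter keeps them apart.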
In the traffic field, the traffic flow is often compared to fluids. Determining an exact state of fluids requires only two values. The first is the total

Figure 2 - Energy uniqueness

fluids move to high ground, the kinetic energy of water transfers to potential energy: the speed and kinetic energy of the fluids decrease while the potential energy increases. Similarly, the energy transfer also exists in the intersection. Consider a special case in which only one vehicle drives in the intersection and the vehicle cannot pass the intersection without stopping. The vehicle has maximum kinetic energy when it has just entered the range of the intersection; however, no large pressure on the intersection controller is observed at this moment, as there is no urgent need to release the vehicle immediately. By contrast, the speed of the vehicle decreases when it approaches the stop line, which causes the control pressure of the intersection controller to increase sharply, as shown in Figure 3. The intersection acts like a "valve" in the traffic flow,

state parameter. The queue length is one of the most important performance measures of an intersection [21]. The queue length is also often selected as the control objective of signal timing optimization [22]. Only two numbers are needed to indicate the state of the fluids. As for the traffic field, if the state of the traffic flow of one approaching lane can be represented by two numbers, the traffic state of the intersection can be represented by two vectors, which are composed of the energy and the state parameter of each approaching lane. With two vectors representing the traffic state of one intersection, the efficiency and practicality of the pattern recognition algorithm are greatly increased. The energy of the traffic flow, which is similar to that of the fluids, is composed of two components in an approaching lane of the intersection.
When the

where $v_0^j$ represents the average speed at the moment when a vehicle is just entering the control boundary of lane j, and $L^j$ represents the distance from the control boundary to the stop line. These expressions are valid only if the traffic flow includes a single vehicle in an undisturbed environment. When the flow is large, the kinetic energy of the vehicles cannot be completely converted into potential energy due to the influence of other vehicles, and the vehicles may stop far from the stop line. A compensation factor for the potential energy should therefore be evaluated. As a result,

As mentioned at the beginning of this section, the total energy is not sufficient to determine one state; therefore, a state parameter is needed to identify a specific state. In this study, the number of stopped vehicles is selected as this parameter because it not only represents the most urgent control pressures but is also easily obtained and has high accuracy. The traffic state of the intersection can be represented by a matrix shown in Equation 9:

just as fluids bring pressure to a closing valve, the arrival of the traffic flow also brings pressure to the intersection. After the vehicle reaches the stop line, the kinetic energy drops to zero and, therefore, the control pressure of the intersection controller is at its maximum value. To be consistent with the fluids, the control pressure of traffic flow is also named potential energy.

[Figure 3: S1, high speed, less pressure on signal control; S2, low speed, high pressure on signal control]

For the quantification and determination of potential energy, this study defines the intersection control boundary line as the zero potential energy surface. According to its physical definition, the potential energy is the product of mass and distance to the control boundary. In its expression, $e_P^j$ denotes the potential energy of the traffic flow on lane j; $k^j$ is the coefficient; and $h^j$ is the average distance to the control boundary of vehicles on lane j. As with $e_K^j$, the unit of $e_P^j$ is vm · J, where vm is the mass of a vehicle.
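As a rough illustration of how the two energy components of one lane might be computed from CV data, the following Python sketch can be considered. It rests on assumptions that the text does not confirm: kinetic energy is taken as the sum of v²/2 over the vehicles on the lane (in units of one vehicle mass), and potential energy as the coefficient k times the number of vehicles times their average distance from the control boundary. The function name lane_energy, its arguments, and the default k = 1.0 are hypothetical.

```python
from statistics import mean


def lane_energy(speeds, distances, k=1.0):
    """Return (kinetic, potential) for one lane, in units of one vehicle mass.

    speeds    -- vehicle speeds in m/s (one entry per connected vehicle)
    distances -- distance of each vehicle from the control boundary, in metres
    k         -- potential-energy coefficient (assumed value, to be calibrated)
    """
    if len(speeds) != len(distances):
        raise ValueError("speeds and distances must have the same length")
    if not speeds:
        return 0.0, 0.0  # an empty lane carries no energy
    # Assumption: kinetic energy is the sum of v^2 / 2 per vehicle.
    kinetic = 0.5 * sum(v * v for v in speeds)
    # Assumption: potential energy is k * (number of vehicles) * (average distance).
    potential = k * len(distances) * mean(distances)
    return kinetic, potential
```

Under these assumptions, a vehicle entering the control boundary at speed has only kinetic energy, while a vehicle stopped near the stop line has only potential energy, which matches the qualitative picture in the text.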
Back to the special case in which only one vehicle passes the intersection and cannot do so without stopping: since there is no influence of other vehicles, the study consid-

vehicle situation. Afterwards, calculate the Euclidean distance $\rho(E_1, E_2)$ between the current state and the historical traffic states to identify the states with similar energy distribution.
To obtain a more stable result, the five most similar situations are selected, and the preliminary estimated green time, denoted by g, is obtained by averaging the green times of the selected states.
The detailed steps are summarized as follows:
Step 1: Picking the historical states that have the same green phase as the current state.
Step 2: Screening historical states whose passed green time, denoted by $t_d$, is similar to that of the current state. A state is chosen if the difference is less than three seconds. In the actuated signal control logic, the minimum green extension time is equal to the maximum allowable headway, which is chosen as three seconds in the Signal Timing Manual [19,20].
Step 3: Calculating $\rho(SV_c, SV_s)$, where $SV_c$ is the state parameter vector of the current state and $SV_s$ is the state parameter vector of a state identified in Step 2. This study chose J (the number of lanes in an intersection) as the threshold, assuming that the average difference in motionless vehicles per lane is not larger than one. The estimation result is sensitive to the queue length: actuated control is based on loop detector information, and a single vehicle determines whether the loop detector is occupied. A difference of two motionless vehicles per lane can lead to large errors in the test. This value can be adjusted if other detectors, such as traffic cameras, are used.
Step 4: Calculating $\rho(E_c, E_s)$, where $E_c$ is the energy vector of the current state and $E_s$ is the energy vector of a state identified in Step 3. The threshold is M, which is calibrated experimentally. The specific results are shown in the next section.
Step 5: Selecting the five most recent historical states from the states selected in Step 4. If only the most recent state is selected, the estimation results may contain large errors. However, since the control logic may be adjusted over time, selecting too many states may also lead to wrong results. According to the simulation test, the five most recent states are most suitable. This value can be adjusted according to the actual situation.
Step 6: Averaging the green times of the five selected states.
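Steps 1 to 6 can be sketched in Python as a chain of filters followed by an average. This is a minimal sketch under stated assumptions, not the implementation used in the study: the TrafficState record, its field names, and the function estimate_green are hypothetical; the thresholds are applied as "distance not larger than J" and "distance not larger than M"; and recency is represented by a numeric timestamp.

```python
import math
from dataclasses import dataclass


@dataclass
class TrafficState:
    phase: int          # active green phase
    t_d: float          # passed green time of the phase, in seconds
    sv: list            # stopped vehicles per lane (state parameter vector)
    e: list             # energy per lane (energy vector)
    green: float = 0.0  # recorded green time (known for historical states only)
    stamp: float = 0.0  # recording time; a larger value means a more recent state


def estimate_green(cur, history, m, n_keep=5, dt=3.0):
    """Return the preliminary green-time estimate g, or None if nothing matches."""
    j = len(cur.sv)  # J: number of lanes, used as the Step 3 threshold
    cand = [h for h in history if h.phase == cur.phase]             # Step 1
    cand = [h for h in cand if abs(h.t_d - cur.t_d) < dt]           # Step 2
    cand = [h for h in cand if math.dist(h.sv, cur.sv) <= j]        # Step 3
    cand = [h for h in cand if math.dist(h.e, cur.e) <= m]          # Step 4
    cand = sorted(cand, key=lambda h: h.stamp, reverse=True)[:n_keep]  # Step 5
    if not cand:
        return None
    return sum(h.green for h in cand) / len(cand)                   # Step 6
```

Filtering on the cheaper state parameter vector before the energy vector follows the ordering given in the text; returning None when no historical state survives the filters is a design choice of this sketch.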
where s represents the current phase state; $t_d$ is the passed green time of the current phase; and E and SV are two state vectors, as shown in Equations

where J represents the number of lanes that belong to the approaching links of the intersection. The state vector E is composed of the energy values of each lane. SV is a state parameter vector, composed of the number of stopped vehicles on each lane.
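As an illustration of how the state parameter vector SV might be assembled from CV records, the following Python sketch counts stopped vehicles per lane. The record layout (lane index, speed), the function name stopped_vehicle_vector, and the 0.1 m/s stop threshold are assumptions made for this example only.

```python
def stopped_vehicle_vector(records, num_lanes, stop_speed=0.1):
    """Count stopped vehicles on each of the J approaching lanes.

    records    -- iterable of (lane_index, speed_in_m_per_s) pairs, one per vehicle
    num_lanes  -- J, the number of approaching lanes
    stop_speed -- speed at or below which a vehicle counts as stopped (assumed)
    """
    sv = [0] * num_lanes
    for lane, speed in records:
        if speed <= stop_speed:
            sv[lane] += 1
    return sv
```

For example, two vehicles on lane 0 (one stopped, one moving) and two stopped vehicles on lane 2 give the vector [1, 0, 2] for a three-lane approach set.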

ESTIMATION ALGORITHM
This section describes an algorithm that estimates the green time of the current phase from the energy and the number of motionless vehicles. The main idea is to compare the current state with historical states, identify the most similar ones, and derive the estimated green time of the current phase from them. The algorithm relies on the premise that, if the control logic of an actuated signal controller does not change, the controller will produce similar green times under similar traffic flow states.
Pattern recognition based on the Euclidean distance, as used in image recognition, is applied to find the most similar historical traffic state. In this study, the traffic state of the intersection is represented by two vectors, the energy vector and the state parameter vector. As the units and the meaning of the two vectors differ, a unified Euclidean distance cannot be calculated, and the recognition process is divided into two steps. The Euclidean distance is calculated separately for each vector, and only a state for which both distances are short is considered similar. This study uses $\rho(SV_1, SV_2)$ to denote the Euclidean distance between the state parameter vectors of the compared states, and $\rho(E_1, E_2)$ to denote the Euclidean distance between their energy vectors. As $\rho(SV_1, SV_2)$ is smaller and much easier to calculate than $\rho(E_1, E_2)$, the calculation process is designed as follows: obtain $\rho(SV_1, SV_2)$ between the current traffic state and the historical states to screen out the states that have a similar motionless

Parameter calibration
To calibrate the parameters, different coefficients $k_1^j$ and thresholds M were tested. Table 2 shows that the Average Error (AE) is minimum when $k_1^j$ equals 3 and M is $3E_s$, where $E_s$ denotes the total energy of the vehicle in the special case, i.e. a single vehicle that passes the intersection with a stop and is not influenced by other vehicles.
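A calibration of this kind can be organized as an exhaustive search over candidate pairs of coefficient and threshold. The Python sketch below is only an outline: calibrate is a hypothetical helper, and average_error stands for a user-supplied function that runs the estimation on historical data for a given coefficient k and threshold m (expressed as a multiple of $E_s$) and returns the AE. How the study actually searched the parameter space is not stated.

```python
from itertools import product


def calibrate(k_values, m_values, average_error):
    """Return ((k, m), AE) for the pair with the smallest average error.

    k_values, m_values -- candidate coefficients and thresholds to test
    average_error      -- callable (k, m) -> average error in seconds (user-supplied)
    """
    scored = [(average_error(k, m), (k, m)) for k, m in product(k_values, m_values)]
    best_error, best_pair = min(scored, key=lambda item: item[0])
    return best_pair, best_error
```

Each pair is evaluated once, which matters when one evaluation means replaying hours of simulated data.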

Observation of state estimation parameters
In this part, four states are compared to verify the rationality of selecting energy and the number of stopped vehicles to represent the state of traffic flow. The simulated traffic is shown in Figure 4, and the differences in energy between state (a) and the others are calculated. The energy of each lane is calculated by Equation 8, where $k_1^j$ is set as 3. The energy difference is the sum of the absolute values of the energy differences of each lane. The result is shown in Table 3, and it can be noticed that the variation increases for each scenario. It is worth noting that the difference between the

Step 7: Averaging the estimated results of each passed second of the current phase.
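Step 7 amounts to a running mean of the per-second estimates within one phase. A minimal Python sketch follows, assuming that one preliminary estimate g arrives each second and that the average restarts when a new phase begins; the class name RunningEstimate and its methods are hypothetical.

```python
class RunningEstimate:
    """Running mean of the per-second green-time estimates of one phase."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, g):
        """Add this second's preliminary estimate g and return the current mean."""
        self.total += g
        self.count += 1
        return self.total / self.count

    def reset(self):
        """Start over when a new phase begins."""
        self.total = 0.0
        self.count = 0
```

Keeping only a sum and a count avoids storing every per-second estimate, which suits an online algorithm.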

SIMULATION
This section builds a simulation environment for testing the algorithm. Its accuracy is evaluated, and the influences of the green time length, the training data size, and the penetration of connected vehicles are discussed.

Simulation network
A simulation environment was built in VISSIM 9, and the algorithm was implemented in Python. An artery-branch intersection controlled by an actuated controller was established to evaluate the algorithm. The artery is a two-way, four-lane road with a traffic volume of 700 pcu/(h•lane), while the branch is a two-way, two-lane road with a volume of 400 pcu/(h•lane). The simulation was run for ten hours to collect traffic data, including the Basic Safety Message (BSM) and SPaT, as the basic historical database. The data include vehicle location, speed, current green phase, passed green time, and green time.

The training data size has a significant influence on the estimation algorithm. The average absolute errors of the results estimated from one hour of data and from ten hours of data are 4.47 seconds and 3.1 seconds, respectively, which corresponds to an error reduction of about 30%. A larger training data size can thus increase the accuracy of the algorithm. The penetration of connected vehicles also influences the accuracy. The estimated results under different CV penetrations are shown in Table 4. When the CV penetration is higher than 80%, the absolute estimation error is less than 4 seconds.

energy of state (b) and (c) is relatively low; they have similar distance to state (a). However, Figures 4b and 4c seem to be very different from each other, indicating that the energy polymorphism needs to be calibrated by state parameters. After considering the number of motionless vehicles, (b) is closer to (a) than (c), which indicates the importance of the selection of these two parameters.

MODEL EVALUATION
With a CV penetration of 100% and 10 hours of data used as training data, different green time lengths were selected to evaluate the algorithm; the mean absolute estimation error is 3.1 seconds. The estimated results for different green times and the error distribution are shown in Figure 5. Most of the errors are between -1 s and 1 s, and the largest estimation error is less than 6 s.
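The error measures quoted above (mean absolute error, share of errors within ±1 s, largest error) can be computed as in the following Python sketch. The function name error_summary and the sample numbers in the usage note are illustrative and are not results from the study.

```python
def error_summary(estimated, actual, band=1.0):
    """Return (mean absolute error, share of errors within +/- band, largest absolute error).

    estimated, actual -- equal-length sequences of green times in seconds
    """
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [e - a for e, a in zip(estimated, actual)]
    mae = sum(abs(x) for x in errors) / len(errors)
    share = sum(1 for x in errors if abs(x) <= band) / len(errors)
    worst = max(abs(x) for x in errors)
    return mae, share, worst
```

For instance, estimates of 20, 31, 25 and 40 s against true green times of 20, 30, 28 and 40 s give errors of 0, 1, -3 and 0 s, i.e. a mean absolute error of 1 s with three quarters of the errors within ±1 s.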
Within one phase, the estimation results become more accurate as the passed green time increases. The estimation results of two phases with different green times are shown in Figure 6 as examples. Different volumes were tested, which caused large changes in green time. The range of tested

true phase time was 3.1 seconds. Then, the influences of CV penetration, green time length, and training sample size on estimation accuracy were tested. The algorithm adapts to a wide range of green time lengths and becomes more accurate as the phase approaches its end. Furthermore, a higher penetration and a larger training sample size result in a lower error. There are some limitations in this study. The proposed method does not incorporate detector data, which are frequently used to obtain the traffic state. Including detector data might further improve the accuracy. Signal timing estimation for arterials would also be an interesting topic.

CONCLUSION
This study proposed a cloud-based method to estimate the duration of the current phase of an actuated signal controller by finding similar historical traffic states in a CV environment. The traffic state of an intersection is composed of the traffic state of each approaching lane. By analogy with fluids, the state of the traffic flow can be indicated by a state parameter and energy. The queue length is selected as the state parameter, and the energy is the sum of potential and kinetic energy.
The case study validates the accuracy and adaptability of the proposed model on different phase lengths from 11 to 50 seconds. The average absolute difference between the estimated and the