IDENTIFYING ORIGIN-DESTINATION TRIPS FROM GPS DATA – APPLICATION IN TRAVEL TIME RELIABILITY OF DEDICATED TRUCKS

The advancement of data collection technologies has brought an upsurge in GPS applications. For example, travel behaviour research has benefited from the integration of multiple sources of Global Positioning System (GPS) data. However, the effective use of such data is still impeded by challenges in data processing. For instance, GPS data, despite providing detailed spatial movement information, do not label the starting and finishing points of a trip, especially for commercial trucks. Hence, there is a critical need for a trip identification method that makes effective use of GPS trajectory data without additional information. This paper focuses on identifying trips from raw GPS data. Specifically, a systematic method is proposed to extract trips on the basis of origin-destination (OD) pairs using a 5-step procedure. As an application, travel time reliability is estimated with three metrics based on the OD trips of each dedicated truck. The application shows that, in general, trucks on long-distance routes have less reliable travel times than trucks on short-distance routes. This paper provides an example of using GPS data, without further information, to study travel time for freight performance and for similar logistics settings where punctuality matters.


INTRODUCTION
Freight trucks typically transport goods to and from manufacturing plants, retail stores, and distribution centres through localities. They play an important role in serving the needs of today's supply chain and affect urban safety, congestion, and pollution [1,2]. Freight trucks can operate in many modes, such as dedicated, regional, and over-the-road. A dedicated truck is designated to pick up a load at one regular location and deliver it to a dedicated customer (i.e., another regular location). Unlike trucks running in urban areas, which have many route options, a dedicated truck, especially one running long distances between cities, is committed to only a few routes and operates on a predictable schedule. This type is arguably the simplest and most identifiable mode of truck freight operations.
In recent years, there have been increasing concerns from shippers, contract carriers, and trucking business owners regarding the timeliness of delivery. Therefore, travel time reliability has become one of the key measures of freight truck performance along intercity or interregional highway networks [3]. However, measuring truck travel time reliability requires fairly extensive data collection. Most traditional data collection methods, such as established surveys and travel diaries, cannot provide adequate information about truck movements and travel time. An emerging method, using the Global Positioning System (GPS) to monitor and manage the performance of the truck fleet, has gained increasing attention because it can obtain large amounts of long-term data in great detail at little cost [4].
GPS data have reasonable accuracy in improving the traditional manual data collection for travel time and trip data [5,6]. A key issue of GPS data
4 minutes [12], and 5 minutes [13], as well as for destination stops such as 15 minutes [13], 30 minutes [14], and 120 minutes [8]. These studies are focused on distinguishing the intermediate stops from the destination stops [8,9].
There are also studies that have attempted to combine additional techniques to improve the above dwell time threshold algorithm. For example, it can be upgraded to more complicated multi-stage procedures, supplemented by thresholds on multiple factors such as spot speed [13], truck heading angle [14], and trip length [9]. Other techniques such as Geographic Information Systems (GIS) [15] or map matching [16] have also proved effective. For example, a typical approach is to combine GPS data with GIS techniques to provide the locations of freight facilities [17,18]. Recent practice has extended to machine learning methods such as support vector machines [19] and entropy estimation [20]. Regardless, stop dwell time is the predominant determinant in trip extraction from GPS data [4].
It should be noted that the more important goal of most GPS data studies is to apply the processed data to various traffic studies. A significant application of GPS trajectories is in travel time and travel time reliability studies [10,21-24]. For example, based on GPS trip data, McCormack and Hallenbeck [10,21] explored more robust travel time benchmarks and performance statistics related to the reliability of truck trips. Liao [22] derived corridor travel times from GPS data for trucks and passenger cars separately and found that trucks on U.S. and interstate highways have travel times about 10% longer on average than cars. With trip data, these travel time reliability analyses can be carried out at different levels, such as segment or link [17], corridor [22], route [23], and network [24].
As can be seen, studies based on truck GPS data in the literature examine trucks running on a given facility (i.e., segment-based) in a given time frame (i.e., 15-minute aggregates). Unlike the rest of the literature, this study bases truck travel time reliability performance on individual trucks over multiple days. In other words, no studies have been found that examine a dedicated truck running on a route repeatedly over a period of time.
To achieve this goal, first, the GPS data are filtered and split into trips on the basis of origin-destination (OD) pairs. This part is critical because it provides a detailed and accurate data foundation for
is that it gives no explicit indication of when a trip begins or ends and is often mixed with noisy and invalid sensor data. Thus, to quantify the level of reliability or the extent of variability in travel time, data purification and trip acquisition techniques are required to turn the continuous GPS data into end-to-end trip data.
This study established a systematic procedure to purify and extract truck trip information from a massive GPS dataset, including target route determination, trip end (stop) identification, and trip chain splitting after GPS data cleaning. The resulting truck trips are applicable to various traffic studies and to fleet management. The methodology proposed in this paper contributes to both the existing body of literature and the practice of traffic management on performance measures of long-haul dedicated freight trucks and other transport modes in need of punctuality.
The paper is organised as follows. Section 2 reviews the relevant studies in the literature. Section 3 defines the scope of the study, which is the trip for a dedicated truck. Section 4 and section 5 are the key parts that present the data and methodology in this study. As an application, section 6 showcases the truck performance measure in travel time reliability. Finally, section 7 concludes the study with contributions, limitations, and future work.

LITERATURE REVIEW
Great effort has been made to identify trip ends automatically from tabular GPS data. In previous studies, the most common way to break continuous GPS data into trip-based data segments is to search for stop dwells [7]. Depending on the GPS settings, a valid stop dwell can be defined by 1) speeds that are close to zero (e.g. less than 5 km/h) over a predefined time in the ongoing GPS recording; or 2) a gap between two consecutive timestamps greater than a predefined time due to signal loss or device outage [8]. Either way, it is critical to determine the dwell time threshold that marks a gap between any two trips embedded in the GPS data.
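The two stop dwell definitions above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the function name find_stop_dwells and the three default thresholds (5 km/h, 60 seconds, 180 seconds) are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta

def find_stop_dwells(points, speed_kmh=5.0, min_dwell_s=60.0, max_gap_s=180.0):
    """Return (start, end, kind) tuples for candidate stop dwells.

    points: time-sorted list of (timestamp, speed_kmh) tuples.
    kind is "low_speed" (definition 1) or "gap" (definition 2).
    All thresholds are illustrative assumptions.
    """
    dwells = []
    run = []        # timestamps of the current run of low-speed records
    prev_t = None

    def close_run():
        # Keep the run only if it lasted at least min_dwell_s.
        if run and (run[-1] - run[0]).total_seconds() >= min_dwell_s:
            dwells.append((run[0], run[-1], "low_speed"))
        run.clear()

    for t, v in points:
        # Definition 2: long gap between consecutive timestamps.
        if prev_t is not None and (t - prev_t).total_seconds() > max_gap_s:
            close_run()
            dwells.append((prev_t, t, "gap"))
        # Definition 1: speed close to zero over a predefined time.
        if v < speed_kmh:
            run.append(t)
        else:
            close_run()
        prev_t = t
    close_run()
    return dwells

t0 = datetime(2016, 4, 1, 8, 0, 0)
sample = [(t0 + timedelta(seconds=s), v)
          for s, v in [(0, 60), (30, 0), (60, 0), (90, 0), (120, 60), (720, 50)]]
for start, end, kind in find_stop_dwells(sample):
    print(kind, start.time(), end.time())
```

In this toy input, the three consecutive zero-speed records form one low-speed dwell, and the 600-second silence before the last record is reported as a gap.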
Selecting a dwell time threshold that works in all cases is not easy [9]
either starts from the origin and heads to its destination (O-to-D) or starts from the destination and heads to its origin (D-to-O). In practice, the origin and destination are like two local zones, as the exact starting and ending locations of each trip may be slightly different. This is because when a truck arrives at the local area, (1) several stops may be needed for distributing goods to more than one place; (2) multiple links may be available for the truck driver from or to the final place; and (3) some activities may be conducted before heading back, which may result in different parking locations. All these reasons may cause the exact trip ends to be unfixed, and the time spent in the local zones is hard to predict. Because of that, the longest common segment is trimmed to represent the OD route when studying the trips from a dedicated truck. As illustrated in Figure 1a, the longest common segment for a dedicated truck can be derived when all trip trajectories are lying on the target route.
For a long-haul truck, in addition to the origin and destination stops, a trip may involve several intermediate stops for rest, fuel, or other purposes. As two trips illustrated in Figure 1b
In summary, for a dedicated truck running on a route where the origin and destination are designated, the OD trips can be characterised as (1) a trip chain which may comprise N stops, including origin and destination, and N-1 chain links; (2) a reciprocating trip which moves back-and-forth
any performance studies. Then, the central tendency (i.e., mean), dispersion (i.e., standard deviation), and position (i.e., 50th and 95th percentiles) of travel times are measured for each dedicated truck. From these measures, the travel time reliability is calculated using different performance metrics such as buffer index, percent variation, and probabilistic measures. To assist the performance comparison of travel time reliability among trucks on routes of different lengths, a voting score system is adopted to integrate the different reliability metrics.
With the use of multiday GPS data, this study also provides opportunities for investigating the regularity and variability of long-haul freight truck trips. Other potential applications of splitting truck GPS trips with the method proposed in this paper include truck travel demand forecasting [25,26], freight fleet performance [27], fuel consumption and emission estimation [28], roadway bottleneck identification [22], and so forth. For the logistics and trucking industry, it would be interesting to know the punctuality of a truck on a route to ensure timely delivery and performance stability [29], which will be showcased later in this paper.

DEFINITION OF TRIPS FOR A DEDICATED TRUCK
In this study, a trip is defined as the total movements from a truck's origin to its destination (i.e., OD trip). In GPS data, a trip is simply a set of temporally ordered, timestamped data points that carry geospatial, operational, and other information. The origin can be regarded as the place where a truck stops most often and takes its longest rest. The destination is the place where a truck serves its customer (e.g., delivers goods). Thus, each trip
it, driver information, and traffic conditions), and this information could not be obtained from other sources.
Although most trucks use 20 seconds or 30 seconds as the main recording interval, the interval varies among trucks. Within one truck, the data interval may also change, possibly due to errors or instabilities in GPS signal reception. For example, for truck number 90, a 20-second interval accounts for 89.3 percent of its data (the highest frequency) and a 120-second interval for 7.4 percent (the second-highest frequency). For truck number 54, the most frequent interval is 30 seconds, which accounts for 86.1 percent of its data, and the second most frequent is 180 seconds, which accounts for 9.3 percent. In the dataset, the majority of the relatively long intervals (120 seconds and 180 seconds, respectively, for these two trucks) occur during stops (i.e., when speed equals zero). It should be noted that, although different data intervals exist in the dataset, they barely affect the study because they are small compared with truck trip lengths, which are measured in hours.

Determining the target route
In this paper, we focused on the target routes, on which the truck transports goods back and forth between two locations on a regular basis. The specific geospatial information of the target routes is estimated from the dataset using a trajectory similarity measure. Interested readers can refer to the previous study [29]. As an example, Figures 2a and 2b plot a truck route on a geo-coordinate plane. Colours represent the speed. In total, 15 out of the 100 trucks whose target routes are significant are selected from the dataset.
repeatedly. It may be hypothesised that the expected on-road travel time would be very close among trips of each truck running on the same route. Since variations would still exist given that traffic conditions may be different for each trip, this study calculates reliability in travel time to indicate the performance of a dedicated truck in this paper.

DATA PREPARATION
The truck GPS data come from the Chinese road freight monitoring and service platform. The platform was established in 2014 and is the only national-level monitoring platform for commercial trucks (e.g. heavy trucks and semi-trailer tractors over 12 tons). All trucks registered on the platform are equipped with a GPS device before entering service. The device is out of reach of the driver and records data at a regular time interval regardless of whether the truck is moving, idle, or parked. The truck transmits its GPS signal over a cellular connection at the platform's request.

Overview of the GPS dataset
The GPS dataset used in this paper covers 100 trucks, with data collected from 1 April to 30 April 2016, each following its own route (different trucks run different routes) across China. Each row of the GPS log, representing a data point, includes a unique identification number (scrambled for anonymity), spatial (i.e., latitude and longitude), temporal (i.e., date and time), instantaneous speed, mileage, and heading angle information. It should also be noted that the platform does not contain other useful information (e.g. speed lim-
duration of a stop. According to its function, a stop can be classified as either an OD stop (a stop between two trips) or an intermediate stop (a stop within a trip). These are described below.

OD stops
As a truck runs in the local area of the origin or destination, its route profile is characterised by relatively low speed, frequent speed changes, frequent turning movements, frequent and longer stops, and so forth. As an example, in Figure 3a, round-trip data are mapped between a local area in Nanyang and a local area in Binzhou, China. For the purpose of illustration, the truck speed is roughly indicated by three colours: green for speeds greater than 60 km/h and red for speeds less than 20 km/h. Intuitively, one can see that the OD stops, as defined in this paper, occur at the two ends of the route.
Technically, the amount of time for a stop (i.e., scale) and the number of stops (i.e., frequency) at a particular place are measured to determine the OD stops. As an example, Figure 3b shows all the stops

METHODOLOGY
The core of this study is to establish a systematic approach to extract purified trips of a dedicated truck from the raw GPS dataset. The methodology includes different types of stops identified in a trip and a generic step-by-step procedure to determine an OD trip.

Identify the trip stops
A stop can be considered a trip end (i.e., arriving at the destination) or a trip interruption for certain purposes such as refuelling or dining. In those cases, turning off the engine is necessary to define the truck stop. Therefore, a temporary pause caused by traffic control (e.g. traffic signal) is not considered a stop; rather, it is defined as a part of the trip. In the GPS dataset of this study, a stop can be identified by a speed less than 5 km/h for any length of dwell time. Though engine shutdown cannot be flagged in the dataset as the GPS device records data continuously, it can be reflected by the purpose and the

Intermediate stops
There may be many reasons for a driver to take an intermediate stop during a trip, and these stops may be grouped into intended and unintended stops. Intended stops include dining, resting, or overnight sleeping, which can be planned ahead of time. Unintended stops, such as refuelling and waiting at the toll gate, are those over which the driver has little to no control. Figure 4 shows two route sub-sections from the route in Figure 3. As examples, the stops are identified and manually mapped to the function of the venue. Each arrow matches a data point and shows the travel direction. Stops where speeds are continuously lower than 5 km/h for more than 1 minute are circled in Figure 4, and their dwell times are measured. In Figure 4a, the stop at a service area in the trip of the northeast direction took 15.2 minutes. In Figure 4b, the stops at a toll station in the southwest and northeast directions took 2.9 minutes and 3.6 minutes, respectively.
As mentioned before, a truck's temporary pause such as waiting for the red light at signalised intersections or being stuck in a traffic jam is treated as a part of the trip and is thus not included in the intermediate stops. This is because the temporary pause (1) represents the running traffic conditions along the route and will be interpreted together in the travel time; (2) is uncontrollable, hard to measure accurately given limited information; and (3) rarely occurs on high-speed corridors of the study segments (e.g. intercity expressway or regional highway). This outlines differences in the intermediate stop definition as compared to the literature [19,20].
that occurred in the trips over the data collection month. Each circle plotted in this figure indicates the latitude and longitude of a stop. The size of the circle indicates the scale of the stop duration, and the colour of the circle indicates the frequency of the stop. Trucks with the same destination do not necessarily stop at exactly the same location. However, they may park in close proximity [30]. Therefore, hierarchical cluster analysis, as recommended in [11], was adopted to cluster stops within a range of 1 km. As may be seen in Figure 3b, the larger and redder circles are OD stop candidates and are classified into two groups. The largest and reddest circle in each group is identified as the OD stop.
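As a rough illustration of grouping nearby stops and ranking the groups by scale and frequency, the sketch below merges stops that lie within 1 km of each other. It assumes single-linkage clustering cut at 1 km (equivalent to finding connected components), which the paper does not specify; the function names and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon, ...) tuples."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_stops(stops, radius_km=1.0):
    """Group (lat, lon, dwell_min) stops and rank groups by total dwell time."""
    parent = list(range(len(stops)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of stops closer than radius_km (single linkage).
    for i in range(len(stops)):
        for j in range(i + 1, len(stops)):
            if haversine_km(stops[i], stops[j]) <= radius_km:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, s in enumerate(stops):
        groups[find(i)].append(s)

    summary = [{
        "lat": sum(s[0] for s in g) / len(g),
        "lon": sum(s[1] for s in g) / len(g),
        "frequency": len(g),                      # number of stops (colour)
        "total_dwell_min": sum(s[2] for s in g),  # stop scale (circle size)
    } for g in groups.values()]
    summary.sort(key=lambda c: c["total_dwell_min"], reverse=True)
    return summary

# Hypothetical stops: two route ends plus one short mid-route stop.
stops = [(33.000, 112.500, 600), (33.001, 112.501, 500), (33.002, 112.500, 700),
         (37.400, 118.000, 300), (37.401, 118.001, 400), (35.000, 115.000, 15)]
for c in cluster_stops(stops)[:2]:   # the two top-ranked clusters are OD candidates
    print(c)
```

Ranking by total dwell time is one possible choice; ranking by frequency, or by the count of GPS points per cluster, would follow the same pattern.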
In addition to the amount of time, this study also examined the density of data points (i.e., GPS pins), considering varying data intervals. As shown in Figure 3c, the count of data points in each stop is plotted on a longitude-latitude plane. Results of this particular example show that the OD stops obtained using data points are geospatially the same as those obtained using the amount of time; however, the OD stops stand out more clearly among all the stop candidates. The two methods cluster the density of the stops at a location using different perspectives of the data. In this study, both methods are used to inspect the OD stops because there are varying data intervals and missing data in the dataset, albeit in small quantities. It is worth noting that the determination of the OD stops is the core of the proposed trip split method, which relies not on fixed duration thresholds but on the density of the GPS pins or stop durations.

Step 1. Preliminary error filtering
GPS data may be contaminated by errors caused by malfunctioning or poor performance (e.g. signal blockage or signal loss) of the sensor device. The main errors in this dataset include:
(1) Duplications. In the raw data, two or more consecutive records may have the same timestamp. This is fixed by keeping the first record and removing the remaining duplicates.
(2) Outliers. In the raw data, incorrect values may be evident. For example, it is unreasonable for speed to exceed 150 km/h or for the distance between two consecutive data points to exceed 1.25 km, given a 30-second data interval. In this case, the outlier, once recognised, is replaced with the average of the values before and after it.
(3) Missing data. This refers to a timestamp interval between two consecutive data points longer than 180 seconds, which is the longest recording interval found in the dataset.
Manual remediation is required only when missing data occur at the beginning or end of a "chain" in the trip; the remediation may draw on the mileage, time, speed, and angle of the ten data points before and after the gap.
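The three error types can be sketched in a few lines. This is only an illustration under stated assumptions: records are dictionaries with hypothetical keys "t" (seconds) and "speed" (km/h), only the speed outlier rule is shown, and missing-data gaps are flagged rather than repaired, since the paper handles them manually.

```python
def filter_errors(records, max_speed_kmh=150.0, max_gap_s=180):
    """Apply the duplicate, outlier, and missing-data checks to time-sorted records.

    Returns (cleaned_records, gaps) where gaps lists (t_before, t_after) pairs.
    """
    # (1) Duplications: keep the first of consecutive records sharing a timestamp.
    cleaned = []
    for r in records:
        if cleaned and r["t"] == cleaned[-1]["t"]:
            continue
        cleaned.append(dict(r))   # copy so the input is left untouched

    # (2) Outliers: replace an implausible speed with the mean of its neighbours.
    # Records at either end have only one neighbour and are left as they are.
    for i in range(1, len(cleaned) - 1):
        if cleaned[i]["speed"] > max_speed_kmh:
            cleaned[i]["speed"] = (cleaned[i - 1]["speed"] + cleaned[i + 1]["speed"]) / 2

    # (3) Missing data: flag intervals longer than max_gap_s for later inspection.
    gaps = [(a["t"], b["t"]) for a, b in zip(cleaned, cleaned[1:])
            if b["t"] - a["t"] > max_gap_s]
    return cleaned, gaps

raw = [{"t": 0, "speed": 60}, {"t": 30, "speed": 62}, {"t": 30, "speed": 61},
       {"t": 60, "speed": 200}, {"t": 90, "speed": 64}, {"t": 400, "speed": 0}]
cleaned, gaps = filter_errors(raw)
print([r["speed"] for r in cleaned], gaps)
```

The same neighbour-averaging pattern could be applied to the distance check between consecutive points (1.25 km per 30 seconds); two adjacent outliers would contaminate each other's replacement, which is one reason a second smoothing pass is useful.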
A manual check of a larger data sample was conducted, locating the latitudes and longitudes of the stops on Google Maps to explore the purpose of each stop. It was found that the average stop time is 20.9 minutes (sample size = 122) in service areas and 3.3 minutes (sample size = 186) at toll stations. Both are significantly longer (p<0.001) than a temporary pause at signalised intersections, which takes an average of 0.6 minutes (sample size = 683). Although the intermediate stops are divided into intended and unintended stops, this study does not distinguish between them in the subsequent travel time estimates. It should be noted that errors may still exist, as the method cannot completely separate long pauses from intermediate stops.

OD trip determination
Intermediate stops break a complete OD trip into a trip chain. The following 5-step procedure is used to purify and convert the raw GPS data into OD segments and eventually OD trips. Results of each step from example truck data are shown in Figure 5.

Figure 5 - An example of trip purification and study segment determination results
marked in the dataset only for those data points within the cut-off points, indicated by the "band-pass" filter of the study segment in Figure 5e. The 5-step procedure is programmed in the R statistical software and run for all 15 trucks on their 15 target routes. The trips determined using the above procedure serve as the database for the truck performance study that follows.

TRUCK TRAVEL TIME RELIABILITY
The uniqueness of this study is that travel time is based on the performance of individual trucks over multiple days. This contrasts with the segment-based, 15-minute aggregate travel times used in the Highway Capacity Manual and other reports in the literature [32]. Therefore, travel time reliability in this paper is defined as the consistency or dependability in travel times, as measured from a vehicle repeatedly running on the same route over a period of time. The results of the travel time for each truck in each route direction are shown in Table 1. Note that the OD trip time refers to the time a truck takes to complete the OD trip, which includes the time for intermediate stops. Travel time is calculated by subtracting the total stop time from the OD trip time. As expected, the OD trip time increases as the travel time increases.
It should be noted that the expected running time is used as a time benchmark. This metric is obtained by plugging the latitudes and longitudes of a route into Google Maps. Basically, the running time is estimated under free-flow conditions using the speed limits along the route. This amount of running time is almost the same for both directions and does not change over time when checking China's routes offline.
Table 1 shows that some trucks have significant differences (p-value<0.05) in average total travel time between the two trip directions. Thus, the travel times of the two directions were examined separately.
In comparison to the estimated running time obtained from Google Maps on each route (see Table 1), the travel time for some trucks calculated in this study is dramatically longer. Figure 6

Step 2. Least Median Square (LMS) smoothing
In some cases, although some data points are suspicious, they might still be correct, leaving a margin of uncertainty. Therefore, this step further filters and smooths the data, as a complement to Step 1. For example, given a sequence of consecutive speeds (0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0) km/h, from a data smoothing perspective the 100 km/h speed is likely an outlier and will thus be replaced. LMS was found suitable for fine filtering and smoothing in this step. It can extract the real trajectories from time series with possible random variations and data shifts [31]. Specifically, in this study, a robust LMS filter package in the R statistical software is applied. The width of the moving window filter is set to 180 seconds for the same reason as above.
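The effect of robust window-based smoothing on the example sequence can be illustrated with a running median. Note that this is a simpler stand-in and not the LMS filter used in the study; the function name running_median and the window of 5 samples are assumptions made for the illustration.

```python
from statistics import median

def running_median(values, window=5):
    """Centred running median; windows are truncated at both ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of samples")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed

speeds = [0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0]   # km/h, the sequence from the text
print(running_median(speeds))
```

Because a median ignores a single extreme value inside the window, the isolated 100 km/h reading is suppressed, while a genuine shift from stopped to moving (several consecutive high readings) is preserved.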

Step 3. Trip discretisation
In this step, the data is further modified using signal concepts specifically from electrical engineering. At this point, the time-sequenced GPS data can be considered as an analogue signal. The continuous analogue signals are converted into discontinuous digital signals. The benefit of doing so is to separate short stop-and-go movements and amplify signal (i.e., speed) strength at the beginning and end of each chain of the trip. This step produces a trip chain in Figure 5c. The dataset is disconnected among the chains, and the amplitude reflects the median of each chain.
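One possible reading of this step, offered only as an illustration, is to binarise the speed series into moving and stopped states and replace every moving chain by a flat level equal to its median speed. The 5 km/h threshold reuses the stop definition given earlier; the function name discretise and the exact construction are assumptions, not the authors' implementation.

```python
from itertools import groupby
from statistics import median

def discretise(speeds, threshold_kmh=5.0):
    """Turn a speed series into a step signal.

    Stopped samples become 0; each run of moving samples (a chain)
    is replaced by the median speed of that chain.
    """
    steps = []
    for moving, group in groupby(speeds, key=lambda v: v >= threshold_kmh):
        chain = list(group)
        level = median(chain) if moving else 0
        steps.extend([level] * len(chain))
    return steps

print(discretise([0, 0, 70, 80, 90, 0, 0, 60, 62, 0]))
```

The resulting step signal has sharp edges at the start and end of each chain, which makes the chain boundaries easy to locate in the following step.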

Step 4. OD trip assembling
The idea of this step is to flag the OD stops in the dataset using latitude and longitude as discussed previously. Correspondingly, the time frames of the OD stops are flagged, and the chains between the end of one OD stop and the beginning of the next OD stop constitute an OD trip. In Figure 5d, this step constructs "band-pass" filters to delimit each OD trip in the dataset.
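The assembling logic can be sketched as follows, assuming the OD stops and the trip chains are already available as time intervals. The tuple layouts, the labels "O" and "D", and the function name assemble_od_trips are hypothetical and serve only to illustrate the idea.

```python
def assemble_od_trips(od_stops, chains):
    """Group chains lying between two consecutive OD stops into OD trips.

    od_stops: time-sorted list of (start, end, label), label "O" or "D".
    chains:   time-sorted list of (start, end) moving intervals.
    """
    trips = []
    for (_, end_prev, a), (start_next, _, b) in zip(od_stops, od_stops[1:]):
        if a == b:
            continue   # both stops in the same zone: not an O-to-D or D-to-O trip
        inside = [c for c in chains if c[0] >= end_prev and c[1] <= start_next]
        if inside:
            trips.append({
                "direction": f"{a}-to-{b}",
                "start": inside[0][0],
                "end": inside[-1][1],
                "chains": inside,
            })
    return trips

# Times in seconds; one intermediate stop splits the first trip into two chains.
od_stops = [(0, 100, "O"), (500, 600, "D"), (1000, 1100, "O")]
chains = [(100, 250), (280, 500), (600, 1000)]
for trip in assemble_od_trips(od_stops, chains):
    print(trip["direction"], trip["start"], trip["end"], len(trip["chains"]))
```

The gaps between chains inside one trip correspond to intermediate stops, so their total can be subtracted from the OD trip time to obtain travel time.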

Step 5. Study segment determination
The local OD zones are excluded from the OD trips in this step to determine the study segment, which represents the longest common segment where all trips overlap. The cut-off points of the study segment are decided: (1) after the truck exits a high-speed roadway, (2) before entering the local roads, and (3) as close to the OD zones as possible. The geospatial trajectories of all trips and the speed-time profile for each trip are used to determine the cut-off points from the OD trip. Trip numbers are
stead of the average travel time in the calculation as it was found to be beneficial [34]. Therefore, BI is calculated as the 95th percentile travel time minus the median travel time, divided by the median travel time.
The smaller the BI value, the more reliable the travel time.
Percent Variation (PV). PV is essentially the normalised standard deviation [3,35]. It is useful when comparing the degree of variation among different routes. The PV is calculated as the ratio of the standard deviation to the mean of travel time and is expressed as a percentage. PV has the same characteristics as the coefficient of variation and is thus mathematically a good proxy for a number of other common reliability measures [34]. A smaller PV value indicates a relatively less dispersed distribution and a more reliable travel time.
Probabilistic Measures (PM). PM calculates the probability that the observed travel time is greater than a specified time threshold. This metric pays more attention to the longer travel times (i.e., the right end of the distribution). The recommended measure is the probability that travel time is more than 20 percent greater than the median travel time (i.e., greater than 1.2 times the 50th percentile) [36]. The smaller the PM value (close to zero), the more reliable the travel time.
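The three metrics as defined above can be computed with a short sketch. The linear-interpolation percentile and the use of the sample standard deviation are assumptions, as the paper does not state which variants were used; the function names and the sample travel times are hypothetical.

```python
from statistics import mean, median, stdev

def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def reliability_metrics(travel_times):
    """Return BI, PV (percent) and PM for one truck in one direction."""
    med = median(travel_times)
    # BI: (95th percentile - median) / median
    bi = (percentile(travel_times, 95) - med) / med
    # PV: standard deviation / mean, as a percentage
    pv = 100 * stdev(travel_times) / mean(travel_times)
    # PM: share of trips taking more than 1.2 times the median
    pm = sum(t > 1.2 * med for t in travel_times) / len(travel_times)
    return {"BI": bi, "PV": pv, "PM": pm}

print(reliability_metrics([10, 10, 10, 10, 20]))   # hypothetical travel times in hours
```

For all three metrics, values closer to zero indicate more reliable travel times, which is what allows them to be ranked in the same direction.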
In short, the above three measures represent three categories of indexes to quantify the travel time reliability in literature [37]: the buffer time index (i.e., BI), the statistical index (i.e., PV), and the probability index (i.e., PM). Specifically, BI accounts for unexpected delays and uses the 95th percentile to represent a near-worst case of travel
rectangles. The size of each rectangle is determined by the standard deviation of the average travel time that is located at the centre of the rectangle.
As can be seen in Figure 6, generally the longer the study segment length, the greater the difference (i.e., the pink or blue rectangles are far apart from black squares). This may indicate that in the actual truck operation a lot of factors increase the travel time, and the expected time provided by offline Google Maps is not feasible, especially for long-distance trucks.
The truck travel time on the same route may vary from trip to trip due to several factors, such as traffic environment fluctuations and truck operation inconsistencies. For example, temporary traffic events (e.g. accidents, harsh weather, work zones) that cause congestion may dramatically lengthen the expected travel time. It should be acknowledged that the study may underestimate the amount of time lost to intermediate stops. This is because the lost time is measured only while speed is below 5 km/h. It does not include the transition delays when the truck is preparing to stop (i.e., deceleration delay) or returning from the stop (i.e., acceleration delay).

Reliability performance metrics
To measure the truck travel time reliability performance, this paper adopts three commonly used metrics as follows.
Buffer Index (BI). BI represents the extra time (or time cushion) that travellers must add to their average travel time when planning trips to ensure on-time arrival [33]. This study used the median in-
highest value is ranked 15 as the worst. Discrepancies among different reliability measures may lead to different rankings given the same underlying data. Therefore, this study adopted a voting score system to integrate the three metrics.
As can be seen in Table 2, the travel time reliability rankings using different metrics do not seem consistent. This is not unexpected, as each metric attempts to measure a different aspect of reliability [38,39]. The voting score in the last column of Table 2 was calculated by adding the rankings of the three metrics (i.e., columns 6 to 8) in both directions for each truck. The overall scores can assist in finding out which truck performs better than
time for a specific truck route. PV focuses on the distribution of travel time and compares the variation among different routes. PM focuses on the likelihood that a given travel time occurs. The three travel time reliability metrics each have their own emphasis and are used to characterise travel time reliability from different perspectives.

Truck performance results
Using the three metrics selected in this study, the travel time reliability is calculated on the route of each dedicated truck. The values calculated for each metric are ranked among the 15 trucks. The lowest value in each metric is ranked 1 as the best and the
3) An overall performance measure in travel time reliability that is applicable to other similar application scenarios in need of punctuality, such as tourist coaches, intercity shuttles, school buses, and long-distance commuter cars. In fact, the logistics and trucking industry and dispatch centres would probably find this paper useful.
Admittedly, there are limitations to this paper. Significant factors that affect travel time reliability include roadway conditions and the traffic environment, which are unknown in the dataset. Over the one-month study period, each trip may have experienced different situations (e.g. congestion, special events). Combining all the trip data without knowing that information for each trip may introduce additional errors into the travel time reliability measures.
A significant limitation of non-intrusive GPS data is the lack of verification of freight facilities along the route. That is also why there is an increasing trend of combining GPS data with GIS techniques to divide the route into segments on which the analysis is performed [17].
Repeated trips on the same route may incur biases in reliability as drivers may adjust their route choice to avoid traffic events. For example, if a major incident creates very significant congestion, trucks may change their routes to avoid the target route. In that case, the trips (out of the target routes) are removed from the data. Consequently, the travel time calculated from the "clean" OD trips may have understated the worst condition on the right tail of the distribution.
It should be noted that the dataset did not track if the trucks had one or more primary drivers. Also, if more than one driver is assigned to operate a truck, it is unknown which driver takes a specific trip in the dataset. Therefore, different truck drivers may have an impact on the results of the truck travel time reliability.
The next step of the study is to continue investigating applications of this rich and accurate trip-based data source to truck performance measures and other potential studies. For example, the truck OD trip pairs can be used as an alternative to the conventional 4-step process to improve the accuracy of truck travel demand forecasting models. The truck trips can serve as floating data to identify the time and location of recurring congestion. In addition, the trip trajectory would help
others regarding travel time reliability, given there might be a need for comparing different trucks on different routes.
In Table 2, truck number 50 has the most reliable travel time performance overall. By contrast, truck number 67 performs comparatively the worst. In general, trucks on long-distance routes (e.g. truck numbers 52, 15, 87, and 1) have relatively lower travel time reliability (i.e., wider dispersion around the mean) than trucks on short-distance routes (e.g. truck numbers 54, 57, 91, and 67). This may be because the actual operations of long-haul trucks are more complicated and may include unpredictable delays from various sources.
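The voting score can be sketched as a sum of per-metric ranks. The treatment of ties (tied values share the lowest rank) is an assumption, and the truck labels and metric values below are hypothetical and are not taken from Table 2.

```python
def rank_ascending(values):
    """Rank 1 for the lowest value; tied values share the lowest rank (assumption)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def voting_scores(metrics):
    """Sum each truck's ranks over all metric columns.

    metrics: {truck_id: [m1, m2, ...]} with the same columns for every truck,
    e.g. BI, PV and PM for each of the two directions.
    """
    trucks = list(metrics)
    n_cols = len(metrics[trucks[0]])
    scores = dict.fromkeys(trucks, 0)
    for col in range(n_cols):
        ranks = rank_ascending([metrics[t][col] for t in trucks])
        for truck, rank in zip(trucks, ranks):
            scores[truck] += rank
    return scores

# Hypothetical values for three trucks: [BI, PV, PM] in one direction.
example = {"truck_a": [0.1, 5.0, 0.0],
           "truck_b": [0.3, 9.0, 0.2],
           "truck_c": [0.2, 7.0, 0.1]}
print(voting_scores(example))
```

A lower total score indicates more reliable travel times overall, and two trucks can receive the same score when different metrics rank them in opposite orders.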

CONCLUDING REMARKS
Effective use of GPS data offers tremendous opportunities for developing new and relevant measures related to truck performance on highways and overall traffic mobility. Although massive GPS data are available, it is hard, without further information, to identify when a trip begins and ends. Instead of using a fixed threshold of the gap time between trips, this paper provided a method comprising a set of steps to separate the valid trips from the GPS dataset. This is done by determining the OD stops using the density of GPS data pins or stop durations, error data filtering, curve smoothing, trip discretisation, and assembling. The result of the trip split method was applied to travel time reliability to show its potential for future applications. The contributions are:
1) A systematic procedure with a generic step-by-step approach to purify and extract trip information from large amounts of raw GPS data. Compared with the most widely used methods, which split GPS data with a fixed dwell time threshold, the density of stop scale and stop frequency is used, which reduces errors in identifying breaks from repeated and long stops. Meanwhile, this study documents a process to determine the target route, identify the trip ends, and split trip chains for each truck.
2) A unique addition to the existing body of literature on performance measures of dedicated trucks with different route lengths. Trip information purification and extraction can be readily extended to other data sources such as the American Transportation Research Institute's (ATRI) truck GPS database.
training and optimising the behaviour of autonomous trucking in microsimulation, which will also be the future work of the research team.