INVESTIGATING COMMUTING TIME PATTERNS OF RESIDENTS LIVING IN AFFORDABLE HOUSING: A CASE STUDY IN NANJING, CHINA

Affordable housing has emerged in Chinese cities to meet the housing needs of low-income residents. Because affordable housing projects tend to be located far from the city centre, their residents tend to face long commutes to work. Although several studies have analysed commuting times, none have considered the commuting patterns of residents living in these affordable housing projects. This study employs a decision tree classifier to examine the commuting time patterns of affordable housing residents, fusing data from the 2010 Nanjing Household Travel Survey with supplementary data collected through Google Maps. Results show that attributes of the built environment and distance to work are the factors that most strongly influence the commuting time patterns of affordable housing residents in Nanjing. The availability of a subway service, job type, household car ownership, job location, travel mode choice, and departure time have logical but varying effects on commuting trip duration. These results provide a better understanding of these residents’ commuting patterns and give urban planners insight into the effects of affordable housing policies on travel behaviour.


INTRODUCTION
Affordable housing, a housing concept subsidized by the government, has emerged in several Chinese cities to meet the explosive housing needs of urban residents during the ongoing process of urbanization and to alleviate the housing inequality triggered by market-oriented housing reform. A McKinsey report [1] stated that approximately 13.4 million affordable housing units were built across China between 2012 and 2014. In 2016, Nanjing completed affordable housing projects providing 39,153 units to accommodate the growing number of low-to-medium-income households who cannot afford the spiralling prices in the overheated commercial housing market [2]. Local governments are partly responsible for this housing construction and, because of serious fiscal burdens, prefer to develop low-cost land at the urban fringe [3]. Consequently, many affordable housing projects in China are located on the outer boundary of cities such as Beijing, Nanjing, Shanghai, and Wuhan [4,5].
Although the provision of affordable housing has solved the dwelling problem for these residents, it has also created new problems. At the time of housing relocation, the industries around the housing sites lacked
closely associated with residents' commuting times [11,12,14-18]. Increased commuting time in the US context mainly resulted from the spatial mismatch between job locations and affordable housing locations. However, socio-demographic characteristics, housing reform, and transport mode also influence commuting times. Given a work schedule and commuting distance, trip duration is largely decided by the commuter's mode choice. Commuting by private vehicle gives people more flexibility to adjust travel speed, departure time, and route when circumstances allow, while commuting by public transit makes people captive to its specific schedules and predetermined routes and stops. This contention is confirmed in studies by Kwan and Kotsev [16] and Vincent-Geslin and Ravalet [19], who found that transit users spent more time on their trips than car users.
Studies taking a gender perspective found that lifecycle phase, household structure, number of workers, gender attachment to labour force markets, presence of young children, and gender roles are important determinants of travel time [11,13,15]. Women's combined household and employment responsibilities, together with gender roles, lead to gender differences in commuting time patterns. Scheiner and Holz-Rau [20] and Kwan and Kotsev [16] found that women's mobility features less car use and shorter work trips but longer commuting times, owing to their relative dependence on public transport. In contrast, Gimenez-Nadal and Molina [17] concluded that the disproportionate household burden, such as household chores and childcare, results in shorter commuting times for women because they are more reluctant to accept jobs that involve longer commutes. Investigating gender roles, Fan [18] found that gender disparity in commuting times still exists within couples and single-parent households with children.
Other socio-demographic factors such as age, household income, work status, driving license possession, car availability, and level of education are also associated with long-duration commuting [19,21,22]. For instance, in a case study conducted in the Atlanta area, Selima [22] found that high-income workers have longer mean commuting times, while Black workers, who are still heavily concentrated in the city centre, had shorter commuting times. Dargay and Ommeren [23] found that the effect of increasing income on commuting time is ambiguous and is mainly conditioned by commuters' attitudes towards time savings.
Controlling for socio-demographics, Schwanen et al. [15] investigated the relationship between urban form and commuting time and found that travel times tend to be longer in decentralized regions. Susilo and Maat [13] examined the influence of urban form and transport accessibility on commuting times in the Netherlands and found that
urbanization and increased development, most development staying dispersedly. Therefore, the relocation of housing to suburbs resulted in decreased job accessibility and increased commuting times [6-8]. Longer commuting distances generated a higher potential demand for car ownership [9]. However, constrained by their socio-demographic status, most residents living in affordable housing cannot afford the high cost of purchasing and using a car. They depend on the public transport available in their neighbourhood or on slow transportation modes. Moreover, the lack of high-quality transit services implies that these economically disadvantaged residents face miserable commuting trips. As Morris and Guerra [10] concluded, lengthy commuting not only affects commuters' travel moods but also negatively influences human well-being. Therefore, analysing the factors influencing the commuting patterns of residents living in affordable housing is of great relevance.
A range of factors including socio-demographics, land use pattern, transportation system characteristics, and job accessibility are claimed to be associated with commuting time [6,11-13]. Several studies have examined commuting times for people with different profiles, but to the best of our knowledge, no studies have explicitly investigated the commuting time patterns of these economically disadvantaged residents. Therefore, this study analyses the one-way commuting time of these residents, addressing the following questions: (1) which factors influence these residents' commuting times, considering socio-demographics and features of the built environment? (2) how do these factors affect commuting time patterns in different contexts? In answering these questions, we use the affordable housing travel survey conducted in Nanjing in 2010, which is the latest available data. Although this data set is relatively old, the fundamental relationships in the data are unlikely to change quickly. Applying decision tree classifiers, we assess the influence of socio-demographics and characteristics of the built environment and capture the commuting time patterns of 387 affordable housing residents in different contexts.
The remainder of this paper is organized as follows. Section 2 presents a literature review of factors affecting commuting times. In Section 3, the data sources and study context are described. Section 4 presents the proposed methodology, followed by a discussion of the results and policy implications. The paper concludes with a discussion of possible avenues for future research.

LITERATURE REVIEW
Several studies have concluded that urban form, job accessibility, socio-demographics, and job-housing related policies (e.g. market-oriented housing reform, management of labour force and labour market) are
covered three affordable housing communities: Hengsheng Homeland, Yadong Town, and Chunjiang Newtown. In the survey, the respondents were required to provide their personal and household characteristics as well as their trip details. After removing cases with missing data or unreasonable travel times, the final sample consists of 387 commuting trip diaries from 99 residents in Hengsheng Homeland, 138 in Yadong Town, and 150 in Chunjiang Newtown. Our study focused on commuting trips before 10 a.m., as this is the time frame within which most commuting trips take place.
Using Google Maps, we measured the job-home distances and attributes of the built environment, including proximity to the CBD, job-housing co-location, job location, and public transport accessibility. Job-housing distance was measured as the shortest driving distance according to Google Maps. Proximity to the CBD, which measures the centrality of the home location, tends to be associated with land use patterns in the residential neighbourhood, in the sense that a shorter distance to the CBD tends to be associated with a more diverse and dense land use pattern. To get an idea of the traffic around the workplaces during commuting hours, job sites were geographically divided into centre, middle, and suburbs. Job-housing co-location, an indicator of job-housing balance at the district scale, was measured by whether the workplace is located in the same district as the residence. Public transport accessibility was measured by bus service accessibility and subway availability. Bus service accessibility was quantified by the availability of bus lines that directly connect workplace and home within a 600 m buffer, while subway availability was measured by the presence of subway stations within 1 km. Data obtained from Google Maps may not perfectly represent the situation in 2010, but this information is the best available data for Nanjing.
Figure 1 illustrates the three typical affordable housing communities: Hengsheng Homeland, Yadong Town, and Chunjiang Newtown. Table 1 provides a brief summary of the geographical locations and transport mode shares of the surveyed communities.
- Hengsheng Homeland is located north of the CBD and has the smallest built-up area (140,000 m²) and the shortest distance (8.7 km) to the city centre. The community was developed quite early, between 1999 and 2005, when the urban boundary was not as extensive as it is now.
The mode shares show that public transit (42.4%) and active transport (54.6%) are the dominant transport modes, which may be ascribed to its advantageous geographic location.
- Yadong Town is a medium-sized area (500,000 m²) located in the northeast of Nanjing. Despite having the longest distance (17.6 km) to the city centre, its neighbouring area has a subway service. 57.2% of
transport-network density and accessibility reduced commuting distances but increased commuting times. Increasing congestion may be the explanation for this finding. Bento et al. [24] and Frank et al. [25] argued that commuting time is the result of a household's residential location decision, which depends on employment and relative salary opportunities, housing prices, community amenities, school quality, urban structure and traffic patterns, and transport mode choices. Zhao et al. [26] found that job-housing balance at the individual level, measured by the co-location of jobs and houses in the same sub-district, has a statistically significant and positive effect on the reduction of commuting time. Using data from 164 Chinese cities, Sun et al. [27] quantified the impact of urban spatial structure on travel time at the city level and found that average commuting times are positively associated with city size and job-housing separation, and negatively correlated with average population density and employment polycentricism.
Methodologically, most studies have applied statistical methods such as multiple regression analysis, providing general insights into factors affecting trip duration such as gender, urban form, transport accessibility, and job-housing distance [13,18,26,28]. However, trip duration may also depend on the context (e.g., a change in commuting distance or departure time). Apart from the fact that the dependent variable (time) does not always satisfy the assumptions of the chosen statistical analysis, a more important drawback of these analyses is that the linear additive structure of the statistical model does not capture the typical, complex interdependencies among the explanatory variables in their effect on commuting time.
Therefore, a decision tree classifier is applied to represent the complex non-linear relationships between these factors and commuting time patterns. Our contribution to the literature can be summarized as follows. First, the special features of affordable housing in China, in terms of its specific target group (low-income households) and unique residential contexts, differentiate this study from earlier work conducted around the world. Second, the study gives insight into how commuting time patterns vary across job-housing distances, and this mechanism is interpreted graphically through the constructed decision tree. Third, this study adds to the scarce literature on this topic in the Chinese context.

DATA
To capture the commuting patterns of residents who live in affordable housing, the 2010 Nanjing Affordable Housing Household Travel Survey was fused with attributes of the built environment extracted from Google Maps. The survey conducted by Nanjing local government and sponsored by the World Bank
has longer time commuting than those without (31.94 min vs 27.08 min). However, consistent with the view that households with preschool children need to spend more time on childcare, households with children younger than 6 tend to spend less time on commuting (27.82 min). Workers or service attendants, representing 51.8% of the sample, commute longer (31.59 min) than others (29.04 min). Differences among income groups are small. Consistent with a previous study conducted by He and Zhao [21], more educated residents commute about three minutes longer than less educated residents.
In the sample, 56.8% of the respondents have jobs in the same district as their residence and commute nearly 17.5 minutes less than those who do not work in the district where they live. Residents with jobs in the city centre spend more time commuting (34.82 min), which may be caused by the extra distance to the CBD and/or the congested road network during morning peak hours. The mean commuting time of affordable housing residents with access to a direct bus service connecting home and work (27.72 min) or a subway service (24.43 min) indicates that the provision of either a direct bus line or a subway service in the neighbourhood helps to reduce commuting time. 54.0% of the respondents commute by active transport, 33.3% use public transport, while the remaining 12.7% use private motorised modes (private car or motorcycle). Public transit commuters (bus users and subway riders) take the longest time (42.89 min) to get to work. Figure 2a displays a positive relationship between commuting time and commuting distance. To gain further insight, the distance was aggregated into three categories (<2.5 km, 2.5-7.5 km, and ≥7.5 km) using the equal frequency discretization method [38] while additionally accounting for the service ranges of the three main commuting modes. One third of the respondents commute more than 7.5 km, 35.9% between 2.5 and 7.5 km, and 30.0% less than 2.5 km. Figure 2b shows that the majority of residents depart between 7:00 a.m. and 8:00 a.m. and that their commuting time is highly variable. The median commuting time of 30 min shown in Figure 2 is used as the threshold for discretizing commuting time into two intervals (< 30 min and ≥ 30 min) in the remainder of the analysis.
its residents commute by active transport modes, 17.4% commute by public transport (11.6% by bus and 5.8% by subway), while 25.4% choose private motorised modes (car and motorcycles) for the work commute.
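The discretization of commuting distance and commuting time described in this section can be illustrated with a minimal Python sketch. The cut points (2.5 km, 7.5 km, 30 min) are taken from the text; the function names and the simple rank-based equal-frequency rule are illustrative assumptions, not the procedure of [38], and the adjustment for mode service ranges is not modelled.

```python
from typing import List, Sequence

# Cut points reported in the text; names are hypothetical.
DISTANCE_CUTS_KM = (2.5, 7.5)
TIME_CUT_MIN = 30.0


def equal_frequency_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    """Return n_bins - 1 cut points that put roughly equal counts in each bin.

    Assumption: a plain rank-based rule, used here only for illustration.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def distance_category(distance_km: float) -> str:
    """Map a job-housing distance to one of the three categories in the text."""
    low, high = DISTANCE_CUTS_KM
    if distance_km < low:
        return "<2.5 km"
    if distance_km < high:
        return "2.5-7.5 km"
    return ">=7.5 km"


def time_interval(minutes: float) -> str:
    """Map a one-way commuting time to the binary target interval."""
    return "<30 min" if minutes < TIME_CUT_MIN else ">=30 min"
```

For example, `distance_category(4.2)` falls in the middle category and `time_interval(30.0)` falls in the upper interval, matching the "≥" boundaries used in the text.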
- Chunjiang Newtown covers the largest built-up area (722,000 m²) and is located in the south of Nanjing. It mainly serves low-income households whose residences were demolished during urban renewal. It has a medium distance to the city centre (10.2 km). As to mode share, 48.7% of the respondents use active transport modes, 42.0% take the bus, while 9.3% use private motorised modes.
Table 2 summarizes the sample characteristics. On average, men spend 2.5 minutes more on commuting than women. The gender difference in commuting time is not very large, which may be partly attributed to the high ratio of dual-earner households: their economic status requires more women to work. It may also reflect the diminishing gender differences in the labour market in large cities such as Nanjing. Nevertheless, this finding differs from the gendered nature of work commutes found elsewhere in the world. Across all age groups, commuters aged over 39 commute longer (31.93 min). Residents who have access to cars (owning cars or driving licenses) commute 2-7 minutes less than others, while residents possessing IC cards (public transit cards) spend
commuting time intervals as the dependent variable was applied to the data. Decision tree classifiers have been employed in some areas of transportation research, such as road safety analysis and travel behaviour analysis [29-33]. These classifiers have several powerful features. First, the tree-like structure allows an intuitive interpretation of the complex, non-linear

METHODOLOGY

Research design
To further understand the commuting patterns of residents living in affordable housing in Nanjing, particularly how the selected factors influence their commuting time, a decision tree classifier with binary

Figure 3 - Information gain of attributes
To learn and validate the decision tree, we randomly split the cases into two approximately equal-sized sets: 193 instances were used as the training set for learning the tree structure, and the remaining 194 instances were used for validation. Table 3 lists the overall estimation results for both the training and test datasets. Table 4 shows the confusion matrix by commuting time interval, while Table 5 details the evaluation results for both datasets. The overall estimation accuracies of the decision tree, shown in Table 3, are 87.56% and 85.57% for the training and test data, respectively. The small difference of about 2 percentage points indicates that the trained tree structure is valid. As the target variable 'commuting time' is binary, various evaluation criteria such as true positive rate (TP rate), false positive rate (FP rate), precision, F-measure and the receiver operating characteristic (ROC) curve can be applied to judge the quality of the solution [36,37]. These performance measures are commonly used in classification, and the results for both training and test datasets are presented in Table 5. For a given commuting time interval, the TP rate is the proportion of instances observed in that interval that are correctly predicted, that is, cases where the predicted interval is consistent with the observation. The FP rate, on the other hand, is the proportion of instances observed in the other interval that are incorrectly assigned to it. The TP rates are 0.798 (< 30 min) and 0.910 (≥ 30 min) for the test data, which indicates a high classification power of the tree model in both situations. The low FP rates also imply a small probability of incorrect classification. The precision values, calculated as TP/(TP+FP), are 89.3% and 82.7% for the two commuting time intervals; their similarity suggests that the induced classifier predicts well in both intervals.
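The evaluation measures above can be made concrete with a short Python sketch. Table 4 is not reproduced in this text, so the confusion-matrix counts below are an assumption: they were back-calculated so that the 194 test cases reproduce the reported accuracy, TP rates, and precision values. The function and variable names are hypothetical.

```python
from typing import Dict


def class_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Per-class measures for one commuting time interval treated as 'positive'."""
    tp_rate = tp / (tp + fn)        # share of observed positives predicted correctly
    fp_rate = fp / (fp + tn)        # share of observed negatives predicted as positive
    precision = tp / (tp + fp)      # share of positive predictions that are correct
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate,
            "precision": precision, "f_measure": f_measure}


# ASSUMED test-set counts (not copied from Table 4):
# observed < 30 min : 75 predicted < 30 min, 19 predicted >= 30 min
# observed >= 30 min:  9 predicted < 30 min, 91 predicted >= 30 min
short = class_metrics(tp=75, fn=19, fp=9, tn=91)   # interval < 30 min
long_ = class_metrics(tp=91, fn=9, fp=19, tn=75)   # interval >= 30 min

n_short, n_long = 75 + 19, 91 + 9
accuracy = (75 + 91) / (n_short + n_long)

# Class-size-weighted average F-measure over the two intervals.
weighted_f = (n_short * short["f_measure"] + n_long * long_["f_measure"]) / (n_short + n_long)

print(f"accuracy   = {accuracy:.4f}")
print(f"TP rates   = {short['tp_rate']:.3f}, {long_['tp_rate']:.3f}")
print(f"precision  = {short['precision']:.3f}, {long_['precision']:.3f}")
print(f"weighted F = {weighted_f:.3f}")
```

Under these assumed counts, the accuracy, TP rates, and precision values round to the figures quoted in the text, which illustrates that the TP and FP rates are computed per interval rather than over the whole sample.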
relationships among the independent and dependent variables. The induced structure illustrates how the attributes influence residents' commuting times. Second, decision tree classifiers can deal with inter-correlated variables regardless of variable type (discrete or continuous). In view of these advantages, we decided to apply this technique in this study.
A typical decision tree is composed of two types of nodes and several branches. The nodes at which the tree splits are condition nodes. More precisely, the condition nodes represent the explanatory attributes that split the data into different classifications. The branches represent the attribute values that split the data. The condition node on top of the tree is called the "root"; it is the attribute most strongly associated with the dependent variable. The final node at the end of each branch is a decision node. A decision node specifies the decision or action that is taken if the conditions along the corresponding branch hold. In our study, three types of variables were used for tree structure learning (as condition nodes): socio-demographics, commuting trip characteristics, and features of the built environment (see Table 2).
To find the optimal set of contributors to commuting time, the learning starts with attribute selection, for which methods such as forward/backward stepwise selection are commonly used [34]. However, because these methods add or remove attributes one at a time, they sometimes find only a locally optimal, rather than the globally optimal, set of predictors. To reduce the odds of this mistake, we employed the information gain measure, whose performance has been proven in previous studies [33], to select the best set of variables. The information gain of data set D with the split attribute A is measured as [35]:

$$\mathrm{Gain}(D, A) = \mathrm{Entropy}(D) - \sum_{v \in \mathrm{Value}(A)} \frac{|D_v|}{|D|}\,\mathrm{Entropy}(D_v),$$

where v is one possible value of variable A, Value(A) is the full set of all possible values v, and D_v is the subset including all instances with A = v. Compared to the original entropy of data set D, the reduction in entropy is the information gained by this partition. It indicates the importance of attribute A in data set D. We applied the J48 algorithm for structure learning. For more details, readers are referred to Quinlan [35].
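As an illustration of the information gain measure defined above, the following Python sketch computes entropy and gain for a categorical attribute. It is not the J48 implementation used in the study; the four toy records and their values are invented for illustration, and only the attribute names 'ODDIST' and 'SUBWAY' echo variables mentioned in the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Gain(D, A) = Entropy(D) - sum_v |D_v|/|D| * Entropy(D_v)."""
    subsets = defaultdict(list)
    for row in rows:
        # Group the target labels by the value v of attribute A (the subsets D_v).
        subsets[row[attribute]].append(row[target])
    total = len(rows)
    remainder = sum(len(labels) / total * entropy(labels) for labels in subsets.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical toy data: distance separates the classes perfectly, subway does not.
toy = [
    {"ODDIST": "<2.5 km", "SUBWAY": "yes", "TIME": "<30 min"},
    {"ODDIST": "<2.5 km", "SUBWAY": "no", "TIME": "<30 min"},
    {"ODDIST": ">=7.5 km", "SUBWAY": "yes", "TIME": ">=30 min"},
    {"ODDIST": ">=7.5 km", "SUBWAY": "no", "TIME": ">=30 min"},
]

gain_distance = information_gain(toy, "ODDIST", "TIME")
gain_subway = information_gain(toy, "SUBWAY", "TIME")
```

In this toy case the distance attribute removes all uncertainty about the commuting time interval, while the subway attribute removes none, so ranking attributes by gain would place distance first.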

Model results
Following the learning procedure, we calculated the information gain for each attribute, which is shown in Figure 3.
Figure 5 shows the resulting structure of the extracted decision tree. There are 20 nodes and 12 leaves in the tree. For each 'commuting time interval', two numbers are shown in brackets, representing respectively the number of training cases and the prediction error. The variables 'ODDIST' (job-housing distance), 'SUBWAY' (the availability of subway service in the neighbourhood), and 'CAR' (household car ownership), which appear in the top two levels of the tree, are found to be the three most important predictors that split the sample. Next, job type and commuting mode choice play a role in the subsets. Job status determines the on-time requirement, and once the trip distance is given, the travel mode choice largely determines how much time is spent on the trip. Further, job location and departure time, both of which mainly relate to traffic on the road, traffic rules or regulations on different transport modes, and the availability of public transit facilities, contribute to the classification into homogeneous subsets.
These branches can also be viewed as different rules that depict differences in the commuting time patterns of affordable housing residents. Occasion 1: if the job-housing distance of an affordable housing resident is within 2.5 km, distance has a decisive effect: 48 out of 52 commuters had a commuting time of less than 30 minutes.
The ROC curve is another important validation measure that is more comprehensive and frequently used for classification evaluation. Figure 4 presents the overall ROC curves obtained. Both curves are located above the diagonal line, with an area under the curve (AUC) of 0.910. The AUC and the F-measure are important performance indicators that range from 0 to 1: the closer their value is to 1, the better the performance of the decision tree model. As shown in Table 5, the values of the ROC area and the F-measure (0.910 and 0.855) are above 0.8, indicating that the model has good predictive power.
[39] that the travel pattern of low-income commuters is more shapeable and is more sensitive to the provision of economically effective travel modes (e.g. buses and subway) in their neighbourhoods. A similar job effect is found in the structure in Figure 5.

Policy implications
These results indicate that job-housing distance, subway service availability, and job location have a determining influence on commuting time patterns, particularly for medium-to-long distance commuting. This role of transit and urban design in commuting patterns is critically important for the socially disadvantaged residents living in affordable housing. Working in the city centre implies a high probability of encountering traffic jams or tidal traffic flows during commuting. Wang and Zhou [42] stated that progressive planning that provides effective public transit modes could moderate the spatial job-housing separation caused by disadvantaged housing locations, while concurrent development in the suburbs would meet the needs of job seekers and reduce the likelihood of long-distance commuting to other regions.
In this case, the availability of bus lines that directly connect to job sites is not significantly associated with commuting time, partly due to the low schedule reliability of these services.
Occasion 2: if the job-housing distance ranges from 2.5 to 7.5 km and no subway service is available, 42 out of 52 residents commute for more than half an hour. When subway services are available, however, the pattern of trip duration for 2.5-7.5 km commutes becomes more complicated, although it remains logical. Job type comes into play first. In this context, affordable housing residents with 'worker' status are more likely to take less than 30 minutes for a 2.5-7.5 km commuting trip. For residents who are not workers, commuting time is somewhat different, and workplace location takes effect next. This is plausible: as workplaces are usually associated with on-road traffic conditions and land use patterns, information on workplace location, in addition to the given job-housing distance range and home location, helps to anticipate the road traffic and the tidal traffic flows that commuters may encounter. The commuting trip duration then becomes more identifiable. As shown in the figure, three fourths of the affordable housing residents working in the centre made a trip of more than 30 minutes. For residents who have jobs in the suburbs, however, one extra split on departure time takes place in order to reach relatively homogeneous subsets with respect to commuting time.
Occasion 3: faced with long-distance commuting, economically disadvantaged residents usually have to suffer. The tree structure shows that, with job-housing distances beyond 7.5 km, 49 out of 52 residents who cannot afford a car have to commute for more than 30 minutes. Against our expectation, car ownership among affordable housing residents does not show a significant superiority in commuting time re-
Three conclusions may be drawn from the results of our analysis.
First, within a specific range, here less than 2.5 km, job-housing distance has a decisive effect on commuting time, while socio-demographics and built environment attributes hardly help to discriminate between the trip durations of people with these very short commuting distances. Second, for residents whose commuting distance ranges from 2.5 to 7.5 km, the availability of subway services and job location begin to strongly influence the time affordable housing residents spend commuting. Third, as the job-housing distance increases beyond 7.5 km, socio-demographic variables such as car ownership and job status start to play a major role in commuting time. These findings suggest that the influence of a given factor varies from one context to another.
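The three distance-based occasions described above can be sketched as a small rule function in Python. This is only a partial, illustrative encoding of the branches that the text states explicitly; it is not the full tree in Figure 5, and the function and parameter names are hypothetical. Branches whose outcome depends on further splits (job type, job location, departure time) return `None` rather than a guessed interval.

```python
from typing import Optional


def predicted_interval(distance_km: float,
                       subway_available: Optional[bool] = None,
                       owns_car: Optional[bool] = None) -> Optional[str]:
    """Return the commuting time interval for the branches stated in the text.

    Returns None where the text describes further splits without giving
    enough detail to assign an interval.
    """
    if distance_km < 2.5:
        # Occasion 1: distance alone is decisive (48 of 52 cases under 30 min).
        return "<30 min"
    if distance_km < 7.5:
        if subway_available is False:
            # Occasion 2 without subway: 42 of 52 cases of 30 min or more.
            return ">=30 min"
        # With subway: job type, job location and departure time split further.
        return None
    if owns_car is False:
        # Occasion 3 without a car: 49 of 52 cases of 30 min or more.
        return ">=30 min"
    # Car owners beyond 7.5 km: outcome depends on splits not detailed here.
    return None
```

For instance, `predicted_interval(5.0, subway_available=False)` yields the longer interval, whereas `predicted_interval(5.0, subway_available=True)` is left undetermined because the text only describes that branch qualitatively.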
Although this research has provided interesting insights into commuting time patterns, some limitations remain to be addressed in future research. First, the sample size was rather small, implying that some features of the data may not be detected by the decision tree. Second, the small sample size forced us to ignore some features in the modelling process. Third, due to the lack of data, additional variables such as employment type for these affordable housing communities at a specific scale could not be incorporated in the analysis. Finally, the longitudinal effect of the changing environment on commuting time could be analysed in the future as more time passes.
It goes without saying that the findings of this study are confined to the study area. One may argue that the location of these affordable housing communities was not too extreme. In fact, many residents in most countries in the world face similar or longer commutes. In that sense, it would be of interest to replicate this study in cities such as Beijing and Shanghai, where rapidly increasing housing prices have forced low-income people to find residences much further from the city centre.
Contrary to previous studies, socio-demographics other than car ownership and job type are not statistically associated with commuting time. The negligible effects of gender and income on commuting time may be attributed to the small economic differences within this group, most of whom belong to low-paid dual-earner households. The segmentation of commuting distance has further diffused the small differences between gender and income groups seen in the data description. Car users here are shown to be captives of their outlying living environment. Therefore, their commuting behaviour can possibly be reshaped by high-quality public transit service. Transit policies can be flexibly designed for these affordable housing communities, such as providing demand-responsive shuttle buses connecting home and job locations and arranging bus routes and operation schedules as needed. Similarly, differences in commuting times between people with other socio-demographic characteristics are not reflected in the tree structure, which also illustrates the limits of descriptive statistics in explaining the underlying mechanism or suggesting root solutions for long commutes.
Lastly, travel characteristics in terms of mode choice and departure time affect commuting times. Due to the low-income status of affordable housing residents, increasing distances did not statistically increase car commuting as expected. Therefore, a cost-effective transport mode is greatly needed for this group of people. Transit-oriented development (TOD) policies that improve public transport service or service accessibility together with land use development are highly necessary. Meanwhile, services such as bike sharing that serve the first and last mile of subway trips could be provided to strengthen the efficiency of multimodal transit travel.

CONCLUSION
This study investigated the commuting times of residents living in affordable housing in Nanjing, China. Commuting time was examined within different job-housing distance categories. Affordable housing, which is the product of government intervention in response to urbanization and rapidly increasing housing prices, tends to be located on the city fringe. As the name indicates, these housing communities serve low-income residents who cannot afford the high cost of living in more central parts of the city.
Findings from countries other than China, which indicate that the overall commuting distance and time of urban residents have increased [40,41], raised the question of how residents of affordable housing communities commute and what their commuting time patterns look like. Using trip diaries of 387 affordable housing residents from the 2010 Nanjing Household Travel Survey, a decision tree classifier was used to answer these questions.