APPLICATION OF AN INTELLIGENT FUZZY REGRESSION ALGORITHM IN ROAD FREIGHT TRANSPORTATION MODELLING

Road freight transportation between the provinces of a country has an important effect on the traffic flow of intercity transportation networks. Therefore, accurate estimation of road freight transportation for the provinces of a country is crucial for improving rural traffic operation in large-scale management. The case study database in this research contains information on Iran's provinces in the year 2008. The correlation between road freight transportation and variables such as transport cost and distance, population, average household income and Gross Domestic Product (GDP) of each province is calculated. The results show that population is the most influential factor in predicting the freight transported by the provinces. A Linear Regression Model (LRM) is calibrated on the population variable, and a Fuzzy Regression Algorithm (FRA) is then generated on the basis of the LRM. The proposed FRA is an intelligent modified algorithm with accurate prediction and fitting ability. This methodology can be particularly useful in macro-level planning problems where reducing prediction error is one of the most important concerns for decision makers. In addition, a Back-Propagation Neural Network (BPNN) is developed to evaluate the prediction capability of the models and to be compared with FRA. According to the final results, the modified FRA estimates road freight transportation values more accurately than BPNN and LRM. Finally, the reliability of the calibrated models in predicting road freight transportation values is analyzed using the information from the year 2009. The results show higher reliability for the proposed modified FRA.



INTRODUCTION
Transportation planning plays an important role in the macro-level management system of every country, since its economic, social, cultural and political effects are evident. Freight transportation is defined as the movement of various types of commodities by different modes of transportation, such as road, railway, air, and marine. In an efficient transportation network, all available modes are expected to operate in a coordinated way to decrease traffic congestion on roadways. However, connecting all provinces of a country without roadway networks is very expensive and, to some extent, impossible. Therefore, accurate estimation of road freight transportation (RFT) between the provinces of a country is particularly important in macro-level planning.
Dependence on environmental characteristics and weather conditions, transportation costs that vary with the transportation distance, and uncertainty resulting from roadway characteristics are some of the disadvantages of RFT. However, RFT also has advantages such as flexibility (shipping commodities of different sizes and shapes) and availability (using different routes and delivery scenarios). These advantages have made road transportation the main mode for both freight and passenger transportation in several developing countries, such as Iran. Iran lies on one of the historically main routes connecting Asia and Europe. Owing to Iran's natural, tourist and pilgrimage attractions, road transportation is the easiest and most prevalent way of moving passengers and freight.
Furthermore, a great number of heavy vehicles transport essential freight for users, factories, industries, mines and agriculture. Iran's road transportation fleet, with an annual capacity of more than 600 million tons of freight and 350 million passengers [1], handles more than 80 percent of all transportation. Iran's RFT rose from 226.4 million tons in 1999 to 511.5 million tons in 2008, a total growth of 126 percent and an annual average growth rate of 9.5 percent [2]. In 2008, 84 percent of commercial freight in Iran was carried by road [3].
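The growth figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 9.5 percent "annual average growth rate" is a compound rate over the nine yearly steps from 1999 to 2008; only the two tonnages come from the text.

```python
# Growth of Iran's road freight transportation (RFT), million tons (values from the text).
rft_1999 = 226.4
rft_2008 = 511.5
years = 2008 - 1999  # nine yearly steps

# Total growth over the whole period, in percent.
total_growth_pct = (rft_2008 / rft_1999 - 1) * 100

# Compound annual growth rate g, defined by rft_1999 * (1 + g) ** years == rft_2008
# (assumption: this is what "annual average growth rate" refers to).
cagr_pct = ((rft_2008 / rft_1999) ** (1 / years) - 1) * 100

print(f"total growth:  {total_growth_pct:.1f}%")
print(f"annual growth: {cagr_pct:.1f}%")
```

Both values round to the 126 percent and 9.5 percent reported in the text, which suggests the annual figure is a compound rather than an arithmetic average.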
RFT modelling is crucial for future prediction and planning studies. Accurate prediction of future RFT has a significant impact on planning the development of transportation facilities. Additionally, macro-level planning techniques such as population control can help keep RFT below the capacity of existing road networks. In this research, the freight transportation attracted by each province is taken as its road freight transportation value. This value is also significantly correlated with the produced freight transportation. Correlations between RFT and some basic variables, such as average cost and distance of freight transportation, population (POP), area, Number of Cities (NOC), Number of Villages (NOV), average household income, and GDP of the province, are analyzed. The correlations show that RFT is highly dependent on the population of the province. To predict RFT, LRM is calibrated using the POP variable. Subsequently, by applying fuzzy theory, a new modified FRA is generated on the basis of the developed LRM. In addition, BPNN is trained to predict RFT values. Along with evaluating the models' fitting ability with the 2008 information, their reliability and prediction ability are also investigated with the 2009 information.

LITERATURE REVIEW
This section focuses on the application of fuzzy regression and artificial neural networks in freight transportation modelling. Tortum et al. [4] used artificial neural networks and adaptive neuro-fuzzy inference to model the mode choice of intercity freight transport. The complex non-linear relationships between different variables were analyzed efficiently by combining the learning ability of artificial neural networks with the transparent nature of fuzzy logic. Tsung et al. [5] predicted the volume of Taiwan's air cargo exports by applying a fuzzy regression model, with GDP as the independent variable. To enhance the validity and reliability of the fuzzy regression model, they adopted the asymmetric triangular fuzzy method.
Celik [6] used three different types of artificial neural networks to model the inter-regional freight transportation of the 48 continental states of the USA based on the 1993 U.S. Commodity Flow Survey data. The results showed an improvement over the Box-Cox regression model of Celik and Guldmann [7]. Tianwen [8] compared feed-forward and back-propagation neural networks for predicting railway freight transportation; the results showed higher prediction ability for the feed-forward neural network. Bo and Min [9] predicted freight transportation using a radial basis function neural network. They also studied the variables affecting freight transportation prediction through an AHP model.
Min et al. [10] developed a BPNN model to predict freight transportation. They evaluated several national economic characteristics, such as "GDP, proportion of the second industry, output of coal, output of steel, throughput goods of port, infrastructure investment, and railway market share". Yong and Xiang [11] also applied a BPNN model to predict railway freight volume. The developed model was able to identify and simulate the non-linear and complex relationships between railway freight volume and the influential independent variables. Furthermore, Ling and Zhuo [12] analyzed several qualitative and quantitative factors and predicted railway freight volume using a BPNN model.

Fuzzy Regression Algorithm (FRA)
This section is a brief summary of Tsung et al. [5]; the papers of Zadeh [13] and Tanaka [14] provide more information about fuzzy set theory, fuzzy sets and triangular fuzzy numbers.
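Tanaka [14] introduced fuzzy linear regression to account for vagueness. The main assumption of this algorithm is that the residuals between the estimator and the observations are caused by uncertain parameters in the model. Therefore, the parameters of an FRA model are fuzzy numbers. Equation 1 shows the initial form of FRA:
$$\hat{Y} = A_0 + A_1 x_1 + \dots + A_p x_p \tag{1}$$
where each parameter $A_i = (c_i, a_i, b_i)$ is an asymmetric triangular fuzzy number with centre $c_i$, left spread $a_i$ and right spread $b_i$. The degree $h_j$ to which the observation $y_j$ belongs to the fuzzy estimator $\hat{Y}_j$ depends on which side of the centre the observation lies, and $h_j$ will be determined using Equations 5 or 6.
First scenario ($y_j \le \sum_{i=0}^{p} c_i x_{ij}$): $h_j$ can be derived by using Equation 3 (Figure 1):
$$h_j = 1 - \frac{\sum_{i=0}^{p} c_i x_{ij} - y_j}{\sum_{i=0}^{p} a_i |x_{ij}|}, \qquad j = 1, \dots, n \tag{5}$$
Therefore, considering Equations 5 and 6 in the first and second scenarios, two limitations are generated as follows:
$$\sum_{i=0}^{p} c_i x_{ij} - (1-h) \sum_{i=0}^{p} a_i |x_{ij}| \le y_j, \qquad j = 1, \dots, n \tag{7}$$
$$\sum_{i=0}^{p} c_i x_{ij} + (1-h) \sum_{i=0}^{p} b_i |x_{ij}| \ge y_j, \qquad j = 1, \dots, n \tag{8}$$
Subject to these two limitations, the fuzzy estimator $\hat{Y}_j$ whose triangular membership function has the smallest spread is selected to maximize accuracy. Therefore, "the target function of fuzzy parameter $A_i = (c_i, a_i, b_i)$ from summing up all of the triangular membership functions' spread of the sample estimators would be subject to Equations 7 and 8" [5]. Finally, the target function that minimizes the total spread of the $n$ sample estimators is defined as follows:
$$\min J = \sum_{j=1}^{n} \sum_{i=0}^{p} (a_i + b_i)\,|x_{ij}|, \qquad a_i, b_i \ge 0$$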
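The spread-minimization LP described in this section can be sketched for the single-predictor case used later (x = POP, so $x_0 = 1$ and $x_1 \ge 0$). This is a minimal pure-Python sketch under a stated assumption: the centres $c_0, c_1$ are held fixed (for example at the LRM coefficients, one reading of "FRA is generated on the basis of LRM"), so the objective separates into two independent two-variable LPs, one for $(a_0, a_1)$ and one for $(b_0, b_1)$, which vertex enumeration solves exactly. The full problem with free centres needs a general LP solver (the authors used GAMS). Function names are hypothetical and the data are toy values.

```python
from itertools import combinations


def fit_spreads(x, residual, h):
    """Two-variable LP: minimize sum_j (s0 + s1*x_j) subject to
    (1 - h)*(s0 + s1*x_j) >= residual_j and s0, s1 >= 0 (requires h < 1).
    Solved by enumerating the vertices of the feasible region."""
    need = [r / (1.0 - h) for r in residual]
    # Each boundary line is written as p*s0 + q*s1 = rhs.
    lines = [(1.0, xj, nj) for xj, nj in zip(x, need)]
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # the axes s0 = 0 and s1 = 0
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(lines, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:
            continue  # parallel lines have no single intersection point
        s0 = (r1 * q2 - r2 * q1) / det
        s1 = (p1 * r2 - p2 * r1) / det
        if s0 < -1e-9 or s1 < -1e-9:
            continue
        if all(s0 + s1 * xj >= nj - 1e-9 for xj, nj in zip(x, need)):
            cost = len(x) * s0 + sum(x) * s1
            if best is None or cost < best[0]:
                best = (cost, max(s0, 0.0), max(s1, 0.0))
    return best[1], best[2]


def fuzzy_regression(x, y, c0, c1, h=0.0):
    """Asymmetric triangular FRA for one predictor x >= 0, with the centres
    (c0, c1) held fixed (assumption, e.g. the LRM coefficients)."""
    centre = [c0 + c1 * xj for xj in x]
    # Left spreads must reach observations below the centre (limitation 7).
    a0, a1 = fit_spreads(x, [cj - yj for cj, yj in zip(centre, y)], h)
    # Right spreads must reach observations above the centre (limitation 8).
    b0, b1 = fit_spreads(x, [yj - cj for cj, yj in zip(centre, y)], h)
    return a0, a1, b0, b1


if __name__ == "__main__":
    # Toy data, not the Iranian provinces dataset.
    print(fuzzy_regression([1, 2, 3, 4], [2.5, 3.5, 7.5, 8.0], c0=0.0, c1=2.0, h=0.0))
```

For these toy values the call returns (0.5, 0.0, 0.0, 0.5): a constant left spread and a right spread that grows with x. Raising h widens both spreads by the factor $1/(1-h)$, which is what later allows a non-zero $\alpha$-cut to still contain the observations.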

Scenario index and index of optimism
This section is also briefly summarized from Tsung et al. [5]. The $\alpha$-cut is used to represent the degree of unpredictability of different scenarios. For explicit scenarios with definite values and clear information, a higher $\alpha$ should be assumed. Conversely, smaller values of $\alpha$ should be considered in an ambiguous environment. Therefore, $\alpha = 0$ represents the most ambiguous scenario and $\alpha = 1$ the clearest one.
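Figure 1 - First scenario [5]

Second scenario ($y_j > \sum_{i=0}^{p} c_i x_{ij}$): $h_j$ can be derived by using Equation 3 (Figure 2):
$$h_j = 1 - \frac{y_j - \sum_{i=0}^{p} c_i x_{ij}}{\sum_{i=0}^{p} b_i |x_{ij}|}, \qquad j = 1, \dots, n \tag{6}$$
To estimate $c_i$, $a_i$ and $b_i$, the value of $h$ serves as a threshold: all $h_j$ values for the observations and estimators should be greater than or equal to $h$, i.e. $h_j \ge h$ for $j = 1, \dots, n$.

Figure 2 - Second scenario [5]

The alpha cut of the triangular fuzzy value $A_i = (c_i, a_i, b_i)$ is
$$A_i^{\alpha} = \left[c_i - (1-\alpha)\,a_i,\; c_i + (1-\alpha)\,b_i\right]$$
The lower and upper boundaries of the fuzzy regression value after applying the alpha cut are then determined as follows (with $x_0 = 1$ and all $x_i$ non-negative, as is the case for population):
$$\hat{Y}^{L}_{\alpha} = \sum_{i=0}^{p} \left[c_i - (1-\alpha)\,a_i\right] x_i, \qquad \hat{Y}^{U}_{\alpha} = \sum_{i=0}^{p} \left[c_i + (1-\alpha)\,b_i\right] x_i$$
The index of optimism $\mu$ represents the level of optimism under the same external environment. It can be applied once the upper and lower boundaries of FRA have been determined:
$$\hat{Y}^{\mu}_{\alpha} = \mu\,\hat{Y}^{U}_{\alpha} + (1-\mu)\,\hat{Y}^{L}_{\alpha}, \qquad 0 \le \mu \le 1$$
$\hat{Y}^{\mu}_{\alpha}$ is placed at the upper boundary ($\mu = 1$) when the assumptions are optimistic and confident, and at the lower boundary ($\mu = 0$) when they are pessimistic and less confident.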
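To make the $\alpha$-cut boundaries and the index of optimism concrete, the sketch below evaluates them for the single-predictor case used later (x = POP). It assumes x is non-negative and uses the parameter order (c0, c1, a0, a1, b0, b1); the function names are hypothetical and the numbers in the usage lines are toy values, not estimates from this study.

```python
def fra_bounds(x, c0, c1, a0, a1, b0, b1, alpha):
    """Lower and upper FRA boundaries after an alpha cut, for one predictor x >= 0."""
    centre = c0 + c1 * x
    lower = centre - (1 - alpha) * (a0 + a1 * x)   # uses the left spreads
    upper = centre + (1 - alpha) * (b0 + b1 * x)   # uses the right spreads
    return lower, upper


def fra_point(x, params, alpha, mu):
    """Crisp FRA estimate: mu * upper + (1 - mu) * lower, with 0 <= mu <= 1."""
    lower, upper = fra_bounds(x, *params, alpha)
    return mu * upper + (1 - mu) * lower


if __name__ == "__main__":
    params = (0.0, 2.0, 0.5, 0.0, 0.0, 0.5)          # toy (c0, c1, a0, a1, b0, b1)
    print(fra_bounds(3.0, *params, alpha=0.5))       # (5.75, 6.75)
    print(fra_point(3.0, params, alpha=0.5, mu=0.5))  # 6.25
```

At $\alpha = 1$ both boundaries collapse to the centre, so $\mu$ has no effect; the smaller $\alpha$ is, the more the choice of $\mu$ moves the estimate.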

Modifying fuzzy regression algorithm
This modification to the standard fuzzy regression algorithm is proposed by the authors to obtain higher prediction accuracy. Its main idea is to determine the values of the index of optimism, $\mu$, in a way that is more consistent with reality. Increasing the accuracy of the $\mu$ values decreases the error values significantly. This research suggests a sub-modelling procedure that develops a model to compute the $\mu$ values. The proposed sub-model must be reliable, statistically significant, solvable with the available database, and not time-consuming. In this way, the modified FRA tries to decrease its error values intelligently by choosing the best possible values for the index of optimism.
The most important part of this process is finding a range $[L_\alpha, U_\alpha]$, between the lower and upper FRA boundaries, that contains all values of the dependent variable. Increasing $\alpha$ narrows this range and, consequently, reduces the final prediction error, as long as the narrowed range still contains every observation. A reasonable first choice for $\alpha$ is 0.5.
If all values of the dependent variable lie in the range $[L_\alpha, U_\alpha]$, the most accurate (exact) value of $\mu$ for each observation is calculated as follows:
$$\mu'_j = \frac{RFT_j - L_{\alpha,j}}{U_{\alpha,j} - L_{\alpha,j}}$$
where $\mu'_j$ is the modified $\mu$. If $\mu'_j$ could be estimated without any error, $\hat{Y}^{\mu}_{\alpha,j}$ would equal $RFT_j$; in other words, the error of the estimated $\hat{Y}^{\mu}_{\alpha,j}$ would be zero.
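The two checks in this procedure, whether a given $\alpha$ still keeps every observation inside $[L_\alpha, U_\alpha]$ and what exact $\mu'_j$ each observation implies, can be sketched as follows. The largest covering $\alpha$ follows from requiring $(1-\alpha)$ times the relevant spread to be at least the distance from the centre to the observation. Function names are hypothetical and the numbers are toy values.

```python
def exact_mu(y, lower, upper):
    """Modified index of optimism mu'_j that makes
    mu*upper + (1 - mu)*lower reproduce the observation y exactly."""
    if not lower <= y <= upper:
        raise ValueError("observation outside [L_alpha, U_alpha]; choose a smaller alpha")
    if upper == lower:
        return 0.5  # zero-width band: every mu gives the same value, pick the midpoint
    return (y - lower) / (upper - lower)


def max_covering_alpha(y, centre, left, right):
    """Largest alpha whose band [c - (1-alpha)*left, c + (1-alpha)*right]
    still contains every observation. A negative result means some y_j lies
    outside even the alpha = 0 band. Spreads must be positive wherever y_j != c_j."""
    alpha = 1.0
    for yj, cj, lj, rj in zip(y, centre, left, right):
        if yj < cj:
            alpha = min(alpha, 1.0 - (cj - yj) / lj)
        elif yj > cj:
            alpha = min(alpha, 1.0 - (yj - cj) / rj)
    return alpha


if __name__ == "__main__":
    print(max_covering_alpha([5.0, 6.5], [6.0, 6.0], [2.0, 2.0], [1.0, 1.0]))  # 0.5
    print(exact_mu(6.25, 5.75, 6.75))                                         # 0.5
```

Checking coverage first matters because $\mu'_j$ outside $[0, 1]$ would place the estimate outside the FRA band; in that case the sketch raises an error instead of extrapolating.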
The next step is to develop an equation to estimate $\hat{\mu}$ values, which are then used as $\mu$ in FRA modelling. If a sub-model can predict $\hat{\mu}$ values very close to the best possible values $\mu'$, the prediction error of FRA will decrease substantially. The proposed sub-modelling procedure is a simple linear regression model using the available dataset and existing variables.
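The sub-model is an ordinary least-squares regression of $\mu'$ on a few province variables (POP, NOC and NOV in the modelling section). The paper does not give estimation code, so the sketch below is a generic, dependency-free OLS fitted through the normal equations $(X^\top X)\beta = X^\top y$; the function name and test data are hypothetical. A statistics package would additionally report the significance tests the text requires.

```python
def ols_normal_equations(X, y):
    """OLS with an intercept via the normal equations (X'X) beta = X'y.
    X is a list of rows of predictor values; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


if __name__ == "__main__":
    X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]        # synthetic predictors
    y = [0.2 + 0.5 * a - 0.1 * b for a, b in X]          # exact linear target
    print(ols_normal_equations(X, y))                    # close to [0.2, 0.5, -0.1]
```

Nothing in plain OLS keeps $\hat{\mu}$ inside $[0, 1]$; whether fitted values outside that range should be clipped before use in FRA is a modelling choice the sketch leaves open.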

Back-Propagation Neural Network (BPNN)
This section briefly introduces the BPNN model, based on [15] and [16]. Figure 3 shows a simple BPNN structure consisting of an input layer, a hidden layer, an output layer and the connections between them.

Figure 3 - A simple BPNN structure [15]

The back-propagation learning algorithm consists of a forward and a backward phase. The procedure is based on an iterative generalized delta rule with gradient descent on the error. The final goal is to minimize the total error between the actual and desired outputs by modifying the connection weights [15]. Initial values for the connection weights $W_{ji}$ and $W_{kj}$ and the biases $\theta_j$ and $\theta_k$ must be assumed. In the input layer, the input values $net_{pi}$ are activated on the neurons. Then, the training and testing values are prepared. The next step is the calculation of "the input values of a hidden layer j, $net_{pj}$, using the output values of an input layer i, $O_{pi}$, connection weight $W_{ji}$, and biases $\theta_j$ between an input layer i and a hidden layer j. Finally, the output values of the hidden layer j, $O_{pj}$, are derived from $net_{pj}$" [15]:
$$net_{pj} = \sum_i W_{ji}\, O_{pi} + \theta_j$$
$$O_{pj} = f(net_{pj})$$
where $f(\cdot)$ is an activation function. In this study, the sigmoid function is used as the activation function because it can balance linear and non-linear behaviours [16]:
$$f(x) = \frac{1}{1 + e^{-a x}}$$
where $a$ is the slope parameter of the sigmoid function.

P. Najaf, S. Famili: Application of an Intelligent Fuzzy Regression Algorithm in Road Freight Transportation Modelling

"Input values of an output layer k, $net_{pk}$, are computed using the output values of a hidden layer j, $O_{pj}$, connection weight $W_{kj}$, and biases $\theta_k$ between a hidden layer j and an output layer k. Then, the output values of an output layer k, $O_{pk}$, are derived from $net_{pk}$" [15]:
$$net_{pk} = \sum_j W_{kj}\, O_{pj} + \theta_k$$
$$O_{pk} = f(net_{pk}) \tag{18}$$
To modify the connection weights and biases based on the generalized delta rule, the error at the output neurons is propagated backward to the hidden layer and then to the input neurons. These steps are from the hidden layer to the output layer neurons:
This procedure is repeated until the error E falls below the target value.
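The forward and backward passes described above can be sketched in plain Python for one hidden layer and a single output neuron. This is an illustrative reimplementation, not the NeuroSolutions model used in the study: it assumes online (per-sample) updates, the error $E = \tfrac{1}{2}(T_k - O_{pk})^2$, biases added to the net input, and hypothetical initial-weight ranges and learning rate. Only the sigmoid with slope parameter $a$, the layer equations and the generalized delta rule come from the text.

```python
import math
import random


def sigmoid(v, a=1.0):
    """Logistic activation with slope parameter a; its derivative is a*f*(1 - f)."""
    return 1.0 / (1.0 + math.exp(-a * v))


class BPNN:
    """One hidden layer, one output neuron, trained by the generalized delta rule
    (online gradient descent on E = 0.5 * (T_k - O_pk)**2)."""

    def __init__(self, n_in, n_hidden, a=1.0, lr=0.5, seed=0):
        rnd = random.Random(seed)
        self.a, self.lr = a, lr
        # Hypothetical small random initial weights and biases.
        self.W_ji = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.theta_j = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.W_kj = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.theta_k = rnd.uniform(-0.5, 0.5)

    def forward(self, O_pi):
        # Hidden layer: net_pj = sum_i W_ji * O_pi + theta_j, O_pj = f(net_pj)
        self.O_pj = [sigmoid(sum(w * o for w, o in zip(row, O_pi)) + th, self.a)
                     for row, th in zip(self.W_ji, self.theta_j)]
        # Output layer: net_pk = sum_j W_kj * O_pj + theta_k, O_pk = f(net_pk)
        net_pk = sum(w * o for w, o in zip(self.W_kj, self.O_pj)) + self.theta_k
        return sigmoid(net_pk, self.a)

    def train_step(self, O_pi, T_k):
        O_pk = self.forward(O_pi)
        # Output delta: (T_k - O_pk) * f'(net_pk)
        delta_k = (T_k - O_pk) * self.a * O_pk * (1.0 - O_pk)
        # Hidden deltas, propagated back through the not-yet-updated W_kj
        delta_j = [delta_k * w * self.a * o * (1.0 - o) for w, o in zip(self.W_kj, self.O_pj)]
        # Gradient-descent updates of weights and biases
        self.W_kj = [w + self.lr * delta_k * o for w, o in zip(self.W_kj, self.O_pj)]
        self.theta_k += self.lr * delta_k
        for j, d in enumerate(delta_j):
            self.W_ji[j] = [w + self.lr * d * o for w, o in zip(self.W_ji[j], O_pi)]
            self.theta_j[j] += self.lr * d
        return 0.5 * (T_k - O_pk) ** 2


def mse(net, data):
    """Mean squared error of the network over (inputs, target) pairs."""
    return sum((t - net.forward(x)) ** 2 for x, t in data) / len(data)


def train(net, data, epochs):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t)
    return net


if __name__ == "__main__":
    # Toy curve with inputs and targets scaled into (0, 1), as a sigmoid output requires.
    data = [([i / 10], 0.1 + 0.8 * (i / 10) ** 2) for i in range(11)]
    net = BPNN(n_in=1, n_hidden=4, seed=1)  # four hidden neurons, as in Step 5
    before = mse(net, data)
    train(net, data, epochs=2000)
    print(before, mse(net, data))
```

Because the sigmoid output lies in (0, 1), real RFT values would have to be scaled into that range before training and scaled back afterwards; the smooth output curve of such a network is consistent with the observation in the fitting-ability section that BPNN does not follow small fluctuations.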

MODELLING
4.1 Step 1: Correlation
The studied dataset [2,17], taken from the 2008 statistical yearbook of Iran's Road Maintenance and Transportation Organization, contains information on all 30 provinces of Iran. The relationship between RFT and other variables, namely average cost of freight transportation, average distance of freight transportation, POP, area, NOC, NOV, GDP and average household income of each province, is analyzed by computing the Pearson correlation coefficient to determine the significance of the independent variables. Table 1 shows the correlation between RFT and the independent variables. Several references consider a correlation coefficient in the range of roughly ±0.7 to ±1 to indicate a strong relationship between two variables [18]. According to Table 1, the RFT of each province is closely related to its population, which is the only independent variable with a correlation above 0.7. Therefore, population is used as the main independent variable in the modelling process. Table 2 presents the statistical characteristics of the POP and RFT variables, and the values of these variables for all provinces are given in Table 4.
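The screening step can be sketched as computing Pearson's r between RFT and each candidate variable and keeping those with $|r| \ge 0.7$, the threshold cited above. Function and variable names are hypothetical, and the test values are synthetic rather than the provincial data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; undefined (ZeroDivisionError) for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_predictors(candidates, target, threshold=0.7):
    """Names of candidate variables whose |r| with the target reaches the threshold."""
    return [name for name, values in candidates.items()
            if abs(pearson(values, target)) >= threshold]


if __name__ == "__main__":
    target = [2, 4, 6, 8]
    candidates = {"var_a": [1, 2, 3, 4], "var_b": [1, -1, 1, -1]}  # synthetic
    print(strong_predictors(candidates, target))  # ['var_a']
```

Taking the absolute value keeps strong negative correlations as candidates too, which matches the symmetric ±0.7 to ±1 range in the text.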

Step 2: LRM
LRM is calibrated with SPSS software [19] to model RFT. Several LRMs have been developed to find the best descriptive linear regression model under two main assumptions: (1) the model uses only significant variables (significance level: 0.01), and (2) the model has no multicollinearity. All insignificant variables were omitted in a step-by-step backward modelling procedure. Finally, the most appropriate statistical characteristics were obtained for the model with POP as the only independent variable.
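With POP as the only independent variable, the LRM reduces to a least-squares line, which can be sketched in closed form together with its coefficient of determination. This is a generic sketch with a hypothetical function name, not the SPSS procedure; it omits the significance and multicollinearity tests described above.

```python
def simple_lrm(x, y):
    """Least-squares line y = c0 + c1*x and its coefficient of determination R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    c1 = sxy / sxx            # slope
    c0 = my - c1 * mx         # intercept: the line passes through (mean x, mean y)
    ss_res = sum((b - (c0 + c1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return c0, c1, 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(simple_lrm([1, 2, 3], [1, 3, 2]))  # toy data: intercept 1, slope 0.5, R^2 0.25
```

The resulting $(c_0, c_1)$ is one natural candidate for the FRA centres when FRA is built on the calibrated LRM.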
In this research, the main goal is not to find the factors affecting RFT prediction; the final aim is to apply an accurate methodology to develop a macro-level model. Indeed, the focus is on developing a more accurate and flexible model to predict large-scale values. Furthermore, it is very difficult to identify several significant variables in an aggregate (rather than discrete) model such as this one, in which the number of degrees of freedom is very low. Therefore, the models are developed using POP as the only significant independent variable.
Table 3 shows the model summary, and the predicted values of this LRM are presented in Tables 4 and 5.
After solving this Linear Programming (LP) problem with GAMS software [20], the values of the parameters are estimated as follows:
Thus, $\hat{Y}^{U}_{\alpha}$ and $\hat{Y}^{L}_{\alpha}$ for different values of $\alpha$ can be calculated according to Equation 12. Figure 4 shows the predicted values of LRM, the real values of the RFT variable, $\hat{Y}^{U}_{0}$ and $\hat{Y}^{L}_{0}$. The situation $\alpha = 0$ is generally selected in the case of complete uncertainty, which causes FRA to predict the widest boundaries for $\hat{y}$. In fact, higher values of $\alpha$ lead to narrower prediction boundaries and higher certainty of the predicted values. In this study, $\alpha$ is set to 0.5 and the calculations are carried out for $\alpha = 0.5$. The resulting modified index of optimism values, $\mu'_j$, are presented in Table 4.
The next purpose of this step is to find an equation to estimate $\hat{\mu}$ values. This research seeks a linear regression model that predicts the most accurate values of $\mu$ from the available information and variables. After several linear regression equations between $\mu'$ and other independent variables were generated, $\mu'$ showed a significant linear relationship with POP, NOC and NOV, as follows:
. . .POP N OC 0 2247 0 0328 0 0097
where $\hat{\mu}$ is an estimate of $\mu'$, and POP, NOC and NOV are the population, number of cities and number of villages, respectively. The values of these variables are also given in Table 4.
The estimated $\hat{\mu}$ values are used as $\mu$ in FRA modelling. After evaluating Equation 11 with $\hat{\mu}$, $U_{0.5}$ and $L_{0.5}$, the final predictions of the intelligent modified FRA are obtained, as shown in Table 5 and Figure 5.

Step 5: Training BPNN
In this step, BPNN is trained with NeuroSolutions 5 software [21]. The sigmoid function is used as the activation function, and the network has one hidden layer with four neurons. BPNN reaches its lowest mean squared error at the 666th epoch, and the learning rate in the training process is set to 1. The final BPNN predictions are displayed in Table 5 and Figure

FITTING ABILITY
As Figure 5 shows, FRA estimates the observed values with higher flexibility than BPNN and LRM. BPNN is not flexible enough to capture small fluctuations in RFT values: it predicts a smooth curve, whereas FRA follows the variations well. This is the main advantage of FRA over BPNN, and it is particularly valuable in large-scale planning and prediction. Figure 6 shows the predicted values of FRA and BPNN separately, where FRA has a higher fitting ability than BPNN in estimating the observations.
In addition, the fitting ability of the generated models is studied by comparing their prediction error values. Errors are computed by the following equations:
Table 6 compares the models' fitting ability by reporting their fitting error values.
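The specific error equations are not reproduced in this text. Assuming three commonly used measures, mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE), which may differ from those behind Table 6, the comparison could be computed as follows; the function name and numbers are hypothetical.

```python
import math


def error_measures(actual, predicted):
    """MAE, RMSE and MAPE (percent); MAPE assumes no actual value is zero."""
    n = len(actual)
    errs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errs, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


if __name__ == "__main__":
    print(error_measures([100, 200], [110, 190]))  # toy values
```

The same function can be applied unchanged to the 2008 data (fitting ability) and to the 2009 data (temporal reliability), which keeps the two comparisons on an identical footing.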
According to Table 6, FRA has the best fitting ability, since its error values are smaller than those of BPNN and LRM. BPNN, in turn, has smaller error values than LRM.

TEMPORAL RELIABILITY
In this section, the information for the year 2009 [22], presented in Table 7, is used to investigate the models' temporal reliability (prediction ability). The main purpose is to determine whether the models generated from the first dataset (2008) can be appropriately applied to predict the new dataset (2009). The BPNN trained on the first dataset (2008) is tested on the second dataset (2009); it is not retrained with these new observations. Similarly, to evaluate FRA prediction ability for the new dataset, the same constants $a_0$, $a_1$, $b_0$, $c_0$ and $c_1$, and the same linear equation for estimating $\hat{\mu}$, are used.
The modelling process can therefore be divided into two parts: (1) generating the models and evaluating their fitting ability using the first dataset (2008), and (2) evaluating the generated models' prediction ability (temporal reliability) on the second dataset (2009). All comparisons are based on the error values. Figure 7 shows the prediction ability of the models for the second dataset.
Table 8 presents the models' prediction errors for the year 2009.
Comparing Tables 6 and 8 shows that the error values of the models increase slightly. Nevertheless, the models still show appropriate temporal reliability, and FRA again performs better than BPNN and LRM. This indicates the higher accuracy of FRA in both fitting and prediction. Since road freight transportation models are widely used in long-term management and macro-level planning, even a slight improvement in RFT prediction supports more confident policy-making. Furthermore, Table 9, whose values are extracted from Tables 4, 5 and 7, shows that the generated FRA model can accurately predict both the existing and the future datasets. For example, Esfahan and Fars provinces have similar POP values but different RFT values. Since LRM and BPNN cannot capture this deviation, their predictions for the two provinces are almost identical, whereas FRA accurately predicts the RFT values of 2009 as well as 2008. This is the main advantage of the intelligent FRA over the conventional models.
It is worth noting that in macro-level, large-scale prediction models, year-to-year variations in the observations are generally small. The models should be re-calibrated (updated) continuously for the following years.

CONCLUSION
Accurate estimation of RFT between the provinces of a country is of great importance in macro-level management and planning. In this research, attracted road freight transportation is studied as RFT. The first part of the dataset, containing information on all 30 Iranian provinces in 2008, is used to develop the models and evaluate their fitting ability. To determine the significance of the independent variables, the relationship between RFT and population, area, number of cities, number of villages, average cost of freight transportation, average distance of freight transportation, GDP and average household income of each province is analyzed by computing the Pearson correlation. The results show that RFT is closely related to the POP of the province.
As a result, POP is used as the main independent variable in the modelling process. LRM is calibrated with RFT and POP as the dependent and independent variables, respectively. Afterwards, the intelligent modified FRA is generated based on the LRM. Furthermore, BPNN is trained using the first part of the dataset. The fitting ability of the models is evaluated by computing the error values. The modified FRA estimates RFT values with higher flexibility than BPNN and LRM, and therefore has a higher fitting ability. The proposed modification helps FRA determine the index of optimism values more realistically.
The generated models are also examined to determine whether they have appropriate temporal reliability in predicting the second part of the information (2009). The computations show that FRA has higher prediction ability and temporal reliability than both BPNN and LRM. Therefore, the proposed intelligent FRA is more effective than LRM and BPNN in both fitting and prediction. Since even a slight improvement in RFT prediction supports more confident policy-making in long-term planning and large-scale management, this intelligent fuzzy regression method is suggested as a powerful tool to analyze and model road freight transportation.

4.3 Step 3: Developing FRA
In this step, FRA is generated using the calibrated LRM equation. According to Equations 9 and 10:

The $\hat{\mu}$ values are shown in Table 4.

Figure 4 and Table 4 show that the RFT values of all 30 provinces lie within the interval $(L_{0.5}, U_{0.5})$. Therefore, the $\mu'$ values for all observations can be calculated as follows:
$$\mu'_j = \frac{RFT_j - L_{0.5,j}}{U_{0.5,j} - L_{0.5,j}}$$

Figure 4 - Boundaries estimated by FRA with $\alpha = 0$, LRM predictions and real RFT values

Figure 6 - Comparison of the fitting ability of FRA and BPNN

Error E between the calculated value $O_{pk}$ and the desired value $T_k$ is defined as:

Table 1 - Correlation between road freight transportation and the studied independent variables

Table 2 - Statistical characteristics of the road freight transportation and population variables

Table 3 - Linear regression model

Table 4 - Provinces' information, LRM predictions and FRA parameters

Table 5 - Predicted values of LRM, FRA and BPNN

Table 6 - Fitting error values*
* These values are computed using the first dataset (2008) to compare the models' fitting ability.

Table 7 - Second dataset (2009) and model predictions

Table 9 - High flexibility, fitting and prediction ability of the intelligent FRA