IMPLEMENTATION OF BIPOLAR ADJECTIVE PAIRS IN ANALYSIS OF URBAN ACOUSTIC ENVIRONMENTS

Four acoustic environments with different loudness levels and spectral distributions were recorded and reproduced to two groups of listeners, a control group and an experimental group. The questionnaire used in this research relies on the semantic differential method, implemented by defining adjective pairs of opposite meaning, where each pair describes a sound characteristic of a particular acoustic environment. In analyzing the results, psychological research methodology was used to determine the statistically significant bipolar adjectives that can appropriately evaluate a given acoustic environment and can thus serve as a starting point for standardizing questionnaires and methodology in soundscape research.


INTRODUCTION
Environmental acoustics is concerned with noise and vibration caused by traffic, aircraft, industrial equipment and recreational activities [1]. In addition, recent research [2][3][4][6][7][8] has shown a shift in focus toward the positive use of sound in urban environments, for example in the analysis and synthesis of tranquil acoustic environments in residential areas [9,10], which improve general health and quality of life. Apart from urban planning, where soundscape research (i.e. the characterization of an environment by the typical sounds it contains) facilitates the identification and possibly the reduction of undesired sounds, the field also includes research on the sound parameterization of various animal species, which improves speech recognition [11]. Other work applies knowledge of the relevant objective and subjective soundscape parameters to artificial sound synthesis for consumer electronics and robotics [12][13][14][15][16][17] or to tools for the visually impaired [18].
Alongside objective acoustic parameters (spectrum, loudness, etc.), research in environmental acoustics also needs to investigate subjective parameters, which help determine how the public perceives a certain acoustic environment and could ultimately serve as a guideline for creating a pleasant acoustic environment or for identifying a certain soundscape. Human perception of sound is still relatively poorly understood and difficult to model precisely, so subjective characterization of an acoustic environment typically relies on questionnaires and requires multiple studies. One of the principal challenges is to find the set of parameters through which an acoustic environment can be uniquely and precisely characterized. Since such research typically involves a large number of people, mostly laymen in acoustical terms, the questionnaires need to be both comprehensive and clearly understandable to the general population.
A basis for acoustical characterization can be established with a set of subjective descriptors, such as bipolar adjectives that describe specific characteristics of an acoustic environment (e.g. loudness) and possibly the level of irritation or annoyance caused by a certain sound. Ratings on these adjective scales are then assigned numerical scores, which enables statistical analysis of specific sound properties [19][20][21][22][23].
There is also no clear consensus yet among researchers regarding soundscape terminology and the choice of bipolar adjectives [19,20,23-26]. In particular, issues arise in the criteria for selecting the bipolar adjectives, compounded by the question of universal applicability, since certain semantic nuances are lost in translation, and by the need to determine a proper statistical approach. Thus, if the statistical relevance of an adjective pair is not established, it can easily be argued that the tested acoustic environment is not effectively described.

With respect to all of the above, the main focus of this paper is to define a statistically significant set of bipolar adjectives that could provide a basis for standardization in soundscape research regardless of the research hypothesis, and to propose a research methodology that leads to statistically significant soundscape parameters. First, four scenarios are described, three of which are characteristic of urban areas, for which a subjective characterization is sought. This is followed by a description of the research methodology, which is then used to measure the response of volunteer listeners to the soundscapes and to sudden changes in loudness. Finally, the five bipolar adjective pairs found to be statistically significant are presented and their choice is discussed.

THE ANALYZED SOUNDSCAPE SCENARIOS
The four acoustic environment samples were chosen because their loudness and frequency distributions differ considerably. An average person living in an urban setting is familiar with three of them: the children's park, the expressway and the stream.
The soundscape samples were recorded with the soundwalk method [1][2][3][4], using an audio recorder and a pair of binaural microphones at a 48 kHz sampling rate and 16-bit quantization. The soundwalks were performed at different times of the day and on different days of the week, always in dry, sunny weather. The recordings were made at the soundwalker's height so that they would be as similar as possible to the natural binaural listening of people residing in these soundscapes.
The binaural recordings were reproduced to the listeners over closed electrodynamic headphones at an average sound pressure level of 50 dB(A) in the steady part of each sample. Free-field equalization was used.
We wanted to give the listeners sufficient time to adapt to each acoustic environment. Since our previous experience showed that exposing test subjects to similar sound stimuli for longer than ten minutes causes fatigue, we opted for seven-minute recordings, which we deemed optimal for letting the listeners adapt to a new sound environment while retaining their attention [5].
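As a quick check of the recording format described above (our own arithmetic, not a figure reported in the study), the sketch below computes the raw size of one seven-minute binaural excerpt at 48 kHz and 16 bits, and the textbook dynamic range of an ideal 16-bit quantizer (about 6.02 dB per bit plus 1.76 dB). The function names are hypothetical. Assuming the recording gain was set sensibly, 16 bits leave ample headroom for the 20 to 30 dB level jumps described for the recorded environments.

```python
SAMPLE_RATE_HZ = 48_000  # from the text
BIT_DEPTH = 16           # from the text
CHANNELS = 2             # binaural pair: left and right ear
DURATION_S = 7 * 60      # seven-minute excerpt


def excerpt_size(duration_s: int, fs: int, bits: int, channels: int) -> tuple[int, int]:
    """Samples per channel and bytes of uncompressed PCM for one excerpt."""
    samples = duration_s * fs
    return samples, samples * channels * bits // 8


def ideal_quantizer_range_db(bits: int) -> float:
    """Textbook SNR of an ideal N-bit quantizer for a full-scale sine: 6.02 N + 1.76 dB."""
    return 6.02 * bits + 1.76


print(excerpt_size(DURATION_S, SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS))  # (20160000, 80640000)
print(round(ideal_quantizer_range_db(BIT_DEPTH), 1))                 # 98.1
```

One excerpt is therefore about 80.6 MB of uncompressed PCM.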
The first acoustic environment recording was made in a children's park situated in a large housing block in the western part of the City of Zagreb, Croatia. From this recording we extracted a seven-minute sample that mainly includes sounds made by the children, e.g. their cries, shouting and calls, as well as sounds coming from playground equipment such as a swing or a carousel. Most of the recording has a constant level, with short and sudden level changes corresponding to the children shouting and crying. Figure 1 shows a spectrogram of the recorded children's park excerpt. The spectrum extends well above 10 kHz, with very short level jumps of up to 20 dB above the steady part of the recording.
The second environment is a recording of Ljubljana Avenue, an expressway stretching from the city centre of Zagreb, Croatia, to the western exit of the city, which is busy with traffic during most of the day. The recording sample includes sounds such as cars and trucks passing by, car horns, the sound of a pedestrian traffic signal for the visually impaired and other sounds specific to this acoustic environment. Figure 2 shows a spectrogram of the recorded expressway excerpt. The frequency spectrum is narrower than in the children's park environment and is mainly concentrated below 1 kHz. Sudden and large level changes were less unexpected in the expressway environment than in the children's park. They mainly correspond to loud cars or trucks passing by and can be seen in the spectrogram.
The third acoustic environment, an industrial hall, was chosen because it is unfamiliar to the majority of the test listeners. It is distinctively loud, with artificial sounds of various tools, mostly power tools such as grinders and drills, and hand tools such as hammers hitting metal objects. These specific sounds are very loud, as shown in the spectrogram (Figure 3). The frequency spectrum extends up to 10 kHz, with short level changes rising to 30 dB above the average level.
The fourth acoustic environment is a recording of a forest stream located in a suburb of Zagreb, Croatia. This sample includes sounds of flowing water, children playing with a dog, and a bus passing nearby. Figure 4 shows the frequency spectrum of this recording excerpt. It is narrow and concentrated at low frequencies, which makes the sudden level changes (dog barking, children screaming, etc.) very discernible.
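The level jumps quoted above (up to 20 dB in the children's park, 30 dB in the industrial hall) are discussed with reference to the spectrograms. One minimal way to estimate such a figure directly from a recording, assuming it is available as a list of samples, is to compute short-term RMS levels over non-overlapping frames and compare the loudest frame with the median ("steady") level. This is an illustrative sketch, not the procedure used in the study; the 10 ms frame length and the use of the median as the steady-level estimate are our own assumptions. Because only level differences are reported, no absolute calibration is needed.

```python
import math
import statistics


def frame_levels_db(samples: list[float], frame_len: int) -> list[float]:
    """RMS level of each non-overlapping frame in dB relative to full scale."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-12)))  # floor avoids log10(0)
    return levels


def peak_excess_db(samples: list[float], frame_len: int) -> float:
    """Loudest frame level minus the median ('steady') frame level, in dB."""
    levels = frame_levels_db(samples, frame_len)
    return max(levels) - statistics.median(levels)


# Synthetic check: a steady 1 kHz tone with one 10 ms burst ten times larger.
FS = 48_000
FRAME = FS // 100  # 10 ms frames (hypothetical choice)
signal = [0.01 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(10 * FRAME)]
for n in range(5 * FRAME, 6 * FRAME):
    signal[n] *= 10.0
print(round(peak_excess_db(signal, FRAME), 1))  # 20.0
```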

METHODOLOGY FOR SUBJECTIVE RESEARCH
To evaluate the subjective perception of the soundscapes, we designed a questionnaire that was given to 100 volunteer listeners of the analyzed soundscape scenarios. The volunteers were mainly students aged 19 to 24, with healthy hearing. Such research belongs to the domain of psychological research, whose typical methodology requires certain conditions to be met so that a hypothesis can be tested without bias or interference. Only then, after statistical processing, can the analyzed hypothesis be confirmed or rejected [28][29][30].
It is important to emphasize the controlled aspects of testing, such as the size of the test sample, the testing conditions, the manipulation of an independent variable and the application of appropriate statistical procedures in tabulating and analyzing the data. The subjects participating in the study were divided into two groups: experimental and control. To obtain a sample large enough for statistical testing, one hundred listeners participated in this research, fifty in each group. The average age was 24. The testing conditions were the same for both groups, except for the stimulus the experimental group received in the form of sudden and unexpected loudness changes.
The order of the acoustic environment recordings was chosen randomly but kept fixed for both groups, in accordance with established psychological methodology [27][28][29]. In this research, the children's park recording was reproduced first, followed by the expressway, industrial hall and forest stream recordings. The listeners were not familiar with the details of the research or with the content of the recordings.
The test subjects listened to the four acoustic environment recordings at least one week apart, to eliminate any potential influence of one recording on another and to avoid listener fatigue, irritation and annoyance.

Questionnaire design
When designing a questionnaire, researchers rely mostly on personal experience and draft their questions so that they target a specific research hypothesis. The questionnaire used in this research relies on the semantic differential method, implemented by defining adjective pairs of opposite meaning, where each pair describes a sound characteristic of a particular acoustic environment [19-23, 25, 26]. The listeners give ratings between the two extremes of each adjective pair, and the range between the extremes is divided into a discrete number of values.
In particular, the questionnaire consists of eleven adjective pairs that can describe an acoustic environment in detail. The adjective pairs are of two types: one represents the auditory perception of objective parameters that can be measured [23], while the other represents purely emotional reactions (Table 1). Nevertheless, the principal criterion for choosing the adjective pairs is that they are comprehensible to a wide audience. Therefore, all pairs are treated equally regardless of type, since the objective is to find a unique soundscape representation within the available attributes. The questionnaire uses a seven-point semantic differential scale for each adjective pair. This number of points allows a reasonable, quasi-continuous approximation of the human reaction to the sound heard. Following the methodology discussed in the subsequent sections, primarily the controlled aspects of the research and the application of appropriate statistical methods, it is possible to establish the statistical relevance of specific adjective pairs.
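To make the rating scheme concrete, the sketch below represents one adjective pair and one response on the seven-point scale, with 1 meaning the left adjective and 7 the right one. The class names and the centred -3 to +3 view are hypothetical conveniences for illustration; the analysis in this paper works with the raw 1 to 7 ratings.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 7               # seven-point semantic differential scale
MIDPOINT = (SCALE_MIN + SCALE_MAX) // 2   # 4: neither adjective


@dataclass(frozen=True)
class AdjectivePair:
    left: str   # attribute at rating 1
    right: str  # attribute at rating 7


@dataclass(frozen=True)
class Rating:
    pair: AdjectivePair
    value: int

    def __post_init__(self) -> None:
        # Reject answers outside the printed scale.
        if not SCALE_MIN <= self.value <= SCALE_MAX:
            raise ValueError(f"rating must be {SCALE_MIN}..{SCALE_MAX}, got {self.value}")

    def centred(self) -> int:
        """Hypothetical -3..+3 view: negative leans left, positive leans right."""
        return self.value - MIDPOINT

    def leaning(self) -> str:
        """Adjective the rating leans towards, or 'neutral' at the midpoint."""
        c = self.centred()
        if c == 0:
            return "neutral"
        return self.pair.left if c < 0 else self.pair.right


soothing_stressful = AdjectivePair("soothing", "stressful")
r = Rating(soothing_stressful, 2)
print(r.centred(), r.leaning())  # -2 soothing
```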

Processing: experimental and control groups
The experimental group (N_exp = 50) listened to the acoustic environment recordings with their sudden and short loudness changes intact, i.e. without any post-processing. These recordings have a wider loudness distribution, with loudness rising to as much as 12 sones during several short time intervals.
The control group (N_cont = 50) listened to the same acoustic environment recordings, but with a narrower loudness distribution over their entire length, obtained through dynamic post-processing: all control-group recordings were passed through a compressor to lower the maximum loudness values, while the average loudness did not change significantly.
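The study does not report the compressor settings used for the control group. As an illustration only, a hard-knee static compressor can be sketched on a level envelope in dB: levels at or below a threshold pass unchanged, and levels above it rise only 1/ratio dB per input dB, so short peaks are pulled down while the steady part, which dominates the recording, is untouched. The threshold, ratio and envelope below are hypothetical, and a real compressor would also smooth the gain with attack and release times.

```python
def compressed_level_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Hard-knee static curve: unchanged at or below the threshold,
    1/ratio dB of output per dB of input above it."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def compress_envelope(levels_db: list[float], threshold_db: float, ratio: float) -> list[float]:
    """Apply the static curve frame by frame (no attack/release smoothing)."""
    return [compressed_level_db(x, threshold_db, ratio) for x in levels_db]


# Hypothetical level envelope in dB: steady 60 dB with two short 80 dB events.
envelope = [60.0] * 8 + [80.0] + [60.0] * 8 + [80.0]
out = compress_envelope(envelope, threshold_db=70.0, ratio=4.0)
print(max(envelope), max(out))                      # 80.0 72.5
print(sum(a != b for a, b in zip(envelope, out)))   # 2
```

Only the two peak frames change; the other sixteen frames, like the steady part of a real recording, keep their level.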
Having both groups listen to the same acoustic environment recordings allowed us to isolate one objective parameter, namely loudness. We wanted to determine whether individual sound events in the same acoustic environment sample would be perceived by the control group as annoying if their loudness was lower. Loudness was calculated using the established psychoacoustic Zwicker method according to the standard DIN 45631 [30,31] and is given in Table 2. To give a sense of how loud the environments were, the reader may use the rule of thumb that 1 sone corresponds to a loudness level of 40 phons and 64 sones to 100 phons, since loudness doubles with every 10-phon increase above 40 phons. In terms of sound pressure level, a loudness level of 50 phons corresponds to 50 dB SPL at 1 kHz.
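The rule of thumb above follows from the standard sone-phon relation above 40 phons, N = 2^((L_N - 40)/10), where N is the loudness in sones and L_N the loudness level in phons. The sketch converts in both directions and, as an example, expresses the 12-sone peaks heard by the experimental group as roughly 76 phons. It deliberately rejects values below 40 phons (1 sone), where this simple doubling rule does not hold.

```python
import math


def sone_to_phon(n_sone: float) -> float:
    """Loudness level in phons from loudness in sones (valid for N >= 1 sone)."""
    if n_sone < 1:
        raise ValueError("the doubling rule is only used here for N >= 1 sone")
    return 40.0 + 10.0 * math.log2(n_sone)


def phon_to_sone(level_phon: float) -> float:
    """Inverse relation: loudness doubles for every 10-phon increase above 40 phons."""
    if level_phon < 40:
        raise ValueError("the doubling rule is only used here for levels >= 40 phons")
    return 2.0 ** ((level_phon - 40.0) / 10.0)


print(sone_to_phon(1), sone_to_phon(64))  # 40.0 100.0
print(round(sone_to_phon(12), 1))          # 75.8
```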

ANALYSIS AND DISCUSSION OF THE RESULTS
Firstly, we performed a direct analysis of the listeners' responses based on the descriptive marks. For each acoustic environment and test group (i.e. group of 50 listeners), we calculated the mean response (x̄), the sample standard deviation (σ) and its square, the variance (σ²). The details of the responses can be reconstructed from Table 3, while the mean values are given in Figures 5 and 6. The values on the x-axis designate the respective bipolar adjective pairs as listed in Table 1, while the y-axis shows the listeners' average rating between the two adjectives (ratings "1" and "7" correspond to the purely "left" and purely "right" attribute in Table 1, respectively). Visual inspection shows that the closest agreement between the control and experimental groups across the adjective pairs is obtained for the children's park (Figure 5). This can be explained by the random short-time high-frequency peaks (due to children shouting and crying), for which the reduction of the loudness level apparently had the least effect on the soundscape impression. It can also be noted that the forest stream recording was generally rated as the most comfortable and the most soothing, unlike the industrial hall recording. In the forest stream, the sudden loudness changes heard by the experimental group largely account for the difference in perceived sound diversity between the two groups (adjective pair No. 3 in Figure 6). This implies that in generally calm environments listeners perceive unexpected changes in loudness as a change in diversity, while the overall appeal of the soundscape does not deteriorate substantially (e.g. pairs "4", "5" and "11" in Figure 6). The calculated variances for the four analyzed acoustic scenarios are given in Figures 7 and 8.
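The descriptive statistics above can be reproduced for each environment, group and adjective pair with Python's standard statistics module. The source does not say whether the sample (n - 1) or population (n) estimator was used; the sketch assumes the sample estimator, and the ratings are made up for illustration. The last line checks the correspondence, used in the discussion of Figures 7 and 8, between a variance of 2.5 and a standard deviation of about 1.6.

```python
import math
import statistics


def describe(ratings: list[int]) -> dict[str, float]:
    """Mean, standard deviation and variance of one group's ratings
    for one adjective pair in one environment (sample estimators, n - 1)."""
    return {
        "mean": statistics.mean(ratings),
        "sd": statistics.stdev(ratings),
        "variance": statistics.variance(ratings),
    }


# Hypothetical ratings on the 1..7 scale (not data from the study).
example = describe([2, 3, 3, 4, 5])
print(example["mean"], example["variance"])  # 3.4 1.3

# A variance of 2.5 corresponds to a standard deviation of about 1.58.
print(round(math.sqrt(2.5), 2))  # 1.58
```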
The variances are in most cases below 2.5 (corresponding to a standard deviation of about 1.6), which indicates that the listeners' responses cluster around the mean values and that the seven-point differential scale is fine enough to capture them. The most notable exception is adjective pair No. 6 for the experimental group at the expressway (Figure 7). This suggests that sudden changes in sound level in such an environment blur the perception, regardless of whether the cause of the sudden change is natural (e.g. a car passing) or artificial (e.g. post-processing).

The t-test
As the direct analysis shows, the average ratings and variances of the control and experimental groups differ for each pair of bipolar adjectives, with certain pairs yielding the largest differences. To analyze these differences more rigorously, we assess their statistical significance. To find the statistically significant bipolar adjective pairs, we used the t-test and subsequently ANOVA (analysis of variance) [32,33].
The experimental research methodology postulates the manipulation of one independent variable, in this case sudden and unexpected changes in loudness, while other variables are kept constant for both the experimental and the control group [20,25,26]. The two groups represent independent samples for which statistical significance can be established. When a two-sample t-test is used to evaluate the difference between the means of small independent samples, it is necessary to specify the significance level α and to determine the degrees of freedom (N-1), where N is the sample size (as is common in similar statistical analyses, we set α = 0.05, which corresponds to a 95% confidence level). The so-called t-value, which is the core of the t-test, can be regarded as a statistical equivalent of the signal-to-noise ratio [33] and is calculated as:
$$t = \frac{\bar{X}_{\mathrm{exp}} - \bar{X}_{\mathrm{cont}}}{\sqrt{\frac{\sigma^2_{\mathrm{exp}}}{N_{\mathrm{exp}}} + \frac{\sigma^2_{\mathrm{cont}}}{N_{\mathrm{cont}}}}}$$
where X̄, σ² and N denote the mean value of the responses, the variance and the number of samples, respectively, while the indices "exp" and "cont" refer to the experimental and control group, respectively. Using a standard table of significance [32], the postulated significance level α = 0.05 with N-1 = 49 degrees of freedom requires the t-value to exceed 2.01 in absolute value for the difference between the control and experimental group means to be statistically significant. The calculated t-values for each environment and each adjective pair are given in Figure 9, where the bipolar adjectives on the x-axis are numbered as in Table 1; the significance threshold t = 2.01 is drawn in the same figure for comparison. It can readily be seen that pairs "3", "4", "9", "10" and "11" are statistically significant in all of the considered environments (details are given in Table 4). Note that for some environments (e.g. forest stream and expressway) additional pairs are statistically significant, which means that the method can be specialized to characterize particular environments in more detail.
The smallest number of statistically significant pairs is observed for the children's park, which agrees with the direct analysis of the mean responses of the control and experimental groups in the previous section (Figure 5). We also note that for three adjective pairs no statistically significant difference in mean values is found at all (quiet - loud, deep - high-pitched and harmonious - chaotic), two of which clearly belong to the auditory group (Table 1). This indicates that the objective parameters that did not change between the two groups (average loudness level and frequency spectrum) are sensed fairly unambiguously by both the experimental and the control group (see also Figures 5 and 6 for comparison).
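The t-value used here, t = (X̄_exp - X̄_cont) / sqrt(σ²_exp/N_exp + σ²_cont/N_cont), can be sketched with the standard library as follows. The sketch assumes sample variances (n - 1), takes the critical value 2.01 from the text rather than computing it, and uses hypothetical ratings for 50 listeners per group. For the same input, SciPy's scipy.stats.ttest_ind(exp, cont, equal_var=False) returns the same statistic, although its p-value uses Welch's degrees of freedom rather than the N-1 used in the text.

```python
import math
import statistics

# Critical value quoted in the text for alpha = 0.05 (two-sided) and N - 1 = 49.
T_CRITICAL = 2.01


def t_value(exp: list[float], cont: list[float]) -> float:
    """t = (mean_exp - mean_cont) / sqrt(var_exp / N_exp + var_cont / N_cont)."""
    standard_error = math.sqrt(
        statistics.variance(exp) / len(exp) + statistics.variance(cont) / len(cont)
    )
    return (statistics.mean(exp) - statistics.mean(cont)) / standard_error


def differs_significantly(exp: list[float], cont: list[float],
                          t_crit: float = T_CRITICAL) -> bool:
    """Two-sided decision: the group means differ if |t| exceeds t_crit."""
    return abs(t_value(exp, cont)) > t_crit


# Hypothetical ratings of 50 listeners per group for one adjective pair.
experimental = [5, 6, 6, 7, 5] * 10
control = [3, 4, 4, 5, 3] * 10
print(round(t_value(experimental, control), 2))       # 13.23
print(differs_significantly(experimental, control))   # True
```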

The analysis of variance (ANOVA)
In the final test, we check whether the choice of the bipolar adjectives themselves (from Table 1) is suitable for characterizing an acoustic environment, i.e. whether different environments give rise to statistically significant differences in the listeners' ratings of the bipolar adjectives. To characterize the bipolar adjective pairs, we performed a one-way analysis of variance (ANOVA) [32] for the control and experimental groups independently. Central to the ANOVA method is the so-called F-value, defined as the ratio of the normalized variance between sequences to the sum of the normalized variances within each sequence (each sequence represents one acoustic environment). It is calculated for each adjective pair as:
$$F = \frac{\frac{1}{m-1}\sum_{i=1}^{m} N_i \left(\bar{X}_i - \bar{X}\right)^2}{\frac{1}{N_{\mathrm{tot}}-m}\sum_{i=1}^{m} \left(N_i - 1\right)\sigma_i^2}$$
where m is the number of acoustic environments (sequences), N_i, X̄_i and σ_i² are the number of samples, the mean value and the variance of the responses for the i-th environment, X̄ is the mean value of all responses, and N_tot is the total number of samples.
The calculated F-values for each adjective pair in both groups are shown in Table 5. The values used are m = 4, N_i = 50 and N_tot = 200; the other quantities were calculated from the listeners' responses and can be deduced from Table 3.
The previously set significance level α = 0.05 (95% confidence) corresponds to the requirement that the F-value exceed 3.68 [32]. This requirement is met for all adjective pairs in both groups, so each adjective pair can indeed distinguish the four environments in a statistically significant way. A more detailed post-hoc analysis shows that for the control group the largest statistical difference (i.e. the largest F-value) among the acoustic environments is achieved for bipolar adjectives "6", "7" and "8" (natural - artificial, harmonious - chaotic and appealing - repulsive). For the experimental group, the largest difference among the acoustic environments is obtained for adjective pairs "9" and "11" (soothing - stressful and gentle - rough). On the other hand, the smallest statistical difference among the acoustic environments for both the control and the experimental group is obtained for adjective pair No. 1 (quiet - loud), which is expected, as the average sound pressure level was equal for all four acoustic environments, so they were perceived as equally quiet or loud (see also Figures 5 and 6 for comparison). Nevertheless, regardless of whether the F-value is higher or lower, ANOVA shows for each adjective pair that the acoustic environments differ significantly in both groups.
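The one-way ANOVA F-value for one adjective pair can be sketched with the standard library: the between-environment mean square, Σ N_i (X̄_i - X̄)² / (m - 1), divided by the within-environment mean square, Σ (N_i - 1) σ_i² / (N_tot - m). The sketch assumes sample variances, since the estimator is not stated, and takes the threshold 3.68 from the text rather than computing it. In the study's design, the input would be four lists of 50 ratings each (m = 4, N_tot = 200); the example below uses a tiny hypothetical input so the result is easy to verify by hand. For the same input, SciPy's scipy.stats.f_oneway(*groups) returns the same F statistic.

```python
import statistics

# Critical value quoted in the text for alpha = 0.05.
F_CRITICAL = 3.68


def f_value(groups: list[list[float]]) -> float:
    """One-way ANOVA F for one adjective pair; each inner list holds the
    ratings for one acoustic environment (one "sequence")."""
    m = len(groups)
    n_tot = sum(len(g) for g in groups)
    grand_mean = statistics.mean([x for g in groups for x in g])
    between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups) / (m - 1)
    within = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (n_tot - m)
    return between / within


# Small hypothetical example: three environments with three ratings each.
f = f_value([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f, f > F_CRITICAL)  # 3.0 False
```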

CONCLUSION
In this paper, four pre-recorded acoustic environments that typically occur in contemporary urban areas have been investigated in both objective and subjective terms. The objective differences between the studied environments are their loudness and spectral distributions, while the subjective parameters were determined by examining the responses of two groups of listeners to sudden and short loudness changes. The results were obtained by processing questionnaires based on the semantic differential method, implemented by defining adjective pairs of opposite meaning, where each pair describes a sound characteristic of a particular acoustic environment. The questionnaire comprised eleven adjective pairs describing the particular sound environments.
Following the psychological research methodology based on controlled aspects of testing (i.e. the size of the test sample, the testing conditions, the manipulation of an independent variable, etc.), we have established statistical relevance in all scenarios for five adjective pairs, namely diverse - monotonous, pleasant - unpleasant, soothing - stressful, conspicuous - inconspicuous and gentle - rough.
This means that these five attributes can serve as a basis for a unique subjective characterization of urban acoustic environments. Identifying urban acoustic environments and understanding the human response to them enables applications such as the design of pleasant acoustic environments in residential and commercial areas by reducing undesired sounds and amplifying desired ones.
Furthermore, the subjective analysis based on bipolar adjectives can be extended to the characterization of specific soundscapes (e.g. hospital, office, stadium) by adding new attributes and processing them in line with the proposed methodology. The proposed methodology can contribute to the standardization and systematization of soundscape research, while this study can be combined with similar studies to build a psychoacoustic model of human sound perception and to establish a knowledge base of human-sound interaction for specific purposes such as sound therapy or computer synthesis of soundscapes.

ACKNOWLEDGEMENT
This work was supported by the Ministry of Science, Education and Sports of the Republic of Croatia.