Abstract

To improve on traditional approaches to evaluating automobile sound quality, an evaluation model is constructed based on the BP neural network. First, the basic principle of the BP neural network is introduced in detail. Second, MGC parameters are used to establish a conversion model for vehicle interior sound. The converted sound characteristic parameters are fed into the WORLD model to synthesize new sound signals, and the wavelet decomposition method is then used to remove noise from the synthesized signals. Finally, a sound evaluation model based on the BP neural network is established. In the field of sound conversion, the ABX test and MOS test show that the newly synthesized sound signal is closer to the target sound signal than to the original one and that its sound quality is better. In the field of sound quality evaluation, the sound is tested through loudness, roughness, sharpness, and A-weighted sound level. The results show that the quality of the newly synthesized sound is better and that the average errors of the sound signals meet the standard. Therefore, the constructed sound conversion model and sound evaluation model are feasible and effective.

1. Introduction

As science and technology develop, people pay more and more attention to automobile performance, and noise has become one of the important criteria when purchasing an automobile. Automobile noise seriously reduces ride comfort, interferes with communication between occupants, and can even damage hearing. Vibration and noise also contribute very differently to the sound of an automobile. Therefore, how to better distinguish vibration from noise, and how to evaluate automobile noise, has become the focus of current research. In practice, companies such as Nissan, FEV of Germany, and AVL of Austria are all studying the interior sound quality of electric vehicles. The most influential noise sources and specific frequencies have been identified, and active or passive technologies are used to eliminate noise and compensate for frequency, with good results.

In the academic field, Xie et al. proposed an adaptive neural network sound evaluation method that is simple and fast, and they also introduced a mainstream neural network algorithm [1]. Yin et al. calculated the buffeting noise of automobile side windows using the large eddy simulation method and obtained the variation of loudness, sharpness, roughness, and fluctuation strength with wind speed and window opening [2]. Yang et al. adopted binaural transfer path analysis (BTPA) to measure the vibration and noise transmission paths of automobiles under transient and steady-state conditions. They then studied the characteristics and differences of the interior noise under different operating conditions using loudness, sharpness, roughness, and A-weighted sound pressure level [3]. Park and Kang established a sound quality index model that reflects the reviewers’ different styles. The model is constructed with K-means clustering, factor analysis, and multiple linear regression, and the study provides an additional reference for the evaluation of sound quality [4]. Zhao et al. proposed a deep belief network (DBN) based on linear regression (LR-DBN), taking 6 psychoacoustic indicators and 26 Mel-frequency cepstral coefficients as input features. An ordinary DBN, multiple linear regression (MLR), and a backpropagation neural network (BPNN) were used to verify the performance of LR-DBN. Compared with these methods, LR-DBN has a higher correlation coefficient, a lower prediction error, and better stability, so it is a reliable method for evaluating EEV sound [5]. Wang et al. proposed an objective evaluation method for interior noise based on the displacement of the human basilar membrane [6]. First, noise samples at different seats are collected under different running conditions. Second, the adaptive grouped paired comparison method is used to obtain the subjective evaluation value (SEV) of the noise samples. Third, a lumped-parameter model of the human ear is adopted to calculate the average value of the basilar membrane displacement response (SMVBMDR), from which a characteristic matrix is established. Finally, two BP neural network models are constructed to evaluate the interior noise sound quality, taking the traditional psychoacoustic indicators and the extracted feature matrix as inputs, respectively. The SMVBMDR is strongly correlated with the SEV, and the model based on SMVBMDR predicts sound quality more accurately.

The above research shows that both qualitative and quantitative methods are used in current sound quality evaluation, and that machine learning, deep learning, and other algorithms have been introduced. However, these evaluation methods rarely involve the preprocessing of noise data. In this paper, automobile sound feature parameters are first extracted, a BP neural network is then used to convert them so that new automobile sound can be synthesized, and the synthesized sound is finally evaluated according to sound quality evaluation parameters, so as to provide a new reference for automobile sound quality evaluation.

2. BP Neural Network

2.1. Introduction

The BP neural network is a multilayer feed-forward neural network trained by error backpropagation. Its model is clear and its structure is simple. The basic ideas for training the BP neural network model are as follows [7]:

Herein, the training set is defined as $\{(x_i, y_i)\}_{i=1}^{n}$, where x represents the input vector, y represents the output vector, and n represents the number of training samples. Supposing that layer l − 1 and layer l contain m and k nodes, respectively, the output of the jth node of layer l can be expressed as formula (1), and the vector composed of the outputs of layer l can be expressed as formula (2):

$$a_j^{l} = f\left(\sum_{i=1}^{m} w_{ji}^{l}\, a_i^{l-1} + b_j^{l}\right) \tag{1}$$

$$a^{l} = f\left(W^{l} a^{l-1} + b^{l}\right) \tag{2}$$

Among them, $w_{ji}^{l}$ and $b_j^{l}$ are the weight and bias of node j from layer l − 1 to layer l, respectively; f(z) is the activation function; $W^{l}$ represents the k × m matrix composed of the weights of layer l; and $b^{l}$ represents the k × 1 vector composed of the biases of layer l.

With the above formulas, the output vector of each layer in the network can be calculated. A loss function is then used to find the appropriate weight matrices and bias vectors for all hidden layers and the output layer. The gradient descent method is adopted to update the weight matrices and bias vectors continuously until the best ones are obtained. Finally, the optimal weight matrices and bias vectors are used to produce the predicted value that is closest to the actual value. According to Pan et al. [8], the mean square error function is selected as the loss function:

$$E = \frac{1}{N}\sum_{i=1}^{N} \left\lVert y_i - \hat{y}_i \right\rVert^{2} \tag{3}$$

where $y_i$ and $\hat{y}_i$ are the target output and actual output vectors of the network, respectively, and $N$ is the number of test samples.
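The training idea described above (forward pass, mean square error, gradient descent on weights and biases) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the network used in the paper: a single tanh hidden layer, a linear output layer, per-sample updates, and a toy data set are all assumptions, and every function name is hypothetical.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, linear output layer (both assumed)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y

def mse(samples, W1, b1, W2, b2):
    """Mean square error over all (input, target) pairs."""
    total = 0.0
    for x, t in samples:
        _, y = forward(x, W1, b1, W2, b2)
        total += sum((ti - yi) ** 2 for ti, yi in zip(t, y))
    return total / len(samples)

def train_epoch(samples, W1, b1, W2, b2, lr):
    """One pass of per-sample gradient descent with backpropagated errors."""
    for x, t in samples:
        h, y = forward(x, W1, b1, W2, b2)
        # Output error signal (the constant factor 2 is absorbed into lr).
        d_out = [yi - ti for yi, ti in zip(y, t)]
        # Hidden error signal: push d_out back through W2 and tanh'.
        d_hid = [(1.0 - hj ** 2) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(W2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * h[j]
            b2[k] -= lr * d_out[k]
        for j, row in enumerate(W1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * x[i]
            b1[j] -= lr * d_hid[j]

random.seed(0)
n_in, n_hid, n_out = 2, 4, 1  # toy sizes, not the 60-dimensional model
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [0.0] * n_out
# Toy target: a linear map of the inputs (placeholder data).
samples = [([a, b], [0.5 * a - 0.25 * b])
           for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]

loss_before = mse(samples, W1, b1, W2, b2)
for _ in range(200):
    train_epoch(samples, W1, b1, W2, b2, lr=0.01)
loss_after = mse(samples, W1, b1, W2, b2)
```

Comparing `loss_before` with `loss_after` shows the loss shrinking as the weights and biases are updated.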

The structure of the BP neural network is shown in Figure 1 [9–11]. It includes the input layer, the hidden layer, and the output layer. Neurons in adjacent layers are fully connected. The hidden layer lies in the middle and may itself consist of multiple layers, but there are no connections between neurons within the same layer.

2.2. BP Neural Network Structure

In a BP neural network, the number of neurons in the input and output layers is determined by the dimensions of the actual input and output data. In the hidden layer, too many or too few neurons may prolong the training time or prevent a good fit. Therefore, repeated search and testing are required to determine the optimal number of neurons, and a commonly used formula is

$$n_h = \sqrt{n_i + n_o} + a \tag{4}$$

Here, $n_h$, $n_i$, and $n_o$ are the numbers of neurons in the hidden layer, input layer, and output layer, respectively; a is a constant, which ranges from 1 to 10.
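As a small illustration of the search described above, the sketch below lists candidate hidden-layer sizes for each integer value of the constant a. It assumes the widely used empirical rule h = sqrt(n_in + n_out) + a; whether the paper uses exactly this form is an assumption, and the function name and layer sizes are hypothetical.

```python
import math

def hidden_size_candidates(n_in, n_out, a_min=1, a_max=10):
    """Candidate hidden-layer sizes from h = sqrt(n_in + n_out) + a (assumed rule)."""
    base = math.sqrt(n_in + n_out)
    return [round(base + a) for a in range(a_min, a_max + 1)]

# Toy layer sizes chosen only to keep the arithmetic easy to follow.
candidates = hidden_size_candidates(n_in=8, n_out=1)
```

Each candidate would then be trained and compared, which is the "search and testing" step in the text.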

The input vector and output vector of the BP neural network should be normalized to the range −1.0 to 1.0 so as to effectively avoid the overfitting of output vectors [12–14]. The calculation formula is as follows:

$$x^{*} = \frac{x - \bar{x}}{x_{\max} - x_{\min}} \tag{5}$$

In formula (5), $x$ is the data to be normalized; $x_{\min}$ is the minimum value; $x_{\max}$ is the maximum value; $\bar{x}$ represents the average value; and $x^{*}$ is the normalized data.
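The normalization step can be sketched as follows. Two variants are shown because the exact form used in the paper is not recoverable from the text: mean normalization, (x − mean) / (max − min), and a min-max mapping onto [−1, 1]. Both keep every value inside the stated range; both function names are hypothetical.

```python
def mean_normalize(values):
    """(x - mean) / (max - min): results always lie within [-1, 1]."""
    lo, hi = min(values), max(values)
    mean = sum(values) / len(values)
    return [(v - mean) / (hi - lo) for v in values]

def minmax_to_pm1(values):
    """Linear map sending the minimum to -1 and the maximum to +1."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

data = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values
mean_scaled = mean_normalize(data)
minmax_scaled = minmax_to_pm1(data)
```

Neither function guards against a constant input (max equal to min); a real pipeline would need to handle that case.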

3. Construction of Sound Conversion Model Based on BP Neural Network

3.1. BP Neural Network Sound Conversion Model Based on MGC Parameters

On the basis of the BP neural network, sound characteristic parameters are extracted from the original sound signal through the WORLD sound analysis and synthesis system, including the one-dimensional lf0, the 60-dimensional MGC parameters, and the 5-dimensional bap parameters. Considering that all three parameters are obtained through the WORLD system and change consistently with the bap parameters, only the 60-dimensional spectral-envelope MGC parameters are used for model training. The BP neural network sound conversion model based on MGC parameters constructed in this paper is shown in Figure 2. The input and output layers each have 60 neurons, there are 2 hidden layers, and each hidden layer has 59 neurons [15–18]. The MGC parameters of the target sound signal serve as the target values, and the parameters generated by the network are then fed into the WORLD system to synthesize a new sound signal.

Each set of sound signals has 60-dimensional MGC parameters with 117 values per dimension, which is represented as a $60 \times 117$ matrix. Here, 45 groups of portable and artificial head data are selected for training, and 5 groups of data are randomly selected as the test set. Because the amount of training data is large, the parameters are divided into input vectors and output vectors. The input vectors are the 45 groups of MGC parameters collected by the portable device, and the output vectors are the 45 groups of MGC parameters collected by the artificial head.

Therefore, the input and output layers each have 60 neurons. In addition, the learning rate of the model is 0.01, the maximum number of errors is 10, the maximum number of training iterations is 10000, and the target training accuracy is 0.001.

After the conversion model is established, new MGC parameters can be obtained for synthesizing new sounds, which provides the sound parameters needed to establish the subsequent sound evaluation model.
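To make the data layout concrete, the sketch below arranges the MGC data into training pairs. It assumes that each group is a 60 × 117 array whose columns are frames and that portable and artificial-head frames are already aligned one to one; neither assumption is stated in the text, the helper names are hypothetical, and the data are placeholders.

```python
N_DIM, N_FRAMES, N_TRAIN_GROUPS = 60, 117, 45  # sizes quoted in the text

def frames_from_group(mgc):
    """Split one 60 x 117 group (rows = MGC dimensions) into 117 vectors of length 60."""
    if len(mgc) != N_DIM or any(len(row) != N_FRAMES for row in mgc):
        raise ValueError("expected a 60 x 117 MGC matrix")
    return [[mgc[d][t] for d in range(N_DIM)] for t in range(N_FRAMES)]

def build_pairs(portable_groups, head_groups):
    """Pair each portable frame (network input) with the artificial-head frame (target)."""
    pairs = []
    for src, tgt in zip(portable_groups, head_groups):
        pairs.extend(zip(frames_from_group(src), frames_from_group(tgt)))
    return pairs

def dummy_group(value):
    """Placeholder group standing in for real WORLD output."""
    return [[value] * N_FRAMES for _ in range(N_DIM)]

portable = [dummy_group(0.0) for _ in range(N_TRAIN_GROUPS)]
head = [dummy_group(1.0) for _ in range(N_TRAIN_GROUPS)]
pairs = build_pairs(portable, head)
```

Under these assumptions the 45 training groups yield 45 × 117 input/target pairs, each a pair of 60-dimensional vectors matching the 60 input and 60 output neurons.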

3.2. Sound Signal Denoising Based on Wavelet Decomposition

During the extraction and synthesis of sound features, the computation causes spectral attenuation and frequency offset, which introduces a large amount of noise. This noise seriously interferes with the quality and evaluation of the sound signals, so the converted and synthesized sound must be denoised. In this paper, the widely used wavelet denoising method is adopted. The flow of this method is as follows:

(1) Decompose the original sound signal s(n) by wavelet transform, so that the real target sound signal a(n) and the noise signal d(n) can be separated;

(2) Keep the wavelet coefficients of the real sound signal, and remove the wavelet coefficients of the noise signal;

(3) Apply the inverse wavelet transform to obtain the new sound signal, which completes the denoising of the sound signal.

The original sound signal containing noise is expressed as

$$s(n) = a(n) + d(n) \tag{6}$$

In formula (6), $s(n)$ is the original sound signal with noise; $a(n)$ is the normal target sound signal; and $d(n)$ is the unwanted noise signal. Furthermore, $d(n)$ follows a Gaussian distribution, while $a(n)$ is a nonstationary signal following a non-Gaussian distribution.
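The three denoising steps (decompose, suppress noise coefficients, reconstruct) can be sketched with a Haar wavelet and soft thresholding. The paper does not state which wavelet, decomposition depth, or threshold rule it uses, so all three choices here are assumptions made to keep the example self-contained; the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; those below thr in magnitude become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(signal, levels, thr):
    """Decompose, threshold the detail coefficients, then reconstruct."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(soft_threshold(d, thr))
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx

# Toy signal: a constant level with small alternating disturbances.
noisy = [1.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(8)]
restored = denoise(noisy, levels=2, thr=0.2)
unchanged = denoise(noisy, levels=2, thr=0.0)
```

With a zero threshold the transform reconstructs the input, and with the threshold above the disturbance coefficients the constant level is recovered. The signal length must be divisible by 2 raised to the number of levels.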

4. Experimental Results and Analysis

After sound conversion and synthesis, the newly synthesized sound signal is evaluated from two perspectives: the field of sound conversion and the field of automobile sound quality. Subjective evaluation is performed following the practice of the sound conversion field, and the objective parameters of the automobile sound quality field are used to evaluate the synthetic sound signal.

4.1. Evaluation in the Field of Sound Conversion
4.1.1. Experimental Environment and Data

To achieve better experimental results, a well-insulated meeting room in a university is selected for testing, with an indoor temperature of 22°C and a humidity of 45%. Sound is played back through high-fidelity Sennheiser HD650 headphones to obtain a better sound signal.

The listening panel consisted of 5 healthy postgraduates with normal hearing from a university, including 3 males and 2 females.

4.1.2. Test Method

The ABX test and MOS test are adopted to evaluate the synthetic sound signals so as to ensure the accuracy of subjective evaluation in the field of sound conversion.

The ABX test is a common method for the subjective evaluation of sound conversion. A and B represent the original and target sound signals, respectively, and X represents the converted sound signal. In the test, each listener judges by subjective auditory impression whether the converted sound signal is more similar to the original or to the target sound signal. Finally, the judgments are aggregated statistically to obtain the ABX score of the system, which measures its conversion performance.

The ABX score is calculated as [19–21]:

$$P_{ABX} = \frac{1}{M}\sum_{m=1}^{M} r_m \times 100\% \tag{7}$$

where $M$ represents the number of testers and $r_m$ represents the test result of the m-th tester. When the value is 0, it indicates that the converted sound signal is more similar to the original sound signal. When the value is 1, it indicates that the converted sound signal is more similar to the target sound signal.
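The ABX score described above is simply the percentage of judgments equal to 1. A minimal sketch follows, with a hypothetical function name and made-up judgments:

```python
def abx_score(results):
    """Percentage of listeners who judged X closer to the target (result 1)."""
    if any(r not in (0, 1) for r in results):
        raise ValueError("each result must be 0 (original) or 1 (target)")
    return 100.0 * sum(results) / len(results)

# Made-up judgments from five listeners, for illustration only.
score = abx_score([1, 1, 1, 0, 1])
```

With four of five listeners choosing the target, the score is 80%.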

The MOS test is also known as the mean opinion score test. Its main function is to evaluate the tester’s overall satisfaction with the converted sound signal. A high MOS score indicates that the converted sound signal is up to standard and that its naturalness and intelligibility are good. The expression is as follows:

$$\mathrm{MOS} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} s_{m,n} \tag{8}$$

Here, M represents the total number of participants in the evaluation, N represents the number of sound signals in the test, and $s_{m,n}$ represents the evaluation score given by the m-th individual to the n-th sound signal.
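The MOS computation is a plain average over all listeners and all signals. The sketch below uses a hypothetical function name and made-up scores on a 1 to 5 scale:

```python
def mean_opinion_score(scores):
    """Average of scores[m][n]: listener m rating sound signal n."""
    n_listeners = len(scores)
    n_signals = len(scores[0])
    if any(len(row) != n_signals for row in scores):
        raise ValueError("every listener must rate every signal")
    return sum(sum(row) for row in scores) / (n_listeners * n_signals)

# Made-up ratings: 2 listeners, 2 signals.
mos = mean_opinion_score([[4, 5], [3, 4]])
```

Averaging per signal instead (one column at a time) would give the per-group scores of the kind reported for the four speed conditions.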

Subjective evaluation experiments must be conducted in a well-controlled room. The temperature of the selected room is 22°C, and the humidity is about 45%. In addition, high-fidelity headphones are used for playback to obtain a better sound signal. The evaluated sound signals are four sets of signals obtained through the WORLD sound synthesis system, corresponding to automobile operating speeds of 60 km/h, 100 km/h, 30 km/h, and 80 km/h. The four groups correspond to the numbers 1 to 4 in Tables 1 and 2.

The specific evaluation results are as follows:

(1) ABX test. The results of the ABX test are shown in Table 1. The fourth group has the highest ABX score, 85%, indicating the best conversion effect in this experiment, while the lowest ABX score is only 63%. The main reason may be that too little experimental data was used, so the next step is to expand the amount of data appropriately. Overall, the ABX score averages 74%, which reaches the experimental standard and provides preliminary evidence that the newly synthesized sound signal meets the experimental requirements.

(2) MOS test. The MOS test results are shown in Table 2. Of the four sound signals, the first obtained a MOS score of 4.2, indicating a good result. The other three scored lower, and the lowest score was only 3; the main reason may again be that the number of data sets is relatively small. Overall, the MOS score averages 3.65, which meets the experimental requirements and provides further preliminary evidence that the newly synthesized sound signal is acceptable. The scoring standards of the MOS test are shown in Table 3.

4.2. Evaluation in the Field of Automobile Sound Quality

First, the synthetic sound signal is evaluated using objective parameters: SPL, roughness, loudness, and sharpness [22–24].

Furthermore, the errors between the five groups of synthetic sound signals and the corresponding five groups of portable and five groups of artificial head sound signals are evaluated. The objective parameters are calculated with LMS Test.Lab.

4.2.1. Roughness

Roughness is a parameter that describes the degree of modulation of a sound signal. The unit of roughness is the asper. A roughness of 1 asper corresponds to a sinusoidal pure tone with an SPL of 60 dB and a frequency of 1 kHz, amplitude modulated with a modulation depth of 1 at a modulation frequency of 70 Hz. The roughness calculation formula is [25]

$$R = 0.3\, f_{\mathrm{mod}} \int_{0}^{24} G(z)\, dz \tag{9}$$

In formula (9), $f_{\mathrm{mod}}$ represents the modulation frequency, in kHz. Here, G represents the change in the excitation level of a sound signal, expressed as

$$G(z) = 20 \log_{10}\!\left(\frac{N'_{\max}(z)}{N'_{\min}(z)}\right) \tag{10}$$

where z represents the critical band number in Bark, and $N'_{\max}(z)$ and $N'_{\min}(z)$ represent the maximum and minimum values of the specific loudness in the characteristic frequency band of a sound signal, respectively.

In Figure 3, B, R, and X represent the portable, artificial head, and newly synthesized sound signals, respectively. The newly synthesized sound signals are close to the artificial head signals. The RMS values calculated with LMS Test.Lab for the portable, artificial head, and newly synthesized signals are 0.00995 asper, 0.00530 asper, and 0.00494 asper, respectively. The difference between the portable and artificial head signals is 87.74%, while the difference between the newly synthesized and artificial head signals is 6.79%. Therefore, the roughness of the newly synthesized sound signal is clearly improved and meets the experimental requirements.
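The percentage differences quoted above are consistent with a relative error taken with respect to the artificial head value. The sketch below reproduces them from the three reported RMS roughness values; the function name is hypothetical, and the choice of the artificial head as the reference is inferred from the numbers rather than stated in the text.

```python
def relative_error(value, reference):
    """Absolute deviation from the reference, as a percentage of the reference."""
    return abs(value - reference) / reference * 100.0

# RMS roughness values reported in the text, in asper.
portable, head, synthesized = 0.00995, 0.00530, 0.00494

err_portable = relative_error(portable, head)
err_synthesized = relative_error(synthesized, head)
```

Rounded to two decimals these give 87.74% and 6.79%, matching the reported figures.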

4.2.2. Loudness

Loudness is a parameter that describes how strong the human ear perceives a sound signal to be. It is determined by the amplitude of the sound and also depends on its frequency. The unit of loudness is the sone. A loudness of 1 sone corresponds to a pure tone with an SPL of 40 dB and a frequency of 1 kHz.

The Zwicker algorithm is commonly used to calculate the loudness. The specific loudness is expressed as

$$N' = 0.08 \left(\frac{E_{TQ}}{E_0}\right)^{0.23} \left[\left(0.5 + 0.5\,\frac{E}{E_{TQ}}\right)^{0.23} - 1\right] \tag{11}$$

In formula (11), $E_{TQ}$ represents the excitation at the threshold of hearing in quiet, $E_0$ represents the excitation corresponding to the reference sound intensity $I_0$, and E represents the excitation of the measured sound signal. Integrating the specific loudness over the characteristic frequency bands from 0 to 24 Bark gives the total loudness:

$$N = \int_{0}^{24} N'(z)\, dz \tag{12}$$

Here, N is the total loudness; formula (12) is the calculation model for the loudness of a steady-state sound signal.
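The two loudness steps can be sketched numerically: specific loudness per critical band from the standard Zwicker expression, N' = 0.08 (E_TQ/E_0)^0.23 [(0.5 + 0.5 E/E_TQ)^0.23 − 1], followed by summation over the 24 Bark bands. The excitation values below are placeholders, the rectangle-rule integration with 1-Bark steps is an assumption, and the function names are hypothetical.

```python
def specific_loudness(e, e_tq, e0):
    """Zwicker specific loudness N' (sone/Bark) for one critical band."""
    return 0.08 * (e_tq / e0) ** 0.23 * ((0.5 + 0.5 * e / e_tq) ** 0.23 - 1.0)

def total_loudness(specific, dz=1.0):
    """Rectangle-rule approximation of the integral of N'(z) over the Bark axis."""
    return sum(specific) * dz

# Placeholder excitations for 24 one-Bark bands, relative to E_TQ = E_0 = 1.
excitations = [10.0] * 24
bands = [specific_loudness(e, e_tq=1.0, e0=1.0) for e in excitations]
loudness = total_loudness(bands)
```

An excitation equal to the threshold excitation gives zero specific loudness, which is a quick sanity check on the expression.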

In Figure 4, B, R, and X represent the portable, artificial head, and newly synthesized sound signals, respectively. The newly synthesized sound signals lie between the portable signals and the artificial head signals. The RMS values calculated with LMS Test.Lab for the portable, artificial head, and newly synthesized signals are 0.63 sone, 1.24 sone, and 0.95 sone, respectively. The error between the portable and artificial head signals is 49.19%, and the error between the newly synthesized and artificial head signals is 23.39%, indicating that the loudness of the newly synthesized sound signal is significantly improved and reaches the experimental standard.

4.2.3. Sound Pressure Level (SPL)

The sound pressure level is one of the important quantities for expressing the subjective feeling of loudness. In ANSI S1.8 (1989) and ANSI S1.13 (1995), the sound pressure level is calculated as

$$L_p = 20 \log_{10}\!\left(\frac{p_e}{p_{\mathrm{ref}}}\right) \tag{13}$$

In formula (13), $p_e$ represents the effective sound pressure of the tested sound signal, and $p_{\mathrm{ref}}$ represents the reference sound pressure, that is, the sound pressure of a 1 kHz signal that the human ear can just hear under stable conditions. This value is also the threshold of audibility: the reference sound pressure is the minimum root mean square sound pressure at standard atmospheric pressure, $p_{\mathrm{ref}} = 2 \times 10^{-5}$ Pa. The subjective perception of sound intensity by the human ear is not linearly related to the sound pressure level itself; as the loudness increases, the sound pressure level increases logarithmically. Stevens expressed the relation between the two as a power law:

$$L = k I^{0.3} \tag{14}$$

where L represents the perceived loudness of the ear, k represents the coefficient of each subject, and I represents the sound intensity.
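The two relations in this section can be checked numerically. The sketch below computes the sound pressure level as 20·log10(p/p_ref) with p_ref = 2 × 10⁻⁵ Pa (the standard reference in air) and evaluates a Stevens-type power law; the exponent 0.3 is the commonly quoted value and is treated here as an assumption, as are the function names.

```python
import math

P_REF = 2e-5  # reference sound pressure in Pa (threshold of audibility at 1 kHz)

def spl_db(p_rms, p_ref=P_REF):
    """Sound pressure level in dB for an RMS sound pressure in Pa."""
    return 20.0 * math.log10(p_rms / p_ref)

def stevens_loudness(intensity, k=1.0, exponent=0.3):
    """Perceived loudness from a power law L = k * I**exponent (exponent assumed)."""
    return k * intensity ** exponent

level_at_threshold = spl_db(P_REF)   # reference pressure itself
level_at_one_pascal = spl_db(1.0)    # 1 Pa RMS
ratio = stevens_loudness(10.0) / stevens_loudness(1.0)
```

With this exponent, a tenfold increase in intensity (10 dB) roughly doubles the perceived loudness, which illustrates why level and loudness are not linearly related.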

In the comparison diagram, B, R, and X again represent the portable, artificial head, and newly synthesized sound signals, respectively. Figure 5 shows that the newly synthesized signals lie between the portable signals and the artificial head signals. The RMS values of the three are 47.86 Pa, 45.32 Pa, and 44.98 Pa, respectively. The error between the portable and artificial head signals is 5.60%, and the error between the newly synthesized and artificial head signals is 0.75%. This indicates that the A-weighted sound level of the newly synthesized sound signal is further improved and meets the requirements.

4.2.4. Sharpness

Sharpness is an objective parameter that describes how sharp a sound signal is perceived to be and mainly reflects the proportion of high-frequency content in the total sound spectrum. In a sound signal, the larger the proportion of high-frequency components, the greater the loudness value, and the greater the sharpness of this sound signal.

The unit of sharpness is the acum. A sharpness of 1 acum corresponds to a narrow-band noise signal with a sound pressure level of 60 dB, a center frequency of 1 kHz, and a bandwidth of 160 Hz. Sharpness is obtained by a weighted integration of the loudness over the critical frequency bands, and the expression is

$$S = 0.11\, \frac{\int_{0}^{24} N'(z)\, g(z)\, z\, dz}{\int_{0}^{24} N'(z)\, dz} \tag{15}$$

Among them, $z$ represents the critical frequency band number; $N'(z)$ represents the specific loudness function; and $g(z)$ represents the weighting function that takes effect when the critical frequency band is high. Its value is mainly determined by the critical frequency band.
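The weighted integration behind sharpness can be sketched as follows, using the Zwicker-type expression S = 0.11 ∫N'(z) g(z) z dz / ∫N'(z) dz. The specific form of the weighting g(z) below (1 up to 16 Bark, then 0.066·exp(0.171 z)) is one commonly cited choice and is an assumption here, since the text does not give it; the band layout, data, and function names are also hypothetical.

```python
import math

def weight(z):
    """High-band weighting g(z); this particular form is an assumption."""
    return 1.0 if z <= 16.0 else 0.066 * math.exp(0.171 * z)

def sharpness(specific, z_centers, dz=1.0):
    """Sharpness in acum from specific loudness sampled at Bark band centers."""
    numerator = sum(n * weight(z) * z for n, z in zip(specific, z_centers)) * dz
    denominator = sum(specific) * dz
    return 0.11 * numerator / denominator

z_centers = [i + 0.5 for i in range(24)]  # centers of 24 one-Bark bands

# All loudness in the band around 8.5 Bark (roughly 1 kHz): close to 1 acum.
mid_band = [1.0 if i == 8 else 0.0 for i in range(24)]
# All loudness in a high band (20.5 Bark): noticeably sharper.
high_band = [1.0 if i == 20 else 0.0 for i in range(24)]

s_mid = sharpness(mid_band, z_centers)
s_high = sharpness(high_band, z_centers)
```

Moving the same loudness from the mid band to the high band raises the result, which is the behavior the section describes.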

The sharpness comparison in Figure 6 shows that the newly synthesized sound signal is close to the artificial head signal. The RMS values calculated with LMS Test.Lab for the portable, artificial head, and newly synthesized signals are 6.54 acum, 2.47 acum, and 2.47 acum, respectively. The error between the portable and artificial head signals is 43.32%, while the error between the newly synthesized and artificial head signals is 0.00%, indicating that the sharpness of the newly synthesized sound signal is further improved and reaches the experimental standard.

The above evaluation assesses the sound inside the automobile one index at a time, which alone is not sufficiently comprehensive. On the basis of the single-index evaluation, a comprehensive evaluation method is therefore proposed for the newly synthesized automobile sound. Because the above indicators have different units and belong to different levels, the comprehensive evaluation proceeds as follows: 32 different evaluators are invited to evaluate the newly synthesized interior sound signals according to the grade standards in Table 4 and to score them one by one, so as to obtain an overall evaluation of the newly synthesized sound.

The test is conducted in a closed environment to avoid interference from the external environment. The test is repeated 10 times, and the final results are averaged. The results obtained are shown in Table 5.

As can be seen from the scores, the overall average score of the four indicators is 8, which falls in the "great" grade. This shows that the synthesized sound is accepted by the evaluators and receives a good evaluation.

5. Conclusion

To sum up, the sound quality evaluation model based on the BP neural network constructed in this paper is feasible. The model can effectively evaluate the newly synthesized, portable, and artificial head sound signals. Testing through subjective evaluation and objective parameters shows that the newly synthesized sound signal is very close to the target sound signal, with an average error of less than 10%. In the field of sound quality evaluation, the newly synthesized sound therefore meets the design standards of this paper. However, owing to limited conditions, some shortcomings remain. The main one is that too few data sets were selected for the conversion model, which may have lowered some of the final test scores. Nevertheless, this work provides a more comprehensive evaluation method for the research and synthesis of automobile sound.

Data Availability

The experimental data are available from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.