Abstract

The study of pragmatic failure in cross-cultural communication has considerable theoretical significance and practical value in contemporary linguistic research. This paper takes pragmatic failure in cross-cultural communication as its research object and aims to make the discussion systematic, theoretical, and scientific. Because a convolutional neural network (CNN) shares weights, the complexity of the network is greatly reduced. In the convolution layer, the convolution operation makes the features of the initial speech more prominent and also has some noise-reduction effect on the speech. Using a convolutional neural network to study intelligent analysis strategies for pragmatic failure in cross-cultural communication is therefore of theoretical and practical significance. This paper takes the intelligent analysis strategy of pragmatic failure in cross-cultural communication based on a deep convolutional neural network as its research object and proposes an optimized end-to-end deep convolutional neural network model. Experimental results show that the overall recognition rate of the algorithm is improved by 59.8%, and the efficiency of obtaining results is basically maintained at 61.8%. The proposed strategy reduces redundant computation and shortens training time to some extent, and the algorithm accelerates network convergence compared with a simple network.

1. Introduction

Cross-cultural communication has received increasing attention from scholars at home and abroad. With the development of the world economy and the growing volume of cross-border communication, pragmatic problems in cross-cultural communication have attracted great attention [1]. Cross-cultural communication refers to communication between people from different cultural backgrounds. From the perspective of psychology, it is communication in which information is encoded and decoded by people from different cultural backgrounds [2]. Intercultural communication refers not only to communication between different groups of people in the same country but also to communication between people in different countries [3]. From the perspective of translation methods, it also includes translation and interpretation. Because translation and interpretation have different requirements, the time available and the standards applied also differ [4]. Structuralist linguistics has made an indelible contribution to revealing the internal organizational rules of the language system and the interaction among the subsystems that make it up, but structuralist research has cut off the inevitable connection between language and society [5]. Language is a social phenomenon, and its sociality means that the language system is not a closed system: it cannot develop independently in a vacuum but is inextricably linked with other systems [6]. People’s language expressions are always restricted by various social and cultural factors, and the internal elements of the language system (phonetics, vocabulary, grammar, and pragmatics) are all intricately linked with social and cultural factors.

Pragmatic failure can generally be divided into three categories: pragmalinguistic failure, sociopragmatic failure, and pragmatic failure in nonverbal communication. Pragmalinguistic failure usually arises when the target language used by foreign language learners does not conform to the habits of its native speakers, or when learners carry over expressions from their mother tongue [7]. In such cases, language users fail to choose appropriate and correct sentences for the specific meaning to be expressed, so their words convey an illocutionary force that they did not intend [8]. Sociopragmatic failure concerns the failure of the communicative function of language and, in particular, failure in the relationship between language and social culture. Speech activities are always carried out against a certain linguistic background and a broad social background. In addition to language knowledge, a person’s communicative competence should also include the ability to adapt to pragmatic principles and to social and cultural backgrounds [9]. Pragmatic failure in nonverbal communication concerns the attributes or actions of a person that society understands without words. These attributes and actions are sent, or considered to be sent, purposefully by the sender and are received consciously by the receiver, with the possibility of feedback.

Deep learning models are based on deep neural networks and imitate the mechanism of the human brain to solve various problems in machine learning [10]. Like traditional machine learning methods, deep learning methods are divided into supervised and unsupervised learning [11]. For example, convolutional neural networks are supervised learning models, while autoencoders are unsupervised learning models. When an unsupervised model is used for learning, a classifier is usually added on top of the model to solve practical problems [12]. Convolutional neural networks can be applied to emotion classification. With the development of deep learning in natural language processing, deep learning has been widely used in emotion classification, a subtask of natural language processing. Convolutional neural networks have also performed well in speech feature recognition [13]. As a new extension of machine learning research, deep learning has made great breakthroughs in speech and image recognition. Because a CNN does not need feature extraction in the preprocessing stage, it avoids the influence of artificially designed filters on the quality of speech signals. The training weights of the CNN feature detection layer are thus learned implicitly, and explicit feature extraction is avoided. Because the pooling layer averages feature values, a CNN is robust in recognizing 2D patterns under distortions such as displacement and scaling. In continuous speech recognition, a CNN can train on large databases thanks to its weight sharing. At the same time, under the action of the first two hidden layers, weight sharing greatly reduces the complexity of the whole CNN [14]. In the convolution layer, the convolution operation makes the characteristics of the initial speech more prominent and also has some noise-reduction effect on the speech. When processing the spectral features of continuous speech, the pooling layer can perform random sampling on them and reduce their dimension to a certain size while preserving the effective information [15]. However, the above research offers no good solution to the problem of pragmatic failure in cross-cultural communication. Therefore, this paper puts forward the following innovations:
(1) An end-to-end deep convolutional neural network model is designed and optimized. The CTC-CNN model has a shallow structure and is not ideal for speech training [16]. Therefore, this paper designs a new end-to-end deep convolutional neural network acoustic model using the residual block structure; the model is then optimized with the maxout activation function, and a new improved CTC-DCNN model is proposed [17]. The experimental results show that the model performs better for Chinese speech recognition.
(2) Massive training data and the very large number of parameters of a convolutional neural network hinder efficient training of the model. To improve the efficiency of parameter training and the accuracy of model recognition, a back propagation algorithm with a reduced weight range is proposed. In the later stage of model training, the seed nodes with minimum approximation error are obtained by the K-means algorithm. During iterative calculation, boundary value rules are used to reduce oscillation, which makes the network error converge quickly and thus improves training efficiency.

The sections of this paper are arranged as follows. Section 1 is the introduction, which discusses the background and significance of the topic and sets out the innovations of the paper. Section 2 reviews domestic and foreign research results on intelligent analysis strategies for pragmatic failure in cross-cultural communication based on convolutional neural networks, and presents the research ideas of this paper. Section 3 is the method part, which discusses the principles and application of the related algorithms and, building on previous research and the innovations of this paper, proposes the optimization and application model of the intelligent analysis strategy. Section 4 discusses the experiments in which the algorithm is applied; on the basis of the collated experimental data, an optimization model is established. Section 5 is the conclusion, which summarizes the research results of this paper.

2. Related Work

Lv, Wang, Wei, and others note that the main line of research has concentrated on simple processing, such as small vocabularies and simple isolated words. For these problems, researchers mainly use template matching: speech features are extracted and matched against the test speech, and the result is the nearest template. This approach works well for isolated words, but it has great defects for continuous speech and does not fundamentally solve the recognition problem [18]. Huang, Qian, and Zhu proposed expanding the research scope to the morphology of language. This kind of structural linguistics has made an indelible contribution to revealing the internal organization rules of the language system and the interaction among the subsystems that make it up. It regards language as a system of symbols and takes the structure of the elements in the synchronic system of language and their interrelations as the research object. However, it ignores semantics and tends to ask only about the rules and not the content, so this structuralist research cuts off the inevitable connection between language and society [19]. The research of Wang, Lin, Chang, and others shows that the rise of sociolinguistics and pragmatics has allowed language research to break through the barrier of “studying language for language’s sake.” Research is no longer limited to language structure and vocabulary forms but extends to a broader and deeper study of language use and language behaviors, so that attention is paid to the use of language signs and not only to their structure [20]. Liang, Liao, and Hu hold that pragmatic failure occurs when a speaker uses a sentence that is formally correct but speaks inappropriately, in an inappropriate manner, or in a way that does not follow custom. Specifically, the speaker unconsciously violates interpersonal norms, social conventions, or constraints of time and place, disregards the identity, status, and occasion of both parties, or violates the cultural values particular to the target language. As a result, the communication is interrupted or fails, and it cannot achieve the expected effect. Errors of this nature are called pragmatic failure [21].

Wang, Zang, Zhang et al. proposed that, according to the number of modalities of the research target, sentiment analysis can be divided into two categories: unimodal and multimodal. Unimodal sentiment analysis considers information from one modality and builds a sentiment analysis model for that modality; it mainly includes text, image, and speech sentiment analysis [22]. Xue, Zhu, Zou, and others argue that, unlike traditional sentiment analysis methods, methods based on deep learning can overcome both the dependence of manual-feature methods on expert knowledge and the limited ability of shallow learning methods to express complex functions and handle complex tasks, while also improving data processing efficiency and reducing analysis costs [23]. Xu, Qin, and Hua put forward that the birth of sociolinguistics and pragmatics meets the needs of this kind of research. Sociolinguists believe that language originates from human society and is a phenomenon unique to it, so language is inseparable from human social conditions. Sociolinguistics attaches great importance to the communicative function of language and to the relationship between language and social culture [24].

The research of Kuang and Davison considers speaker-independent speech. The emergence of the hidden Markov model (HMM) solved this problem well, and it can be regarded as an inflection point in the history of speech recognition technology. During this period, further optimized theories were also proposed. A classic one is vector quantization, which converts the corresponding acoustic models into vectors one by one and calculates the difference between each vector and the template vector by Euclidean distance [25]. Ahmed, Gogate, and Tahir observe that speech recognition has turned from the HMM to neural network models. Compared with the traditional GMM-HMM, the neural network model has clearer advantages, and it fully characterizes the correlation between speech samples. The neural network can train and combine rough speech features layer by layer until they become features suitable for pattern classification [26]. Donghan, Sebastian, Matson et al. proposed a method for noisy speech, motivated by the wide variety of noise and the high noise bursts in different scenes, that combines the short-time energy method with the cosine angle value of the autocorrelation function. Because the trend over time of the short-time energy of speech and noise is opposite to that of the cosine angle value of the autocorrelation function, the speech part is obtained by endpoint detection based on the cosine angle value of the autocorrelation function [27].

Nongmeikapam et al. expanded the data set with additional data, verified the approach on handwritten digit image data sets using convolutional neural networks, and improved experimental performance when the data were insufficient [28]. Lv et al. obtained three new network structures, namely CNN1-1, CNN1-2, and CNN1-3, by modifying the activation function, the learning rate, and the number of filters in the original network structure [18]. Akhtar et al. used a large amount of unlabeled audio data to learn features with a deep convolutional belief network and applied the learned features to specific speech recognition and music recognition tasks [29]. Huang et al. used two deep learning models to learn speech and visual features, respectively, and used a deep denoising autoencoder to learn from noisy audio features and recover noise-free audio features [19].

Previous experiments have demonstrated the effectiveness of convolutional neural network models for entity relationship extraction, but such models often suffer from slow convergence and difficulty in fitting nonlinear problems. On the basis of the related research above, the positive effect of an intelligent analysis strategy for pragmatic failure in cross-cultural communication based on convolutional neural networks is established. This paper constructs a new algorithm for this strategy and uses pragmatic failures in cross-cultural communication for in-depth analysis, so as to make more effective use of the data, mine the hidden value behind the feature data, and uncover the real nature of pragmatic errors in cross-cultural communication.

3. Methodology

3.1. Research on Related Theories
3.1.1. Cultural and Cross-Cultural Communication

At present, the word “culture” is widely used in society. Radio, television, newspapers, and magazines often use it, and in many cases the word seems to be overused: almost any problem related to human society is labeled as culture. The scope of culture is very wide; culture is almost ubiquitous and all-inclusive. Sociology and anthropology usually use the concept of culture in a broad sense. Some scholars believe that, although the scope of culture is wide, it comprises three levels. ① Material culture. It is manifested through the physical products made by people, including buildings, clothing, food, supplies, tools, and utensils. ② System and custom culture. It is manifested through the social norms and codes of conduct that people abide by together, including systems, regulations, corresponding facilities, and customs. ③ Spiritual culture. It is expressed through the ways and products of people’s thinking activities, including values, ways of thinking, aesthetic taste, moral sentiment, and religious belief, as well as achievements and products in philosophy, science, literature, and art. Language is a part of culture, and the language system is a subsystem of the culture system. The cultural system cannot exist independently of the language system and vice versa. Language is not only a part of culture but also a medium for spreading culture, and this dual nature determines the inseparability of language and culture. Figure 1 shows the basic model of general language generation.

By language presupposition, we mean the way illocutionary force is given in language. Language is an established and learnable symbol system organized according to certain rules, which is used to represent the understanding of the objective world held by people in a certain geographical or cultural group. Each culture leaves traces in its own language symbols. Objects, events, experiences, and feelings have different marks or names in different language communities. Voice, as the material shell of language, is a common element of every language, and the physical, physiological, and social attributes of voice are also universal. Pronunciation, however, differs between languages. If communicators do not sufficiently understand the phonetic characteristics of the target language in cross-cultural communication and cannot correctly grasp its pronunciation principles and methods, pragmatic errors in phonetics easily arise. All wealth created by human beings belongs to the category of culture, and some scholars have subdivided this category. From the perspective of information theory, pragmatic failure arises because two communicators with different cultural backgrounds cannot completely share two different systems of semantic potential. Although the language used in communication is the same, information cannot be exchanged in the way the other party expects. The resulting deviation in information transmission inevitably leads to semantic displacement. There are three cases of semantic displacement: ① the meaning of information is improperly increased; ② the meaning of information is artificially reduced; and ③ the meaning of information is misrecognized.

3.1.2. Research on Language Use Based on Convolutional Neural Network

The convolutional neural network is inspired by the study of the biological visual cortex and is one of the representative algorithms of deep learning. It reduces the number of network parameters and the network complexity by sharing the weights of the convolution kernel, and it improves training speed through parallel operation. In addition, convolutional neural networks have some invariance to displacement, scale, and deformation, which mainly comes from the ideas of the local receptive field and of temporal or spatial subsampling. Artificial neural networks are generally composed of three parts: neurons, topology, and training and learning algorithms. A neuron is a processing unit, that is, a node between data. Its output is produced through weighted summation and linear processing of its inputs. A node can take several forms of input and output; multiple inputs with a single output are possible. Each node is controlled by a threshold, and its prominent feature is a nonlinear function. The neuron structure is shown in Figure 2.
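The effect of weight sharing on the number of parameters can be illustrated with a minimal sketch. The function name `conv1d_valid`, the toy signal, and the kernel values are hypothetical and chosen only for illustration; they are not taken from the model in this paper.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Toy input and kernel (illustrative values only).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d_valid(signal, kernel)

# The same 3 weights are reused at every position of the input.
shared_params = len(kernel)
# A fully connected layer producing the same number of outputs
# would need one weight per (input, output) pair.
dense_params = len(signal) * len(feature_map)
```

In this toy case the convolution needs 3 weights where a fully connected layer of the same output size would need 15, which is the sense in which kernel sharing reduces network parameters.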

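Commonly used activation functions include the rectified linear unit (ReLU), the sigmoid function, the hyperbolic tangent function, and the softmax function. Among them, the softmax function is often used in the output layer. It is calculated as follows:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K,$$

where $z$ is the input vector of the output layer and $K$ is the number of output classes.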
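As a minimal sketch, the softmax function, which maps each score to its exponential divided by the sum of the exponentials of all scores, can be computed as below. Subtracting the maximum score before exponentiation is a common numerical-stability choice assumed here, not something stated in this paper; the input values are illustrative.

```python
import math


def softmax(logits):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(logits)  # shift by the maximum to avoid overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three output classes.
probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the shift by the maximum leaves the result unchanged because it cancels between numerator and denominator.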

The pooling layer helps prevent overfitting by compressing the data and parameters, improves the generalization ability of the model, and maintains some invariance of the input features. Common mechanisms include mean pooling, max pooling, stochastic pooling, and combined pooling.
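Max pooling and mean pooling can be sketched as follows for nonoverlapping windows. The function `pool2d` and the 4 x 4 feature map are hypothetical illustrations, not the pooling configuration used in this paper.

```python
def pool2d(feature_map, size, mode="max"):
    """Pool a 2D feature map over nonoverlapping size x size windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row_out = []
        for c in range(0, cols - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row_out.append(max(window))
            else:  # mean pooling
                row_out.append(sum(window) / len(window))
        pooled.append(row_out)
    return pooled


# Illustrative 4 x 4 feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 9],
]

max_pooled = pool2d(fmap, 2, mode="max")
mean_pooled = pool2d(fmap, 2, mode="mean")
```

Both variants reduce the 4 x 4 map to 2 x 2; max pooling keeps the strongest response in each window, while mean pooling keeps the average.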

In deep convolutional neural networks, gradient explosion, gradient vanishing, and network degradation become more likely as the number of layers increases. Gradient explosion and gradient vanishing arise because gradients are multiplied together in the chain rule, which makes the resulting gradient too large or too small. Network degradation means that, as layers are added, the value of the loss function first decreases gradually and then saturates; when the number of layers continues to increase, the loss function begins to increase.
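The multiplicative effect of the chain rule can be shown with a small worked calculation. It assumes, for illustration only, that every layer contributes the same factor to the gradient: 0.25 (the maximum derivative of the sigmoid function) for the vanishing case and 1.5 for the exploding case. The function name and the depth of 20 layers are hypothetical.

```python
def chained_gradient(per_layer_factor, num_layers):
    """Multiply the same per-layer factor num_layers times (chain rule)."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_factor
    return grad


# Factor below 1: the gradient shrinks toward zero with depth.
vanishing = chained_gradient(0.25, 20)
# Factor above 1: the gradient grows rapidly with depth.
exploding = chained_gradient(1.5, 20)
```

After 20 layers the first product is on the order of 1e-12 while the second exceeds 3000, which illustrates why deeper networks are prone to both problems.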

3.2. Algorithms under Convolutional Neural Networks

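A blank, that is, an empty node, is introduced in CTC to realize automatic optimization of the output sequence, so that multiple paths are mapped to the same label sequence. Assuming that the speech input to the CTC structure has $T$ frames, the probability of a path $\pi = (\pi_1, \dots, \pi_T)$ after CTC is as follows:

$$p(\pi \mid x) = \prod_{t=1}^{T} y_{\pi_t}^{t}.$$

In CTC, the probability of a label sequence $l$ is the sum of the probabilities of the multiple paths mapped to it:

$$p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x),$$

where $\mathcal{B}$ is the mapping that merges repeated outputs and removes blanks.

For a single label sequence, the forward-backward algorithm is introduced into CTC. Let the forward variable $\alpha_t(s)$ represent the forward probability of node $s$ at time $t$, where $1 \le t \le T$ and $s$ indexes the sequence $l'$ obtained by inserting blanks around the labels of $l$. Then it is as follows:

$$\alpha_t(s) = \begin{cases} \left(\alpha_{t-1}(s) + \alpha_{t-1}(s-1)\right) y_{l'_s}^{t}, & l'_s = \text{blank or } l'_s = l'_{s-2}, \\ \left(\alpha_{t-1}(s) + \alpha_{t-1}(s-1) + \alpha_{t-1}(s-2)\right) y_{l'_s}^{t}, & \text{otherwise}. \end{cases}$$

Here $y_k^t$ is the probability that the output at time $t$ is $k$. The convolutional neural network designed in this paper is shown in Figure 3.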
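The CTC computation described above can be sketched in a few lines. The sketch assumes the standard CTC formulation: a path is mapped to a label sequence by merging repeated symbols and removing blanks, and the forward recursion runs over the label sequence with blanks inserted. The blank index 0, the function names, and the toy probability table are assumptions for illustration; this is not the acoustic model of this paper.

```python
from itertools import product

BLANK = 0  # assumed index of the blank symbol


def collapse(path):
    """Merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out


def ctc_forward(probs, label):
    """Probability of `label`, where probs[t][k] is p(symbol k at frame t)."""
    ext = [BLANK]
    for sym in label:
        ext += [sym, BLANK]  # label sequence with blanks inserted
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]
            if s >= 1:
                total += alpha[t - 1][s - 1]
            # Skipping the previous node is allowed only between
            # two different non-blank labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[t - 1][s - 2]
            alpha[t][s] = total * probs[t][ext[s]]
    result = alpha[T - 1][S - 1]
    if S > 1:
        result += alpha[T - 1][S - 2]
    return result


def ctc_brute_force(probs, label):
    """Sum the probabilities of every path that collapses to `label`."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path) == list(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


# Toy output distribution: 3 frames, symbols {blank, 1, 2}.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
p_forward = ctc_forward(probs, [1, 2])
p_brute = ctc_brute_force(probs, [1, 2])
```

The brute-force sum enumerates every path and is exponential in the number of frames; it is included only to check that the forward recursion yields the same value at far lower cost.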

In the early stage of weight training, the initialized weights do not yet meet the requirements of the network and there is much room for adjustment, so learning is fast and the network error decreases exponentially; later in training, the weights change more slowly. The training stage can be judged from the training time and the change in error. The K-means algorithm is used to find the range in which the error minimum may lie. In the middle and late stages of training, the IVLBP algorithm learns the network according to the weight changes calculated in recent iterations, so as to find the initial seed node that approximates the minimum error value; at the same time, the weight results of the iterative calculation are recorded. In a convolutional neural network, the back propagation of parameter training is divided into convolution-layer propagation and sampling-layer propagation. When the layer after a convolution layer is a sampling layer, and the layer after that sampling layer is again a convolution layer, the residual calculation of the convolution layer corresponds to one-to-one nonoverlapping sampling. The residual of the $j$-th feature map of layer $l$ is given by the following formula:

$$\delta_j^{l} = \beta_j^{l+1} \left( f'(u_j^{l}) \circ \mathrm{up}(\delta_j^{l+1}) \right).$$

Layer $l$ is the convolution layer, and layer $l+1$ is the subsampling layer. $\mathrm{up}(\cdot)$ expands layer $l+1$ to the size of layer $l$ through weight replication, $u_j^{l}$ is the input to the activation function $f$, $\beta_j^{l+1}$ is the weight of the subsampling layer, and $\circ$ denotes elementwise multiplication. At this point, maxout can be used for processing and improvement. The weights and offset values of the convolution layer can be adjusted through the calculation of the residual. The derivative with respect to the offset value of the convolution layer and the derivative with respect to the convolution kernel parameters are calculated as follows:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v} \left(\delta_j^{l}\right)_{uv}, \qquad \frac{\partial E}{\partial k_{ij}^{l}} = \sum_{u,v} \left(\delta_j^{l}\right)_{uv} \left(p_i^{l-1}\right)_{uv},$$

where $p_i^{l-1}$ is a matrix formed by all the elements of layer $l-1$ that are connected by $k_{ij}^{l}$.
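The propagation of the residual from a subsampling layer back to a convolution layer can be sketched as follows. The sketch assumes a common formulation in which the residual of the subsampling layer is expanded to the size of the convolution layer by replicating each entry over its pooling window, multiplied elementwise by the derivative of the activation function, and scaled by the subsampling weight; the derivative with respect to the offset is then the sum of the residual entries. The function names, the weight `beta`, and the numeric values are hypothetical.

```python
def upsample(delta, size):
    """Replicate each entry into a size x size block."""
    out = []
    for row in delta:
        expanded = [v for v in row for _ in range(size)]
        out.extend([list(expanded) for _ in range(size)])
    return out


def conv_layer_residual(delta_next, act_deriv, beta, size):
    """Residual of the convolution layer from the next (subsampling) layer."""
    up = upsample(delta_next, size)
    return [
        [beta * d * u for d, u in zip(d_row, u_row)]
        for d_row, u_row in zip(act_deriv, up)
    ]


def bias_gradient(delta):
    """Derivative of the error with respect to the offset: sum of residuals."""
    return sum(sum(row) for row in delta)


# Toy values: 2 x 2 residual from the subsampling layer, 2 x 2 windows,
# activation derivative equal to 1 everywhere, subsampling weight 0.25.
delta_next = [[1.0, 2.0], [3.0, 4.0]]
act_deriv = [[1.0] * 4 for _ in range(4)]

delta_conv = conv_layer_residual(delta_next, act_deriv, beta=0.25, size=2)
grad_b = bias_gradient(delta_conv)
```

Each residual entry of the subsampling layer is copied to the four positions it was pooled from, which reflects the one-to-one nonoverlapping correspondence between the two layers.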

Once the parameters of the CTC-CNN acoustic model have been designed, the model must be trained to obtain the corresponding acoustic model. In training the CTC-CNN acoustic model, the network parameters must first be determined. A CNN acoustic model is also selected for comparison with the system model; this paper takes the CNN model from related papers as the baseline system. Its structure is 2 convolutional layers + pooling layers + 2 fully connected layers, as shown in Table 1.

For training the convolutional neural network, the experimental data were collected in a quiet indoor environment, and the spectrogram obtained by preprocessing the speech waveform was used as the input of the network. The traditional BP algorithm, the IVLBP algorithm, and the NWBP algorithm were used to learn the weights, and their convergence time and the fit of their training errors were compared to judge the advantages and disadvantages of the algorithms. The convergence time and mean square error under the different algorithms are shown in Tables 2 and 3.

The data in the tables show that the optimized algorithm needs more iterations in the traditional three-layer neural network than in the CNN. In the three-layer network, the ability of the new algorithm to drive the mean square error toward its minimum is not well reflected, and training time increases while training efficiency decreases, although training accuracy and the number of iterations are still better than those of the traditional algorithm. Of the two neural networks with different degrees of complexity, the optimized algorithm trains better in the more complex one. Given the massive training data of a speech recognition system and the very large number of parameters of a convolutional neural network, the complex network can use the optimized algorithm to reduce its sensitivity to parameter changes and to avoid the network oscillation caused by them, which would otherwise prevent the mean square error from approaching the minimum.

4. Result Analysis and Discussion

This paper uses TensorFlow to design and build several Chinese speech recognition systems based on CNN, CTC-CNN, CTC-DCNN, and other models, and completes three sets of experiments. A total of 40 medical undergraduates of Grade 2022 and 20 foreign students of Grade 2022 at our university were investigated, and English and Chinese pragmatic failures were compared from three aspects: pragmalinguistic failures, sociopragmatic failures, and nonverbal pragmatic failures. The aim was to explore the situation of cross-cultural pragmatic failures in English and Chinese, analyze their causes, and put forward corresponding improvement suggestions and measures, so as to improve cross-cultural communication ability and further promote the construction of international communication ability. The three groups of experiments were obtained through quantization of the data from 40 samples. First, shallow convolutional neural network modeling is analyzed and verified. Then the recognition accuracy of the end-to-end deep convolutional neural network acoustic modeling proposed in this paper is analyzed. The CTC-CNN and CTC-DCNN acoustic models are also optimized and trained under different numbers of iterations, and the recognition results are obtained. These comparative experiments are used to verify the superiority of the maxout-improved CTC-DCNN acoustic model. Finally, the model is preliminarily tested and analyzed in a noisy environment. The experimental results are shown in Figures 4, 5, and 6.

The influence of the number of iterations on the model was examined as follows. The previous model training showed that, when the number of iterations is reasonable, speech recognition accuracy is highest under the optimized CTC-DCNN acoustic model. To verify whether a more appropriate number of iterations exists, this paper retrained the CNN model, the DCNN model, the CTC-CNN model, the CTC-DCNN model, and the maxout-improved CTC-DCNN model and compared their recognition performance under different numbers of iterations. The experiments show that the designed algorithm improves the overall recognition rate by 59.8%. It also handles cases with different numbers of iterations well; in particular, the efficiency of obtaining results basically remains at 61.8%, which supports the role of the convolutional neural network in the intelligent analysis of pragmatic errors in cross-cultural communication. The accuracy of pragmatic failure analysis under interference serves as an experimental reference of practical significance. Considering the various kinds of interference that occur in practice, it is meaningful that the algorithm structure of this paper maintains a recognition rate of 58.7%, which helps ensure orderly operation.

5. Conclusions

“Cross-cultural communication is an art, which requires not only professional knowledge and sufficient knowledge base but also a certain depth of mastery of Chinese and Western culture, especially in social customs and conventional expression.” High pragmatic competence first requires a strong sense of context, and cognitive contextual competence is an important means of avoiding pragmatic failure. Language rules are the minimum conditions for language use to occur. On this foundation, legitimate sentences can be generated and understood, and language use can proceed. However, whether language communication succeeds, and how well, depends closely on the communicator’s mastery and understanding of the target-language culture. A CNN has two characteristics: the sharing of weights between neurons and the local connection between layers. In recent years, with the success of convolutional neural networks in image and speech recognition, their application has extended further into scientific research. In this paper, a convolutional neural network was applied in an investigation of 40 medical undergraduates of Grade 2022 and 20 international students of Grade 2022 at our university. The experimental analysis shows that the designed algorithm improves the overall recognition rate by 59.8%, and the efficiency of obtaining results is basically maintained at 61.8%. Within the algorithm structure of this paper, maintaining a recognition rate of 58.7% is meaningful, as it helps ensure the orderly progress of actual operation.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Social Science Foundation of Qiqihar Medical University in 2022 (Project Number: QYSKL2022-04ZD).