Abstract
To improve the intelligent analysis of Bel Canto music and to strengthen Bel Canto singing skills, this paper uses a machine semantic recognition algorithm to identify the features of Bel Canto music, builds a Bel Canto music feature recognition model, and analyzes the advantages, disadvantages, and applicable dataset types of each Bel Canto semantic feature clustering method. Finally, the Bel Canto semantic feature clustering method is applied to multicomponent signal parameter estimation: the time-frequency analysis of a nonstationary signal is clustered to obtain the time-frequency distribution of each signal component, and the parameters of each single-component signal are then estimated. The experimental results show that the machine-semantics-based Bel Canto music feature recognition method proposed in this paper performs well.
1. Introduction
Bel Canto comes from the Italian term belcanto and denotes a singing method built on vocal technique. Influenced by Western monophonic and polyphonic music, Bel Canto was gradually created and developed. It has a beautiful timbre, the voice can be projected and restrained freely, and sung passages feature extensive coloratura. Bel Canto combines rigidity and softness with a flexible voice. The biggest difference between it and national vocal music lies in the way of vocalization: Bel Canto voices use a lower larynx position, the sound has a clear metallic quality, and the songs delivered are fuller and brighter.
Music creation and song writing have developed comprehensively, especially with the increase in art schools, and Bel Canto has been deeply integrated into education and teaching, which has greatly enriched the teaching content. We affirm that the introduction of Bel Canto has pushed the national vocal music education system toward greater richness and scientific rigor. At the same time, however, we have to admit that there are also some problems in the development of Bel Canto teaching in China; that is, the audience for Bel Canto is relatively small. The main reason for this phenomenon is that people's thinking is comparatively traditional and conservative. According to surveys, many people still regard Bel Canto as a foreign culture. The first people who came into contact with Bel Canto were vocal artists with relatively open minds.
Most of the people who like Bel Canto are educators, singers, or students engaged in Bel Canto teaching. Although the development of Bel Canto teaching in Chinese colleges and universities is becoming more professional, systematic, and widespread, this professionalization also limits its range of dissemination and public acceptance. The second problem is the lack of innovation. Like national vocal music, many students study Bel Canto out of interest and love. With an excessive emphasis on technique, students may master the basic Bel Canto singing method, but without genuine collision and innovation Bel Canto remains on the surface and cannot develop effectively. The third problem is that Bel Canto performance is not easy to understand. Both national vocal music and Bel Canto use a scientific vocal system, but what they pursue is not the same. Bel Canto pays more attention to the training and cultivation of vocal fundamentals and makes standardized arrangements for vocal technique, emphasizing resonance and penetrating power; ethnic vocal music emphasizes natural vocalization and clear diction. Within traditional Chinese aesthetics, the latter is obviously easier to accept and understand. Moreover, compared with ethnic singing, Bel Canto demands more professionalism and technique and requires long-term practice. In addition, since many people do not understand foreign languages and literature, students often resort to rote learning when faced with a Bel Canto work. This fixed-mode teaching method gradually causes Bel Canto to lose its true meaning and significance, and the diminished connotation of such singing in turn affects the public's aesthetics.
It is difficult for the communication modes of musical language and written language to be completely integrated, because interpretation must restate the content and find its equivalent [1]. "In some cultural spheres, paraphrasing is an act of liberation, a means of revision, of revaluation, of escape from a dead past. In others, it is reactionary, reckless, timid, suffocating" [2]. Literature [3] opposes interpretation; its fundamental objection is to the idea that works of art have connotations, requiring people instead to focus on the work of art itself, accept its impact, and revive its tension. The author disagrees with this point of view. Literature [4] believes that musical works, whether classical or modern, embody human beings' intuitive wisdom and understanding of the world and have rich connotations. However, the connotation of musical works differs from other art forms in that it is nonsemantic, that is, linguistically ineffable [5].
Reference [6] proposed a matching model to calculate the music-text emotional semantic relevance of music clips and words. There are two differences between music-text emotional semantic relevance and word similarity. First, similarity between words is defined within a single symbol system, while music and text belong to two different symbol systems [7]. Second, words carry not only emotional semantics but also entity-oriented semantics, so word similarity reflects a multidimensional semantic matching result, whereas music-text emotional semantic relevance only reflects the degree of matching of emotional semantics [8]. The difference between music-text emotional semantic relevance and direct matching of music content lies in the following: (1) the latter only focuses on the consistency of the melody and does not reflect the emotional semantics behind the music, which makes direct matching of music content more suitable for precise music retrieval but unable to support fuzzy retrieval based on emotion; (2) even for music of the same type, style, and emotion, the melodies still differ considerably [9], so direct matching of music content is difficult to apply to music recommendation. Based on semantic analysis of the content of music clips and of words, this paper quantitatively describes the semantic relevance of music and words from the emotional point of view. The relevance score can be used in emotion-oriented music retrieval systems, automatic music labeling, and music recommendation. Experimental results show that the matching model can effectively match music and words from an emotional perspective [10].
Text semantic matching establishes semantic associations between text units of different granularities through rules or training, so as to support applications targeting textual semantic information such as information retrieval [11], machine translation [12], and dialogue systems [13]. Common semantic matching models include the deep semantic structure model [14], the convolutional deep semantic structure model [15], and the convolutional tensor neural network [16]. The deep semantic structure model mainly models the matching degree between query items and documents and shows a significant improvement over traditional text matching models. It is a typical Siamese network structure: each text object is vectorized by a 5-layer network, and the cosine similarity of the two text vectors is then computed to determine the similarity of the two texts [17]. Compared with the deep semantic structure model, the convolutional deep semantic structure model replaces the fully connected layer that generates the sentence vector with a convolutional layer of a convolutional neural network; the other structures are the same [18]. The convolutional tensor neural network uses a ranking-based loss function, which aims to widen the gap between positive and negative samples and does not care about the absolute value of the matching degree. The above semantic matching models are mainly used for text matching; because music and text belong to two different symbol systems, they cannot be used directly in the scenario of semantic matching between music and text [19].
This paper uses a machine semantic recognition algorithm to perform feature recognition of Bel Canto music, builds a feature recognition model of Bel Canto music, and carries out systematic model analysis combined with experiments.
2. Machine Semantic Recognition Algorithms
2.1. Time-Frequency Distribution Analysis and Parameter Estimation of Bel Canto Semantics Based on Clustering Method
For the Wigner transform time-frequency distribution of multicomponent Bel Canto semantics composed of chirp components, the energy distribution of each component is relatively concentrated. Based on this, a manifold clustering method is designed to separate multicomponent Bel Canto semantics. Since the energy distribution of each component is a manifold in the three-dimensional space of time, frequency, and energy, the spectral clustering method based on LSA (local subspace affinity) and the generalized principal component analysis (GPCA) method are used, respectively, to cluster the Wigner transform time-frequency distribution of the multicomponent Bel Canto semantics. The energy distribution of each component is grouped into one category and labeled so as to obtain the time-frequency distribution of each component, and the parameters of each component Bel Canto semantic signal are then estimated.
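As an illustrative sketch of this clustering step (the paper's experiments run in MATLAB; Python and scikit-learn's generic spectral clustering are used here only for illustration, in place of the LSA/GPCA variants), assuming the peak-sampled time-frequency points are already available as an N x 2 array:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_tf_points(tf_points, n_components=2):
    """Group time-frequency sample points into signal components.

    tf_points : (N, 2) array of (time, normalized frequency) peak locations.
    Returns an array of component labels, one per sample point.
    """
    # Standardize both axes so time and frequency contribute comparably.
    pts = (tf_points - tf_points.mean(axis=0)) / tf_points.std(axis=0)
    # A nearest-neighbor affinity roughly reflects the local-manifold assumption.
    model = SpectralClustering(n_clusters=n_components,
                               affinity="nearest_neighbors",
                               n_neighbors=10,
                               assign_labels="kmeans",
                               random_state=0)
    return model.fit_predict(pts)
```

Points assigned the same label are treated as belonging to one Bel Canto semantic component and are passed to the parameter estimation step described below.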
The application of chirp Bel Canto semantics is very extensive; such signals are often used in pulse compression radar. A standard mathematical model of the chirp (linear FM) Bel Canto semantic pulse is

$$y(t) = A\,\mathrm{rect}\!\left(\frac{t - t_0}{T}\right)\exp\!\left\{ j2\pi\left[ f_0 (t - t_0) + \frac{B}{2T}(t - t_0)^2 \right] \right\},$$

Among them, $t_0$ represents the pulse start time of the Bel Canto semantic signal, $B$ represents the pulse bandwidth, $T$ is the pulse duration, and $f_0$ is the carrier frequency. The frequency of the chirp Bel Canto semantic signal varies linearly with time during the pulse duration.
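A small sketch that generates such a pulse under assumed parameter values (Python/NumPy; the sampling rate and parameter values are illustrative only, not those used in the experiments):

```python
import numpy as np

def chirp_pulse(fs=8000.0, t0=0.1, T=0.5, f0=500.0, B=1000.0, A=1.0):
    """Complex linear-FM (chirp) pulse with carrier f0, bandwidth B, duration T.

    The instantaneous frequency rises linearly from f0 to f0 + B over the
    pulse; outside [t0, t0 + T] the signal is zero.
    """
    t = np.arange(0.0, t0 + T + 0.1, 1.0 / fs)
    tau = t - t0
    inside = (tau >= 0) & (tau <= T)
    phase = 2 * np.pi * (f0 * tau + 0.5 * (B / T) * tau ** 2)
    return t, np.where(inside, A * np.exp(1j * phase), 0.0)
```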
The application of fixed-frequency Bel Canto semantics is also very extensive; it can be understood as the special case in which the modulation rate of the linear FM (LFM) Bel Canto semantic signal is 0. The mathematical model of the fixed-frequency pulse Bel Canto semantic signal is

$$y(t) = A(t)\exp\!\left[ j\left( 2\pi f_0 t + \varphi_0 \right) \right],$$

Among them, $f_0$ represents the carrier frequency, $\varphi_0$ represents the initial phase, and $A(t)$ represents the amplitude of the Bel Canto semantic signal.
2.2. Introduction of Common Time-Frequency Analysis Methods
The short-time Fourier transform deals with nonstationary Bel Canto semantics by dividing the signal into several small time intervals. It assumes that the Bel Canto semantic signal is stationary within each small interval and then applies the Fourier transform to each short interval to determine its frequency content; taken together, these spectra reflect the temporal evolution of the spectrum at a macroscopic level. We use a window function $h(t)$ of short duration that slides along the time axis. The short-time Fourier transform of the nonstationary Bel Canto semantic signal $y(t)$ can then be expressed as

$$\mathrm{STFT}_y(t, f) = \int_{-\infty}^{+\infty} y(u)\, h^{*}(u - t)\, e^{-j2\pi f u}\, du,$$

Among them, $*$ represents the complex conjugate operation. That is to say, the short-time Fourier transform of the nonstationary Bel Canto semantic signal $y(u)$ at time $t$ is obtained by multiplying $y(u)$ by a window function $h(u - t)$ centered on $t$ and then performing the Fourier transform; in effect, a time segment is selected at each moment to analyze the Bel Canto semantics. Therefore, the principle of the short-time Fourier transform is to analyze the local spectrum in the vicinity of each moment. The short-time Fourier transform has the excellent properties of time-shift invariance and frequency-shift invariance.
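For illustration, the short-time Fourier transform of a toy linear-FM signal can be computed with SciPy as follows (the window length and overlap below are assumptions chosen only to show the usual time/frequency resolution trade-off):

```python
import numpy as np
from scipy.signal import stft

fs = 8000.0
t = np.arange(0, 0.5, 1 / fs)
y = np.exp(1j * 2 * np.pi * (500 * t + 0.5 * 2000 * t ** 2))  # toy LFM signal

# nperseg sets the analysis window length: short windows give better time
# resolution, long windows give better frequency resolution.
f_axis, t_axis, Y = stft(y, fs=fs, window="hann",
                         nperseg=256, noverlap=192, return_onesided=False)
spectrogram = np.abs(Y) ** 2  # energy at each (time, frequency) cell
```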
The wavelet transform uses a set of basis functions to approximate the original Bel Canto semantic signal, and these basis functions are obtained by scaling and shifting a basic (mother) wavelet at different scales. The basis functions have some good characteristics, such as good time-frequency concentration and a small time-bandwidth product. We assume that the mother wavelet is $\psi(t)$; the basis functions of the wavelet transform are then calculated as

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t - b}{a}\right),$$

Among them, $a$ and $b$ represent the scaling factor and the translation factor, respectively. If the value of $a$ is large, the basis function becomes a stretched, wider version of the mother wavelet and is a low-frequency function. If the value of $a$ is small, the basis function becomes a compressed, narrower version of the mother wavelet and is a high-frequency function.
The principle of the wavelet transform is to replace the trigonometric functions of the Fourier transform with basis functions of limited bandwidth and time width, thereby obtaining a representation in the joint time-frequency domain. Although this brings the advantage of localization, it comes at a cost: at high frequencies (small scales) the wavelet transform has lower frequency resolution but higher time resolution, whereas at low frequencies (large scales) it has higher frequency resolution but lower time resolution. This time-frequency analysis method is therefore better suited to nonstationary Bel Canto semantic signals that vary slowly at low frequencies and rapidly at high frequencies.
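A small continuous wavelet transform sketch, assuming the PyWavelets package and a Morlet mother wavelet (both are illustrative choices, not those of the paper):

```python
import numpy as np
import pywt

fs = 8000.0
t = np.arange(0, 0.5, 1 / fs)
y = np.cos(2 * np.pi * (500 * t + 0.5 * 2000 * t ** 2))  # toy LFM signal

# Small scales probe high frequencies (good time resolution); large scales
# probe low frequencies (good frequency resolution).
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(y, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs) ** 2  # energy over (scale, time)
```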
Time-frequency analysis methods include linear and nonlinear time-frequency transformations. The short-time Fourier transform and the wavelet transform discussed above are both linear time-frequency transforms and therefore cannot directly represent the instantaneous power spectral density of the vocal-source Bel Canto semantic signal. The Wigner time-frequency distribution, by contrast, is a nonlinear (quadratic) time-frequency transformation: it can represent the instantaneous power spectral density of the vocal-source Bel Canto semantics and describes how the energy of the signal is distributed over the time and frequency domains.
The Wigner distribution of the vocal-source Bel Canto semantic signal $y(t)$ over a continuous time interval is expressed as

$$W_z(t, f) = \int_{-\infty}^{+\infty} z\!\left(t + \frac{\tau}{2}\right) z^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j2\pi f \tau}\, d\tau,$$

Among them, $\tau$ represents the time-lag variable and $z(t)$ is the analytic signal of the vocal-source Bel Canto semantics $y(t)$, given by

$$z(t) = y(t) + jH[y(t)],$$

where $H[y(t)]$ represents the Hilbert transform of the vocal-source Bel Canto semantic signal $y(t)$.
The principle of Wigner time-frequency analysis can be viewed as a short-time Fourier transform with an adaptive window function, where the window function is the vocal-source Bel Canto semantic signal itself. Its physical significance is that it displays the distribution of the signal's energy over the time and frequency domains. The Wigner distribution has good mathematical properties, such as time- and frequency-shift covariance, correct marginal properties, and real-valuedness, which make Wigner time-frequency analysis widely used. However, when the Wigner distribution is applied to multicomponent Bel Canto semantics, cross terms between the components are generated, which seriously degrade the quality and resolution of the time-frequency distribution.
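A straightforward textbook discretization of the Wigner-Ville distribution is sketched below (assuming the analytic signal is obtained with scipy.signal.hilbert; this is a generic illustration, not the paper's implementation):

```python
import numpy as np
from scipy.signal import hilbert

def wigner_ville(y):
    """Discrete Wigner-Ville distribution of a real signal y.

    Returns an (n_freq, n_time) array of (possibly negative) energy values;
    cross terms appear whenever y contains more than one component.
    """
    z = hilbert(np.asarray(y, dtype=float))     # analytic signal z(t)
    n = len(z)
    wvd = np.zeros((n, n), dtype=complex)
    for t in range(n):
        # Largest lag that keeps t +/- tau inside the record.
        tau_max = min(t, n - 1 - t)
        tau = np.arange(-tau_max, tau_max + 1)
        # Instantaneous autocorrelation r(t, tau) = z(t + tau) * conj(z(t - tau)).
        r = z[t + tau] * np.conj(z[t - tau])
        kernel = np.zeros(n, dtype=complex)
        kernel[tau % n] = r                      # wrap negative lags for the FFT
        wvd[:, t] = np.fft.fft(kernel)           # Fourier transform over the lag
    return np.real(wvd)
```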
The spectrogram of the vocal-source Bel Canto semantic signal can be expressed as the two-dimensional convolution of the Wigner distribution of the signal with the Wigner distribution of the analysis window:

$$S_y(t, f) = \iint W_y(u, v)\, W_h(t - u, f - v)\, du\, dv,$$

Among them, $W_h$ is the Wigner distribution of the window function $h(t)$, and $S_y(t, f)$ is the energy value at any point $(t, f)$ in the time-frequency plane; the principle is to weight and average the Wigner distribution values of the points near $(t, f)$. That is to say, the spectrogram of the vocal-source Bel Canto semantics at any point $(t, f)$ is the average energy of the signal in a neighborhood centered on that point. This averaging spreads the signal components but suppresses the cross terms. The rearrangement (reassigned) spectrum method redefines a rearrangement operator, recomputes the energy distribution of the vocal-source Bel Canto semantics in the time-frequency domain, and moves each averaged value to the energy center of gravity of the points in its neighborhood, thereby improving the energy concentration of each component. The rearrangement operator maps $(t, f)$ to new coordinates $(\hat{t}, \hat{f})$, the local energy centroid in the original time-frequency domain. The value of the rearranged spectrum at any point is the sum of all spectrogram values reassigned to that point:

$$RS_y(t', f') = \iint S_y(t, f)\,\delta\!\left(t' - \hat{t}(t, f)\right)\delta\!\left(f' - \hat{f}(t, f)\right) dt\, df.$$
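If an off-the-shelf reassignment routine is acceptable, librosa's reassigned spectrogram can stand in for the rearrangement step described above (librosa itself and the analysis settings below are assumptions made only for illustration):

```python
import numpy as np
import librosa

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
y = np.cos(2 * np.pi * (500 * t + 0.5 * 2000 * t ** 2)).astype(np.float32)

# freqs/times give, for each STFT cell, the reassigned (centroid) coordinates;
# mags holds the corresponding spectrogram magnitudes.
freqs, times, mags = librosa.reassigned_spectrogram(y=y, sr=fs,
                                                    n_fft=256, hop_length=64)
```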
2.3. Multicomponent Bel Canto Semantic Parameter Estimation Based on Clustering
The time-frequency sample point distribution of the multicomponent Bel Canto semantic signal is clustered, and the algorithm labels the sample points. The time-frequency sample point distributions of the resulting component categories are then drawn, and the parameters and instantaneous frequencies of the corresponding components are obtained by time-frequency distribution analysis. For example, the component parameters of a multicomponent Bel Canto semantic signal are shown in Table 1.
The time-domain waveform of the multicomponent Bel Canto semantic signal is shown in Figure 1. It is formed by mixing two linear FM Bel Canto semantic components, and its expression is

The variation of the amplitude of the Bel Canto semantic signal with time follows no fixed law, so it is difficult to analyze and estimate the signal parameters from the time domain alone. The distribution of the Bel Canto semantics in the time-frequency domain is therefore obtained by transformation, and peak sampling is then performed at each time point to obtain the time-frequency sample point distribution of the multicomponent signal. The sample points are clustered by the linear manifold clustering method, and sample points belonging to the same component are clustered into one class.
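A minimal peak-sampling sketch (Python/NumPy for illustration; the threshold and peak-detection settings are assumptions, not the paper's values):

```python
import numpy as np
from scipy.signal import find_peaks

def peak_sample_points(tfd, t_axis, f_axis, rel_height=0.1):
    """Peak-sample a time-frequency distribution column by column.

    tfd : (n_freq, n_time) time-frequency energy matrix (e.g., a WVD).
    For every time column, all spectral peaks above rel_height times the
    global maximum are kept, giving an (N, 2) array of (time, frequency)
    sample points that the manifold clustering step can then label.
    """
    floor = rel_height * tfd.max()
    points = []
    for j in range(tfd.shape[1]):
        idx, _ = find_peaks(tfd[:, j], height=floor)
        points.extend((t_axis[j], f_axis[i]) for i in idx)
    return np.asarray(points)
```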
The Wigner transform time-frequency sample point distribution of the multicomponent Bel Canto semantic signal is shown in Figure 2. It can be seen that there are two Bel Canto semantic components in the figure. Since the modulation mode of each component is linear frequency modulation, its frequency changes linearly with time, and the time-frequency sample points of each component lie on the same linear manifold.

The time-frequency sample point distribution is clustered by the spectral clustering method based on LSA, and the Wigner transform time-frequency sample point distributions of the two Bel Canto semantic components are obtained separately, as shown in Figure 3.

Then, the algorithm performs polynomial fitting on the detected peak points of each clustered time-frequency sample point distribution to obtain the corresponding Bel Canto semantic component parameters, as shown in Table 2.
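A minimal sketch of this fitting step (assuming the clustered points of one component are available as an N x 2 array of time and frequency values; for a linear-FM component a degree-1 polynomial is sufficient):

```python
import numpy as np

def estimate_lfm_parameters(component_points):
    """Estimate chirp parameters from one clustered component.

    component_points : (N, 2) array of (time, frequency) points belonging to
    a single linear-FM component. Fitting f(t) = k * t + c with a degree-1
    polynomial gives the chirp rate k; bandwidth = k * duration.
    """
    t, f = component_points[:, 0], component_points[:, 1]
    k, c = np.polyfit(t, f, deg=1)            # least-squares line fit
    duration = t.max() - t.min()
    return {"start_frequency": c + k * t.min(),
            "chirp_rate": k,
            "bandwidth": k * duration,
            "duration": duration}
```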
However, when the sampling frequency is too high, there are too many sample points in the time-frequency distribution, which makes the clustering algorithm slow or prevents the program from running at all because of the memory limits of MATLAB itself. To address this problem, a main-sample-point extraction method is given: the extracted main sample points are clustered, and the Bel Canto semantic components are then separated. For example, the expression for a multicomponent Bel Canto semantic signal is
Its parameters are shown in Table 3, and Figure 4 shows the Wigner transform time-frequency distribution of the multicomponent Bel Canto semantic signal with three components. It can be seen from the figure that each time point generally corresponds to three frequency peaks, or fewer than three if the frequency peaks of two components overlap. The main sample points are obtained by extracting these frequency peaks at each time point.

As shown in the frequency-energy diagram corresponding to time point 1 in Figure 5, there are many peaks in the variation of energy with frequency. The three peaks with the largest energy are the energy peaks of the three Bel Canto semantic components and correspond to their instantaneous frequencies. The energies of these three peaks in the figure are [64.78, 62.82, 42.21], and the corresponding frequencies are [0.3013, 0.2437, 0.057]. Thus, at each time point, at most three sample points need to be extracted. In this way, the number of samples is greatly reduced while the samples remain representative: the clustering results are unaffected, the sample count drops, and the efficiency of the algorithm improves.
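A minimal sketch of the main-sample-point extraction (keeping at most the three strongest peaks per time column, as described above; everything else, including the Python/SciPy setting, is an illustrative assumption):

```python
import numpy as np
from scipy.signal import find_peaks

def main_sample_points(tfd, t_axis, f_axis, k=3):
    """Keep only the k strongest peaks per time column (main sample points).

    tfd : (n_freq, n_time) time-frequency energy matrix.
    Returns an (N, 2) array with at most k (time, frequency) points per
    column, which keeps the sample set small enough for clustering.
    """
    points = []
    for j in range(tfd.shape[1]):
        idx, props = find_peaks(tfd[:, j], height=0.0)
        if idx.size == 0:
            continue
        top = idx[np.argsort(props["peak_heights"])[::-1][:k]]
        points.extend((t_axis[j], f_axis[i]) for i in top)
    return np.asarray(points)
```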

After the time-frequency distribution points are clustered, the time-frequency sample point distribution of each Bel Canto semantic component is drawn separately, and its parameter estimates are obtained. The results are shown in Table 4.
3. Semantic Recognition of Bel Canto Music
The Bel Canto music labeling system needs to preprocess the audio files and labeling results of each track in the dataset to obtain the corresponding audio feature vector and semantic feature vector. After that, these data are input into the generative adversarial network for training to capture the mapping relationship between the two features, as shown in Figure 6.
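The paper does not specify the network architecture or framework used for this adversarial training; the following is a minimal sketch under assumed settings (PyTorch, a 20-dimensional audio feature vector, a 50-dimensional semantic tag vector), meant only to illustrate the audio-to-semantic mapping being learned:

```python
import torch
import torch.nn as nn

AUDIO_DIM, TAG_DIM = 20, 50   # hypothetical feature sizes

# Generator: audio feature vector -> semantic (tag) feature vector.
G = nn.Sequential(nn.Linear(AUDIO_DIM, 128), nn.ReLU(),
                  nn.Linear(128, TAG_DIM), nn.Sigmoid())

# Discriminator: judges whether an (audio, tag) pair is a real annotation.
D = nn.Sequential(nn.Linear(AUDIO_DIM + TAG_DIM, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

def train_step(audio, tags):
    """One adversarial update on a batch of (audio feature, tag vector) pairs."""
    real = torch.ones(audio.size(0), 1)
    fake = torch.zeros(audio.size(0), 1)

    # Discriminator: real pairs vs. pairs with generated tags.
    opt_d.zero_grad()
    d_loss = bce(D(torch.cat([audio, tags], dim=1)), real) + \
             bce(D(torch.cat([audio, G(audio).detach()], dim=1)), fake)
    d_loss.backward()
    opt_d.step()

    # Generator: try to make generated tags look like real annotations.
    opt_g.zero_grad()
    g_loss = bce(D(torch.cat([audio, G(audio)], dim=1)), real)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```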

The new model involves key technologies such as ontology construction, mapping establishment, and query processing. Its implementation steps are as follows. It first extracts the relational pattern in the local data source, constructs the local ontology according to the transformation rules, and at the same time constructs the global ontology in the global environment. Then, it establishes a two-level mapping between the global ontology and the local ontology and between the local ontology and the data source (called global mapping and local mapping). The pattern extraction model is shown in Figure 7.

Combined with the integration model in Figure 7, this paper designs an ontology-based Bel Canto music data integration system, OMDIS, whose architecture is divided into three layers: application layer, middle layer, and data layer. The application layer is the interface between the system and the user, consisting of browsers or applications. The middle layer is responsible for shielding the distribution and heterogeneity of data sources at a high level, providing users with data with a transparent and unified interface, and is composed of components such as local agents, ontology managers, global agents, and query processors. The data layer is used to store Bel Canto music data, is responsible for accessing and obtaining the data, and consists of heterogeneous data sources and corresponding wrappers. The architecture of OMDIS is shown in Figure 8.
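A heavily hedged structural sketch of the OMDIS query path is given below; all class and method names are hypothetical illustrations of the three-layer division described above, not the system's actual interfaces:

```python
class Wrapper:
    """Data layer: hides one heterogeneous Bel Canto music data source."""
    def __init__(self, source):
        self.source = source

    def fetch(self, local_query):
        return self.source.execute(local_query)  # source-specific access


class LocalAgent:
    """Middle layer: rewrites global-ontology queries via the local mapping."""
    def __init__(self, local_ontology, mapping, wrapper):
        self.local_ontology, self.mapping, self.wrapper = local_ontology, mapping, wrapper

    def answer(self, global_query):
        local_query = self.mapping.rewrite(global_query)
        return self.wrapper.fetch(local_query)


class QueryProcessor:
    """Middle layer: splits a user query over all local agents and merges results."""
    def __init__(self, global_ontology, agents):
        self.global_ontology, self.agents = global_ontology, agents

    def query(self, user_query):
        global_query = self.global_ontology.normalize(user_query)
        results = [agent.answer(global_query) for agent in self.agents]
        return [row for part in results for row in part]  # unified answer
```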

The algorithm in this paper first establishes a Bel Canto music dictionary, then represents the Bel Canto music as a matrix, and finally performs lexical correlation analysis. The overall technical scheme of the algorithm is shown in Figure 9.
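A minimal sketch of these three steps on a toy corpus (dictionary construction, a binary track-by-term matrix, and cosine-based lexical correlation; the tag names and the binary weighting are assumptions made for illustration):

```python
import numpy as np

def build_dictionary(track_tags):
    """Dictionary of all Bel Canto tags observed in the corpus."""
    vocab = sorted({tag for tags in track_tags for tag in tags})
    return {tag: i for i, tag in enumerate(vocab)}

def track_matrix(track_tags, vocab):
    """Binary track-by-term matrix: row = track, column = dictionary word."""
    m = np.zeros((len(track_tags), len(vocab)))
    for r, tags in enumerate(track_tags):
        for tag in tags:
            m[r, vocab[tag]] = 1.0
    return m

def lexical_correlation(matrix):
    """Cosine correlation between dictionary words across the corpus."""
    norm = matrix / (np.linalg.norm(matrix, axis=0, keepdims=True) + 1e-12)
    return norm.T @ norm   # (n_terms, n_terms) word-word similarity

# Hypothetical toy corpus of labeled tracks.
tracks = [["lyrical", "bright"], ["dramatic", "dark"], ["lyrical", "dark"]]
vocab = build_dictionary(tracks)
corr = lexical_correlation(track_matrix(tracks, vocab))
```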

The semantic recognition effect of Bel Canto music obtained above is evaluated, and the results shown in Table 5 and Figure 10 are finally obtained.

It can be seen from the above research that the Bel Canto music feature recognition method based on machine semantics proposed in this paper has a good effect.
4. Conclusion
Bel Canto is an advanced singing method with high requirements in terms of tone, breathing, and resonance. Different Bel Canto works are based on different historical backgrounds, social cultures, and character psychologies. Therefore, when singing, singers must master the background information of the work, understand its deeper meaning, and place themselves within it so as to merge with the artistic conception of the work. When Bel Canto singers interpret a work, the most direct vehicle is the power of the voice: the control and progression of singing emotion are transmitted through sound, which requires singers to be able to use singing technique flexibly. In this paper, a machine semantic recognition algorithm is used to identify the features of Bel Canto music, a Bel Canto music feature recognition model is constructed, and systematic model analysis is carried out in combination with experiments. The experimental research shows that the machine-semantics-based Bel Canto music feature recognition method proposed in this paper has a good effect.
Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The author declares that there are no conflicts of interest.
Acknowledgments
This study was sponsored by North University of China.