Abstract

Accompaniment production is one of the most important elements of music work, and chord arrangement is its key step, which usually requires considerable musical talent and profound knowledge of music theory. In this article, a machine learning model replaces the manual arrangement of accompaniment chords, providing an automatic, computer-based means to complete and assist the arrangement. Through music feature extraction, automatic chord label construction, and model construction and training, the whole system gains the ability to arrange accompaniment chords automatically for a main melody. Building on research into automatic chord label construction and on the characteristics of the MIDI data format, a chord analysis method based on interval difference is proposed to construct the chord labels of a whole track automatically. A hidden Markov model is constructed according to the chord types; its input features are the improved main-melody PCP features proposed in this paper, and its labels are the data set built by the proposed automated method. After training, the improved PCP features of the melody to be predicted are input to generate the final accompaniment chords. Compared with the traditional PCP features and template-matching model, the system designed in this paper improves the matching accuracy of the generated chords.

1. Introduction

With the vigorous development of the modern Internet, music has gained new media and carriers, and more and more music products have emerged. Digital music is popularized and spread through Internet information flows, which greatly enriches people's leisure time. The development of intelligent systems, the Internet, virtual reality, and other technologies has blurred the boundary between the real and virtual worlds, allowing painting, art, and music to be presented to people in a highly realistic form. With the improvement in computer performance and the diversification of Internet functions and products, the threshold for learning music has been greatly lowered. People no longer need rich music theory knowledge and deep musical literacy to engage in music-related work such as music creation, adaptation, and retrieval. In computer science, more and more scholars try to solve and simplify problems in music learning and creation by combining music theory with audio signal processing and by studying specific features and algorithms. Artificial intelligence techniques such as machine learning and deep learning now reach all walks of life, including music, which is also developing in the direction of intelligence. Computers have begun to assist or even replace professionals in completing music work [1].

Computer arrangement is the process of using a computer algorithm to find a suitable set of chords for a whole melody. A pop song usually consists of two parts, the vocal line and the instrumental accompaniment. A melody is a series of single notes forming a continuous musical line, and it constitutes the theme of the music. Therefore, when people create or rearrange a popular song, they often start from the main melody. Orchestrating harmonious chords for a melodic line can be a daunting task for amateur music lovers. For those who are interested in music creation, it is of great practical significance to study computer-based automatic music accompaniment. Because chord tension and the supporting accompaniment play an important role in the emotional expression of music, an automatic accompaniment system generates a matching chord accompaniment for the theme and finally outputs a complete music file containing melody and chord accompaniment. Music with automatically generated accompaniment can be used for entertainment and can also serve music creators as a theoretical reference. In automatic accompaniment, the accompaniment part is completed entirely by the computer. By inputting the main melody, the creator obtains a complete new work with chord accompaniment, and computer composition and accompaniment in turn enrich and expand the research field of computer algorithms. Such arrangement tools open up a variety of possibilities for musical forms and styles. To a certain extent, the study of automatic accompaniment systems enriches musical innovation and provides music creators with reference accompaniment chords.

Most musicians regard music as an extremely emotional and subjective auditory art form. Many rhythmic fragments come from the composer's inspiration, which is fragmentary and discontinuous; because inspiration is fragmented and random, it is difficult for a fixed computer algorithm to replicate it and create anew. Using computers to help compose music is therefore difficult. However, as more and more algorithms are introduced into music composition, such as hidden Markov models, stochastic processes, genetic algorithms, and artificial neural networks, algorithmic composition has become easier to apply to current musical forms. Computers can now simulate many of the world's musical styles and forms [2]. This makes music more accessible to people who are interested in music creation but lack the relevant knowledge, removes barriers to music creation, and brings seemingly distant music creation close at hand.

The harmony of music is the core of accompaniment. To match a harmonious accompaniment to any given melody, it is necessary to solve the coordination problem of automatic accompaniment [3], which leads to automatic accompaniment systems that match harmonious chords to an input melody. Lee and Marsic put forward an automatic accompaniment system suited to a particular style: it uses neo-Riemannian transformations to build chord paths for a melody from MIDI data, organizes the alternative chord paths in a structure similar to a binary tree, and then uses a Markov chain with learned probabilities to optimize the chord matching probability. In studying PCP features, Emilia Gomez included the influence of harmonic frequencies in the feature description, considered the maximum value at specific frequencies, and constrained the normalization of the feature weights of the related frequency bands [4]. The resulting HPCP features reduce the influence of intensity and of different timbres to some extent. Yang et al. produced software that converts an arbitrary input audio signal into a chord sequence corresponding to a harmonious accompaniment [5]. In studying PCP features, Wu et al. also included the influence of harmonic frequencies in the feature description, considered the maximum value at specific frequencies, and constrained and normalized the feature weights of the related frequency bands; the improved HPCP features reduced the influence of intensity and of different timbres to a certain extent [6]. By introducing a decision tree algorithm based on the maximum likelihood criterion, Xue et al. calculated the likelihood coefficients between all single notes and counted the occurrences of adjacent intervals at different times. The chord sequence obtained by combining the single notes with the most occurrences and the largest likelihood coefficients was taken as the final matching result [7]. Therefore, the automatic arrangement of music chords has become an active research direction in computing at the present stage.

2. Theoretical Knowledge of Musical Models

2.1. Music Theory

Rhythm is the organization of sounds of different lengths into musical patterns according to certain rules. Rhythm lives within the beat and cannot be separated from it. The beat (meter) is a cyclic pattern of strong and weak pulses [8]. In musical notation, the meter is written as a fraction. Melody is the soul of music. The pitch of the notes, the speed of the rhythm, and the dynamics give a melody its different colours. Successive pitches are connected to form the pitch contour of the melody, which can be abstracted as a melodic curve. The distance between points on the curve represents the interval between pitches. In general, the basic patterns of melodic motion can be summarized as level, ascending, descending, and wave-like progression. Melody is the basis of a voice part. Monophonic music has only one melody, whereas multipart music contains multiple melodies that revolve around a main melody; each is independent yet interacts with the others. Generally speaking, the motion between two melodic parts can be divided into similar motion, parallel motion, contrary motion, and oblique motion.

Temperament is the tuning law of music: it normalizes the relationships between musical tones through a man-made constraint so that they take a form consistent with human aesthetics and cognition. At present there are three main temperaments: just intonation (pure temperament), the temperament generated by fifths (Pythagorean tuning), and twelve-tone equal temperament. In the temperament generated by fifths, the perfect fifth is the key element: once a reference pitch is fixed, the remaining tones are derived under the constraint that successive tones are a perfect fifth apart, that is, a frequency ratio of 3 : 2 [9]. The characteristic of just intonation, from the point of view of signal processing, is that the frequency ratios between scale degrees are ratios of integers. Tones related in this way blend very well and sound comfortable and full to the human ear. Therefore, in modern practice, just intonation is generally used in symphonic performance, especially in multipart and multi-instrument ensembles, where it gives good harmony.

2.2. Fundamentals of Music Signal Analysis

The Musical Instrument Digital Interface (MIDI) is one of the most common structured symbolic representations of music. The contents of a MIDI file are a series of instructions that define what an instrument will play and when. Because no audio waveforms are stored, MIDI files take up little storage space and their contents can be modified flexibly; these characteristics make MIDI widely used in music creation, recording, and analysis. Music notation includes two types of score: those that record pitch and those that record fingering. Numbered (simplified) notation and staff notation record pitch, whereas the six-line tablature used for guitar records fingering.

The short-time Fourier transform (STFT) analyses a signal under the assumption that it is stationary over a short time. Piano music can be assumed to have short-term stationarity and can therefore be analysed by STFT. The STFT is defined as

$$X(m,\omega)=\sum_{n=-\infty}^{\infty} x(n)\,w(n-m)\,e^{-j\omega n} \tag{1}$$

where x represents the discrete music signal, w stands for the window function, and X represents the spectrum at time m. In the STFT, the length of the window determines the time resolution and the frequency resolution. The longer the window, the longer the intercepted signal, the lower the time resolution, and the higher the frequency resolution; conversely, the shorter the window, the shorter the intercepted signal, the higher the time resolution, and the lower the frequency resolution [8–10]. If the stationary analysis fails, the signal length is recalculated, and the number of columns into which the source signal is divided is computed again from the signal length and the window length. Time resolution and frequency resolution therefore conflict in the STFT, and the window length should be chosen according to the actual situation.
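The trade-off between time and frequency resolution can be illustrated with a minimal sketch. It uses a naive per-frame DFT, a finite-window variant of the STFT; the Hann window, the 8000 Hz sampling rate, and the 1000 Hz test tone are assumptions chosen only for illustration and are not taken from the experiments in this article.

```python
import cmath
import math

def hann(length):
    """Hann window of the given length (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def stft(x, win_len, hop):
    """Naive STFT: one DFT per windowed frame; returns frames[m][k]."""
    w = hann(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(win_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len))
            for k in range(win_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames

fs = 8000  # hypothetical sampling rate in Hz
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]  # 1000 Hz test tone

for win_len in (64, 256):
    frames = stft(x, win_len, hop=win_len // 2)
    mags = [abs(c) for c in frames[0]]
    peak_bin = mags.index(max(mags))
    print(win_len, "frames:", len(frames),
          "bin width (Hz):", fs / win_len,
          "peak (Hz):", peak_bin * fs / win_len)
```

The short window yields more frames (finer time resolution) but wider frequency bins, while the long window yields fewer frames and narrower bins.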

The constant Q transform (CQT) is another method of frequency-domain analysis. It is defined as

$$X^{\mathrm{CQ}}(k)=\frac{1}{N_k}\sum_{n=0}^{N_k-1} x(n)\,w_{N_k}(n)\,e^{-j2\pi Qn/N_k} \tag{2}$$

where k is the index of the spectral line and Q is the quality factor, equal to the ratio of the centre frequency to the bandwidth. Because the centre frequencies are exponentially distributed, Q is a constant. $N_k$ is the window length of the window function $w_{N_k}$ for the k-th spectral line, given by

$$N_k=\frac{Q f_s}{f_k},\qquad f_k=f_0\,2^{k/B},\qquad Q=\frac{1}{2^{1/B}-1} \tag{3}$$

where $f_s$ is the sampling frequency, $f_0$ is the lowest frequency of the music signal, and $f_k$ is the frequency of the k-th spectral line. B is the number of spectral lines within an octave. Because twelve-tone equal temperament divides an octave into twelve semitones, B generally takes a value of 12 or a multiple of 12. The frequency of each spectral line then corresponds one to one with a frequency of the scale.
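As a small illustration of how the CQT parameters depend on the bin index, the following sketch computes the quality factor, the centre frequencies, and the per-bin window lengths from the standard constant-Q relations. The lowest frequency of 55 Hz, the 11025 Hz sampling rate, and the function name `cqt_parameters` are assumptions made for the example.

```python
import math

def cqt_parameters(f_min, fs, bins_per_octave, n_bins):
    """Return the quality factor Q and a list of (f_k, N_k) pairs, one per bin."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    params = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)   # exponentially spaced centre frequency
        n_k = math.ceil(q * fs / f_k)              # window length shrinks as f_k grows
        params.append((f_k, n_k))
    return q, params

# Hypothetical settings: lowest bin at A1 (55 Hz), 12 bins per octave, three octaves.
q, params = cqt_parameters(f_min=55.0, fs=11025, bins_per_octave=12, n_bins=37)
for k in (0, 12, 24, 36):
    f_k, n_k = params[k]
    print(f"k={k:2d}  f_k={f_k:7.2f} Hz  N_k={n_k}")
```

Every bin needs its own window length, which is one reason the direct CQT is slow: the low-frequency bins use windows several times longer than the high-frequency ones.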

Because the CQT spectral frequencies and the scale frequencies follow the same exponential distribution, CQT is applied to the analysis and processing of music signals. However, the main problem of CQT is its slow computation. One reason is that, for each spectral line k, the corresponding window length must be calculated before formula (2) is evaluated, which results in a large amount of overall computation. The other reason is that the spectral line frequencies are not linearly spaced, so the fast Fourier transform (FFT) cannot be applied directly, which slows down the calculation. In addition, according to the experimental results, the short-time Fourier transform is the most suitable for analysing audio signals.

2.3. Neural Network

A neural network is a computational model whose basic unit is the neuron. In a neural network, neurons are connected by weights, and the function of these connections is to transmit and activate information [11]. Let $x_i$ represent the input signals and $w_i$ the weight of the connection between each input signal and the neuron. The weighted summation of the input signals gives formula (4):

$$z=\sum_{i=1}^{n} w_i x_i + b \tag{4}$$

where b represents the offset term. Taking z as the input of a nonlinear activation function gives equation (5):

$$a=y(z) \tag{5}$$

where y(z) represents the activation function. A nonlinear function is usually selected as the activation function; its role is to introduce nonlinearity into the neural network so that the network can solve nonlinear mapping problems. The most commonly used activation function is the tanh function, which is defined as

$$\tanh(z)=\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}} \tag{6}$$

Generally, a neural network can have multiple layers. Apart from the input layer and the output layer, the other layers are known as hidden layers. Each hidden layer contains multiple neurons, and the output of the neurons in one layer is the input of the neurons in the next layer. This connection pattern constitutes the basic structure of the neural network [12] and is the foundation of information transmission in the network. The specific structure of the neuron is shown in Figure 1.

In Figure 1, except for the input layer, the neurons of each layer are connected with the neurons of the previous layer, and each connection carries a weight. Layer by layer, the output of each layer in the neural network can be expressed as

$$Z^{(l)}=W^{(l)}X^{(l)}+b^{(l)},\qquad X^{(l+1)}=y\big(Z^{(l)}\big) \tag{7}$$

where $W^{(l)}$ is the weight matrix of layer l, $X^{(l)}$ is the input of layer l, $b^{(l)}$ is its offset term, and $Z^{(l)}$ is the weighted sum of the inputs. Applying the nonlinear mapping y gives the output of layer l, which is taken as the input of layer l + 1. Proceeding forward in this way is known as forward propagation.
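The layer-by-layer computation can be sketched in a few lines. This is a minimal illustration of forward propagation with a tanh activation, not the network used in this article; the layer sizes and weight values are arbitrary assumptions.

```python
import math

def dense_forward(weights, bias, x):
    """One layer: weighted sum z = W x + b followed by the tanh activation."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(weights, bias)]
    return [math.tanh(z_i) for z_i in z]

def forward(layers, x):
    """Forward propagation: the output of each layer is the input of the next."""
    for weights, bias in layers:
        x = dense_forward(weights, bias, x)
    return x

# Hypothetical network with arbitrary weights: 2 inputs -> 3 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3]], [0.05]),
]
output = forward(layers, [1.0, 2.0])
print(output)
```

Because tanh is bounded, every activation stays strictly between -1 and 1 regardless of the weights.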

After the forward propagation of the neural network, a predicted result will be obtained. When the predicted result is different from the actual result, an error will be generated, which can be quantified through the loss function, and the quantified result is called loss. The purpose of training the neural network is to reduce the loss [13]. In the process of loss reduction, it is necessary to start from the last output layer and calculate the weight parameter gradient of each layer in reverse based on the chain rule. This reverse process is called back propagation. Taking a neural network with the number of layers N as an example, its back propagation formula (8) is as follows:
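The formula itself does not survive in this text. As a hedged reconstruction rather than the authors' exact expression, a standard form of the backpropagation recursion is given below. It assumes the notation $Z^{(l)}=W^{(l)}X^{(l)}+b^{(l)}$ and $X^{(l+1)}=y(Z^{(l)})$ for layer l, a loss L, and the symbol $\delta^{(l)}$ (introduced here) for the gradient of the loss with respect to $Z^{(l)}$.

```latex
% Assumed notation: \delta^{(l)} = \partial L / \partial Z^{(l)}, \odot is the elementwise product.
\begin{aligned}
\delta^{(N)} &= \frac{\partial L}{\partial X^{(N+1)}} \odot y'\!\big(Z^{(N)}\big), \\
\delta^{(l)} &= \Big(\big(W^{(l+1)}\big)^{\top}\delta^{(l+1)}\Big) \odot y'\!\big(Z^{(l)}\big),
  \qquad l = N-1,\dots,1, \\
\frac{\partial L}{\partial W^{(l)}} &= \delta^{(l)}\big(X^{(l)}\big)^{\top},
  \qquad \frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}.
\end{aligned}
```

The first line starts the recursion at the output layer, the second applies the chain rule to move the error one layer back, and the third converts each layer's error into gradients for its weights and offsets.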

A neural network has many layers, and the function of each layer is different. The basic neural network includes an input layer, hidden layers, and an output layer, which is similar to the state transition network in an HMM. Every neuron in a hidden layer is connected to the previous layer, and each connection has a weight that constrains it. The value of each layer is the weighted sum of the outputs of the previous layer's neurons, and after nonlinear mapping it becomes the input of the next layer. During training, the back propagation algorithm propagates the partial derivatives backward to update the weights and parameters of each layer. Through repeated learning, the parameters stabilize and a mature neural network model is obtained.

3. Music Feature Extraction

3.1. Data Preprocessing

Because music signals are short-term stationary, the signal is usually divided into frames. To keep the signal smooth and continuous across frame boundaries after segmentation, adjacent frames overlap. In this article, the source files used for framing are main-melody audio data in WAV format, and the sampling rate of all audio is set to 44.1 kHz to ensure a unified standard [13, 14]. The audio signal is then downsampled to 11025 Hz for normalization. If the overlapping frames obtained by segmentation do not achieve the desired effect, the overlapping segmentation is applied again, taking spectral energy leakage and the sliding window function into account. Frame segmentation is shown in Figure 2.
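The preprocessing steps can be sketched as follows. The frame length of 1024 samples and the hop of 512 samples (50% overlap) are assumptions, since the article does not state them, and the decimation shown here omits the low-pass filter that a real implementation would apply before discarding samples.

```python
def downsample(signal, factor):
    """Keep every `factor`-th sample (simplification: no anti-aliasing filter)."""
    return signal[::factor]

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

signal = list(range(44100))           # placeholder for one second of audio at 44.1 kHz
reduced = downsample(signal, 4)       # 44100 Hz / 4 = 11025 Hz
frames = split_frames(reduced, frame_len=1024, hop=512)  # assumed frame and hop sizes
print(len(reduced), len(frames))
```

With a hop of half the frame length, the second half of each frame is identical to the first half of the next one, which is what keeps the frame sequence continuous.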

3.2. Improved PCP Feature Extraction

PCP feature calculation is based on the frequencies defined by twelve-tone equal temperament in music theory and on a mapping from frequency to pitch class. A change in the pitch of notes in music corresponds, in the audio signal, to a change in frequency [15]. Two tones an octave apart belong to the same pitch class, and their frequency ratio is 2 : 1. In twelve-tone equal temperament, the frequencies of adjacent semitones differ by a factor of $2^{1/12}$; the frequency therefore grows exponentially with pitch. Mapped into three-dimensional space, the change of pitch with frequency forms an upward-climbing spiral, which gives a more intuitive view of how frequency changes from one scale step to the next.
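The frequency-to-pitch-class mapping can be made concrete with a short sketch. It assumes the common reference A4 = 440 Hz and the function names `semitone_frequency` and `pitch_class`, which are introduced here only for illustration.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def semitone_frequency(n, f_ref=440.0):
    """Frequency n semitones above (or below, if n < 0) the reference A4."""
    return f_ref * 2 ** (n / 12)

def pitch_class(freq, f_ref=440.0):
    """Map a frequency to the nearest equal-tempered pitch class name."""
    semitones_from_a4 = round(12 * math.log2(freq / f_ref))
    return PITCH_CLASSES[(semitones_from_a4 + 9) % 12]  # A is index 9

for f in (261.63, 329.63, 440.0, 880.0):
    print(f, pitch_class(f))
```

Frequencies an octave apart, such as 440 Hz and 880 Hz, fall into the same pitch class, which is the folding that PCP features rely on.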

The most distinctive advantage of the PCP feature is that it attaches musical meaning to the spectral energy of the audio signal, so the musical characteristics of the signal are displayed better when music audio is processed [16, 17]. The centre frequencies are set to the frequencies of the twelve semitones of twelve-tone equal temperament. The energy at the frequencies of all notes in twelve-tone equal temperament is retained, and the energy at irrelevant frequencies is filtered out. This effectively suppresses low-frequency noise and high-frequency overtone interference while retaining the fundamental frequencies in the low-frequency band, which to a certain extent overcomes the problem of ambiguous pitch values (Figure 3).

Figure 3 shows the spectrum around the note A4 after Gaussian filtering. The amplitude is largest at 440 Hz, which corresponds to the centre frequency; the left amplitude boundary lies between 420 Hz and 430 Hz, and the right amplitude boundary lies between 450 Hz and 460 Hz. The frequencies calculated for the other notes lie outside these boundaries, so the frequency of the effective note is not blocked, and the filter works well.

Figure 4 shows the PCP feature spectrogram improved by the Gaussian filter bank and logarithmic compression. The spectral energy in the pitch-class bins is more coherent, and the pitch-class structure of each time segment can be seen clearly. The audio is a melody WAV file of the song Little Star recorded by the authors. From the spectrogram obtained with the improved PCP feature extraction method in this article, the melody notes C, G, A, G, F, E, D, and C can be read off clearly (Figure 4).
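A minimal sketch of a PCP computation with a Gaussian filter bank and logarithmic compression is given below. It is an illustration under stated assumptions, not the authors' implementation: the relative filter width `rel_sigma`, the compression gain, the five-octave range starting at C2, and the toy spectrum are all hypothetical choices.

```python
import math

def gaussian_weight(freq, centre, sigma):
    """Gaussian filter response at freq for a filter centred on a semitone frequency."""
    return math.exp(-0.5 * ((freq - centre) / sigma) ** 2)

def improved_pcp(spectrum, rel_sigma=0.01, gain=100.0, n_octaves=5):
    """spectrum: list of (frequency_hz, magnitude) pairs.
    Returns a 12-bin vector (index 0 = C), log-compressed and scaled to a maximum of 1."""
    c2 = 440.0 * 2 ** (-33 / 12)             # C2, 33 semitones below A4
    pcp = [0.0] * 12
    for n in range(12 * n_octaves):
        centre = c2 * 2 ** (n / 12)
        energy = sum(gaussian_weight(f, centre, rel_sigma * centre) * mag ** 2
                     for f, mag in spectrum)
        pcp[n % 12] += energy                # fold all octaves into one pitch class
    pcp = [math.log(1.0 + gain * e) for e in pcp]   # logarithmic compression
    peak = max(pcp)
    return [v / peak for v in pcp] if peak > 0 else pcp

# Toy spectrum: A4 with its octave, low-frequency noise at 50 Hz, an off-pitch peak at 452 Hz.
spectrum = [(440.0, 1.0), (880.0, 0.5), (50.0, 0.8), (452.0, 0.3)]
pcp = improved_pcp(spectrum)
print([round(v, 3) for v in pcp])
```

In this toy case the 50 Hz component and most of the off-pitch 452 Hz component are suppressed by the filters, so the energy concentrates in the A bin.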

4. Design of Chord Arrangement System Based on the HMM Model

4.1. Application of the Accompaniment Hidden Markov Model

A system based on the accompaniment hidden Markov model is constructed to match the optimal chords automatically to a main melody. The system takes existing structured, simplified melodies with accompaniment as the training samples of the hidden Markov model. Popular melodies of many kinds are learned together so that the strengths of various styles are combined, providing a data reference for arranging the accompaniment of an input single-melody song. The accompaniment hidden Markov model solves both the selection of accompaniment chords for a single-note theme and the optimization of the chord sequence [18–20]. The input melody is segmented, and melodies from different songs and keys are unified by transposing each single-melody song to standard C major without changing the internal interval structure of the melody, which greatly facilitates chord arrangement. The features of each melody fragment are extracted; using the melody-to-chord matching probabilities learned by the machine learning algorithm from sample songs of different styles, a suitable chord for the fragment is chosen from the accompaniment chord knowledge base. These steps are repeated until the chord accompaniment with the optimal matching probability is obtained, and the relevant probability parameters are recorded and updated [21–23].

The characteristic notes of a melody are determined by the relative weight of the notes appearing in a piece of music: the notes that appear most frequently in a piece are defined as its characteristic notes. When the simplified score of a sample piece is entered, it is screened and the characteristic notes are extracted segment by segment. The optimal chord is then matched to each characteristic note. Beyond optimizing the single melody notes and the chord sequence, an algorithm for the internal construction of chords is also designed: from the matched chords and the best chord sequence obtained by matching, the chord construction algorithm generates a full, vivid chord texture with a sense of motion. Finally, the chord sequence and the single notes of the main melody are played simultaneously, giving a melody with harmonic accompaniment.
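The transposition to C major and the segment-wise extraction of characteristic notes can be sketched as follows. The representation of the melody as MIDI note numbers, the fixed segment length of seven notes, and the assumption that the tonic is known in advance are simplifications made for this example.

```python
from collections import Counter

def transpose_to_c(notes, tonic_pitch_class):
    """Shift MIDI note numbers so that the tonic maps to pitch class C (0),
    moving by at most six semitones up or down."""
    shift = -tonic_pitch_class if tonic_pitch_class <= 6 else 12 - tonic_pitch_class
    return [n + shift for n in notes]

def characteristic_notes(notes, segment_len):
    """Most frequent pitch class in each segment of segment_len notes."""
    result = []
    for start in range(0, len(notes), segment_len):
        segment = [n % 12 for n in notes[start:start + segment_len]]
        result.append(Counter(segment).most_common(1)[0][0])
    return result

# "Little Star" written in G major (tonic pitch class 7), as MIDI note numbers.
melody_in_g = [67, 67, 74, 74, 76, 76, 74, 72, 72, 71, 71, 69, 69, 67]
melody_in_c = transpose_to_c(melody_in_g, tonic_pitch_class=7)
print(melody_in_c)
print(characteristic_notes(melody_in_c, segment_len=7))
```

Transposition adds the same offset to every note, so the intervals inside the melody are unchanged. When several pitch classes tie for the highest count in a segment, a real system would need an explicit tie-breaking rule, for example preferring notes on strong beats.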

4.2. The Framework of Chord Arrangement System

The automatic chord matching system is divided into two main parts. One is the music feature extraction part, that is, the improved PCP feature extraction described in the previous section. The other is the model part, which includes the construction of the model's chord labels, model training, and prediction.

As shown in Figure 5, the automatic chord matching system is mainly divided into two parts. The dashed frame on the left is the music feature extraction module, which adopts the improved PCP feature. The dashed frame on the right is the model module, which mainly involves the HMM and the automatic construction of chord labels. MIDI records a series of musical information as symbolic events. To obtain the accompaniment track, the channel containing the percussion is deleted, the musical characteristics of each track are analysed, the note with the lowest pitch is retained, and the other notes are deleted. The data set of accompaniment tracks is stored in the form of event messages, and the source data is in MIDI format, so it is very convenient to extract and collect music-related indicators (Figure 5).

4.3. Automated Chord Tag Construction

Unlike the common WAV audio storage format, MIDI stores musical information in a file as event messages, that is, by symbolic event recording. This makes it very convenient to use MIDI as source data for extracting and collecting music-related features [18]. This article uses the accompaniment track of the MIDI file as the source data set for automatic chord label construction, so the following describes how to obtain the accompaniment track and its MIDI music information (Figure 6).

Figure 6 is a schematic diagram of the highest-pitch contour over a time series. The horizontal axis represents time; different time segments contain different notes, and each note corresponds to a pitch on the vertical axis. Where several notes overlap, the skyline algorithm extracts the notes on the highest contour as the main melody notes, namely, the part highlighted in red in the figure. The collection of these highest-contour notes forms the main melody track.
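A simplified version of the skyline idea can be sketched as follows. Notes are represented here as hypothetical (onset, duration, pitch) tuples; the sketch keeps the highest note at each onset and trims it when the next melody note begins, which is one common simplification and not necessarily the exact variant used in this article.

```python
def skyline(notes):
    """notes: list of (onset, duration, pitch) tuples.
    Keep the highest-pitched note at each onset and trim it so that it ends
    no later than the onset of the next melody note."""
    highest = {}
    for onset, duration, pitch in notes:
        if onset not in highest or pitch > highest[onset][2]:
            highest[onset] = (onset, duration, pitch)
    melody = [highest[t] for t in sorted(highest)]
    trimmed = []
    for i, (onset, duration, pitch) in enumerate(melody):
        if i + 1 < len(melody):
            duration = min(duration, melody[i + 1][0] - onset)
        trimmed.append((onset, duration, pitch))
    return trimmed

# Toy passage: sustained low accompaniment notes (48, 55) under a moving top line.
notes = [(0, 4, 48), (0, 1, 72), (1, 1, 76), (2, 2, 55), (2, 1, 79), (3, 1, 77)]
melody = skyline(notes)
print([pitch for _, _, pitch in melody])
```

The low accompaniment notes at onsets 0 and 2 are discarded because a higher note starts at the same time.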

In this article, the input melody is segmented, and melodies from different songs and keys are unified by transposing each single-melody song to standard C major without changing the internal interval structure of the melody, which greatly facilitates chord arrangement. The features of each melody fragment are extracted; using the melody-to-chord matching probabilities learned by the machine learning algorithm from sample songs of different styles, a suitable chord for the fragment is chosen from the accompaniment chord knowledge base. These steps are repeated until the chord accompaniment with the optimal matching probability is obtained, and the relevant probability parameters are recorded and updated [19, 20]. The characteristic notes of a melody are determined by the relative weight of the notes in a passage: the notes that occur most frequently in a piece are defined as its characteristic notes. From the input sample music with chords, the characteristic notes are extracted segment by segment, and the optimal chord is matched to each characteristic note. The aim is to estimate, through machine learning, the transition probabilities $a_{ij}$, the observation probabilities $b_j(k)$, and the initial state probabilities $\pi_i$ of the hidden Markov model of music accompaniment. The following is the definition of each probability in the hidden Markov model of music.

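The transition probability is estimated by

$$\hat a_{ij}=\frac{N_{ij}}{\sum_{j'} N_{ij'}} \tag{9}$$

where $N_{ij}$ is the number of times chord state i is followed by chord state j in the training labels.

The observation probability is estimated by

$$\hat b_j(k)=\frac{M_{jk}}{\sum_{k'} M_{jk'}} \tag{10}$$

where $M_{jk}$ is the number of times observation k occurs while the model is in chord state j.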
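When chord labels are available for the training melodies, the HMM parameters can be estimated by counting and normalizing. The sketch below assumes discrete observations (characteristic notes written as note names) and a tiny hand-made training set; both are illustrative assumptions, and the function name `estimate_hmm` is introduced here.

```python
from collections import Counter, defaultdict

def estimate_hmm(sequences):
    """sequences: list of [(chord_state, observed_symbol), ...] training examples.
    Returns (pi, a, b) estimated by relative frequency counting."""
    init, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in sequences:
        init[seq[0][0]] += 1                        # first chord of each song
        for state, obs in seq:
            emit[state][obs] += 1                   # chord -> characteristic note
        for (prev, _), (cur, _) in zip(seq, seq[1:]):
            trans[prev][cur] += 1                   # chord -> next chord

    def normalise(counter):
        total = sum(counter.values())
        return {key: value / total for key, value in counter.items()}

    pi = normalise(init)
    a = {state: normalise(c) for state, c in trans.items()}
    b = {state: normalise(c) for state, c in emit.items()}
    return pi, a, b

# Hypothetical labelled data: (chord, characteristic note) per segment.
sequences = [
    [("C", "E"), ("F", "A"), ("G", "D"), ("C", "C")],
    [("C", "G"), ("G", "B"), ("C", "E")],
]
pi, a, b = estimate_hmm(sequences)
print(pi, a["C"], b["C"])
```

With so little data many probabilities are zero; a practical system would add smoothing so that unseen transitions remain possible.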

From the melody characteristic notes and accompaniment sequences obtained above, the parameters of the accompaniment hidden Markov model are updated by counting, giving the training results for the sample songs: the state transition probability matrix, the emission probability matrix, and the initial probability vector. In the HMM-based automatic accompaniment algorithm, the intermediate probability at each moment is obtained from the intermediate probability of the previous step, which is a recursive calculation. Chord prediction is a decoding problem: it finds the path through the state transition network that maximizes the path probability. Given the observation sequence (the PCP features of the main melody) and the parameters of the hidden Markov model, the accompaniment chord sequence most likely to correspond to the main melody is obtained. Here, it is defined as

$$\delta_t(i)=\max_{q_1,\dots,q_{t-1}} P\big(q_1,\dots,q_{t-1},\,q_t=i,\;o_1,\dots,o_t \mid \lambda\big) \tag{11}$$

Formula (11) is the quantity on which decoding is built: the maximum probability, over all partial state paths, of reaching state i at time t with the known HMM parameters $\lambda$, where $q_t$ is the state and $o_t$ the observation at time t. According to this equation, the optimal value at the next moment can be obtained as

$$\delta_{t+1}(j)=\Big[\max_{i}\,\delta_t(i)\,a_{ij}\Big]\,b_j(o_{t+1}) \tag{12}$$

where $a_{ij}$ is the transition probability from state i to state j and $b_j(o)$ is the probability of observing o in state j. Finally, the optimal probability and the final state are obtained as

$$P^{*}=\max_{i}\,\delta_T(i),\qquad q_T^{*}=\arg\max_{i}\,\delta_T(i) \tag{13}$$

The optimal path is then recovered by backtracking from the final state, repeatedly solving

$$q_t^{*}=\psi_{t+1}\big(q_{t+1}^{*}\big),\qquad t=T-1,\dots,1 \tag{14}$$

where $\psi_{t+1}(j)=\arg\max_{i}\,\delta_t(i)\,a_{ij}$ records the best predecessor of state j.

Finally, the sequence $q_1^{*},\dots,q_T^{*}$ constitutes the optimal chord selection path.
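The decoding procedure is the Viterbi algorithm, which can be sketched as follows. The sketch works in the log domain to avoid numerical underflow and floors zero probabilities at a small constant; the three-chord model, its probabilities, and the discrete note observations are hypothetical values chosen for illustration.

```python
import math

def viterbi(observations, states, pi, a, b, floor=1e-12):
    """Most probable state path for the observations (log domain, floored zeros)."""
    def lg(p):
        return math.log(max(p, floor))

    # Initialisation: best score of each state for the first observation.
    delta = [{s: lg(pi.get(s, 0.0)) + lg(b[s].get(observations[0], 0.0)) for s in states}]
    psi = [{}]
    # Recursion: extend the best partial path into each state.
    for obs in observations[1:]:
        prev = delta[-1]
        cur, back = {}, {}
        for j in states:
            best_i = max(states, key=lambda i: prev[i] + lg(a[i].get(j, 0.0)))
            cur[j] = prev[best_i] + lg(a[best_i].get(j, 0.0)) + lg(b[j].get(obs, 0.0))
            back[j] = best_i
        delta.append(cur)
        psi.append(back)
    # Termination and backtracking through the stored predecessors.
    path = [max(states, key=lambda s: delta[-1][s])]
    for back in reversed(psi[1:]):
        path.append(back[path[-1]])
    path.reverse()
    return path

# Hypothetical three-chord model in C major with note-name observations.
states = ["C", "F", "G"]
pi = {"C": 0.6, "F": 0.2, "G": 0.2}
a = {"C": {"C": 0.4, "F": 0.3, "G": 0.3},
     "F": {"C": 0.3, "F": 0.3, "G": 0.4},
     "G": {"C": 0.6, "F": 0.1, "G": 0.3}}
b = {"C": {"C": 0.4, "E": 0.4, "G": 0.2},
     "F": {"F": 0.4, "A": 0.4, "C": 0.2},
     "G": {"G": 0.4, "B": 0.3, "D": 0.3}}
print(viterbi(["E", "A", "D", "C"], states, pi, a, b))
```

The final note C could be emitted by either the C or the F chord; the decoder prefers C because the transition from G to C is far more probable than from G to F in this toy model.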

4.4. Experiment and Result Analysis

The improved main-melody PCP feature vector is used as the model's feature and as the observation vector of the HMM. The number of model states is set to 6. Except for the initial and final states, all states are active states. Each active state uses a single Gaussian observation function with a diagonal covariance matrix, specified by a mean vector and a variance vector. After training, 5 files are randomly selected from the test data set; their improved PCP feature vectors are extracted and input to the model for chord prediction, and the resulting chord sequences are recorded, as shown in Figure 7.
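A single Gaussian with a diagonal covariance matrix scores a PCP frame as a sum of independent one-dimensional terms. The sketch below shows that computation; the mean vectors for the two chord states, the shared variance, and the test frame are hypothetical values, not trained parameters from this study.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Hypothetical 12-bin PCP means for two chord states (index 0 = C).
mean_c = [1.0 if i in (0, 4, 7) else 0.1 for i in range(12)]    # C major triad: C, E, G
mean_g = [1.0 if i in (7, 11, 2) else 0.1 for i in range(12)]   # G major triad: G, B, D
var = [0.05] * 12

frame = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.2, 0.1, 0.1]
score_c = diag_gaussian_logpdf(frame, mean_c, var)
score_g = diag_gaussian_logpdf(frame, mean_g, var)
print(score_c > score_g)
```

In an HMM these log densities take the place of the discrete observation probabilities during decoding.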

As the comparison in Figure 7 shows, the improved PCP features used in this article raise the accuracy of chord arrangement to a certain extent compared with the traditional PCP features. With the improved PCP features, the accuracy of chord arrangement for Vacation, Better Hurry Up, and Holiday Door Time increased by 6.65%, 6.58%, and 6.14%, respectively, while for Cool Hun Day and Better Door Us the accuracy increased by 2.89% and 3.01%, respectively. In general, the improved PCP feature proposed in this article gives a better chord arrangement than the traditional PCP feature.

5. Conclusion

Using a MIDI music data set, this study builds a complete chord arrangement system from a hidden Markov chord recognition model, improved PCP music features as the input vectors, and an automated method for constructing chord labels. It provides a computer-automated way to arrange accompaniment chords for a theme and helps people meet their needs for musical accompaniment and chord arrangement. The proposed automatic chord label construction is based on the MIDI data format, so it may be difficult to recover 100% of the constructed chord sequence. Based on the characteristics of the MIDI symbolic data format, a chord analysis method based on interval difference is proposed, and the accompaniment chords of each bar are obtained by matching against binary chord templates constructed in advance, which realizes the automatic construction of chord labels. This article designs a chord arrangement system based on the hidden Markov model (HMM), elaborates the mathematical principles and technical points of the model in detail, and explains each step of the model in combination with the improved PCP features of the main theme. In training, the observation vectors input to the model are the improved PCP feature vectors, and the labels are extracted from the training data by the automatic chord label construction method. After training, the improved PCP features of the themes in the test set are extracted and input to the HMM to generate the final arranged chords. Compared with the traditional PCP feature and template-matching model, the improved PCP feature and the proposed HMM give better chord matching and higher accuracy.
Although the system built in this study successfully realizes the automatic chord arrangement and has a better effect on chord arrangement than the previous methods, there is still much room for improvement. In music theory, the composition of chords is not only the broken chords of single notes strung together but also changes according to the needs of the song itself.

Data Availability

The data used to support the findings of the study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.