Abstract
Medical images transmitted without protection can leak patient data and personal privacy, with unpredictable consequences. Traditional encryption algorithms have limited ability to handle complex data. Chaotic systems are characterized by randomness and ergodicity, which gives them advantages over traditional encryption algorithms in image encryption. This study presents a novel V-net convolutional neural network (CNN) based on a four-dimensional (4D) hyperchaotic system for medical image encryption. First, the plaintext medical images are processed into 4D hyperchaotic sequence images through image segmentation, chaotic system processing, and pseudorandom sequence generation. Then, the V-net CNN is trained on the chaotic sequences to eliminate their periodicity. Finally, the chaotic sequence image is diffused to change the pixel values of the raw image and complete the encryption. Simulation tests demonstrate that the proposed algorithm achieves a better encryption effect, robustness, and plaintext sensitivity.
1. Introduction
At present, information is transmitted in two main forms: text and images. Like text, images carry a large amount of important and confidential information. In the current era of computer networks, images are mostly stored as digital images, which are simple and quick to store and easy to retrieve. However, this also increases the risk of information leakage, especially when images are transmitted over a network, where they are easily attacked. In this context, image encryption is an important means of preventing information leakage [1–5].
Image encryption has been studied extensively. Traditional encryption algorithms mainly scramble the rows or columns of the image at random, scramble the pixel information at random for encryption and decryption, scale the pixel information up and down, and so on [6–8]. Such schemes are easy to crack. To address these problems, chaos-based encryption algorithms were introduced, and they are now the most widely used image encryption algorithms [9, 10]. Although they encrypt more effectively, they have two defects. First, all of the image information is encrypted into the ciphertext image, so the amount of information increases sharply after encryption and occupies a large amount of storage space. Second, a chaotic sequence generated by a chaotic system alone shows local linearity and strong correlation, that is, a certain degree of periodicity. This feature lowers the security of the encrypted image [11–15].
This article targets the periodicity shortcoming of chaotic encryption algorithms: a V-net CNN is used to learn the chaotic sequence and break its periodicity, improving the confidentiality of image encryption. Tests demonstrate the validity and practicability of the new scheme, which provides a reference for image encryption.
2. 4D Hyperchaotic System
The 4D hyperchaotic system [16, 17] studied in this paper is as
When parameters = 35, = 3, = 33, and = 8, a typical hyperchaotic attractor exists in system (1). The phase diagram is shown in Figure 1. Figure 1(a) is the x-y-z three-dimensional projection phase diagram. Figure 1(b) is the y-z- three-dimensional projection phase diagram.

Figure 1: Three-dimensional projection phase diagrams of system (1), panels (a) and (b).
2.1. Analysis of Chaos Characteristics
The dissipativity of system (1) is analyzed first. The dissipation value is ; when , the system is dissipative. If the system parameters are substituted, ; if the dissipation condition is satisfied, the trajectories of the system contract asymptotically, at an exponential rate, to a limit set of zero volume and eventually settle on an attractor.
Four Lyapunov exponents are obtained: LE1 = 0.343, LE2 = 0.0522, LE3 = −0.305, and LE4 = −36.640. Two of them are greater than zero, so system (1) is a hyperchaotic system.
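The classification above can be checked with a few lines of arithmetic. The sketch below simply counts the positive exponents reported in the text and sums the spectrum; a negative sum is consistent with a dissipative flow. The function and variable names are illustrative only.

```python
# Lyapunov exponents reported for system (1)
exponents = [0.343, 0.0522, -0.305, -36.640]


def count_positive(les):
    """Number of strictly positive Lyapunov exponents."""
    return sum(1 for le in les if le > 0)


n_positive = count_positive(exponents)
le_sum = sum(exponents)

# Two or more positive exponents is the usual criterion for hyperchaos.
is_hyperchaotic = n_positive >= 2
# A negative sum means phase-space volume contracts on average.
is_contracting = le_sum < 0

print(n_positive, round(le_sum, 4), is_hyperchaotic, is_contracting)
```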
2.2. Stability Analysis
Adding the time-delay term to the second nonlinear formula of hyperchaotic system (1), the time-delay model equation is shown as
When the time-delay term = 0, system (1) is locally asymptotically stable at , and the Jacobian matrix is
The characteristic equation is as follows:
According to the substitution law, , , and . If only purely imaginary roots are considered, then when = 0 the characteristic equation of system (1) is
According to the Routh–Hurwitz criterion, if and , then the real parts of the characteristic roots of equation (3) are all negative. By substituting the corresponding parameters into the above inequalities, it can be seen that the time-delay system (1) is locally asymptotically stable at .
3. Proposed Image Encryption
As computer technology develops, images are mostly stored in digital form. Images carry a great deal of information, and in some fields (national defense, the military, finance, personal privacy, etc.) that information is confidential and must not be disclosed. Ensuring the security of image information is therefore very important, and image encryption is currently the main solution [18]. Among encryption methods, chaotic encryption is the most commonly used. Its principle is to superimpose one or more chaotic signals on the useful signal at the sending end so that the signal in the transmission channel resembles random noise, thereby achieving encryption and secure communication. This method offers high encryption speed, lossless compression, and high security, but the generated chaotic sequence still shows a certain degree of periodicity. This paper improves and optimizes a chaotic sequence image encryption algorithm by means of a V-Net CNN. The proposed method is shown in Figure 2.

3.1. Chaotic Sequence Generation
Chaotic sequence generation is the first step in image encryption; it transforms the plaintext image into a random sequence. The process has three parts: image segmentation, chaotic system processing, and pseudorandom sequence generation.
3.1.1. Plaintext Image Segmentation
Plaintext image segmentation prepares the image for chaotic system processing. In general, the target image is divided into blocks of any one of the ten different sizes in Table 1. The block size in each round is determined by the session key used during the encryption of that particular round.
3.1.2. Chaotic System Processing
A chaotic system is used to generate a real-number sequence for each plaintext image subblock. Five chaotic systems are commonly used: the logistic, Chebyshev, Skew Tent, Henon, and Lorenz chaotic systems [19–23]. They are described as follows:
(A) Logistic chaotic system: here .
(B) Chebyshev chaotic system: where , and is the system control parameter. When , the Chebyshev map enters the chaotic state.
(C) Skew Tent chaotic system: When , the system is chaotic.
(D) Henon chaotic system: where and are the system control parameters; when and , the system is in the chaotic state.
(E) Lorenz chaotic system:
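For readers who want to experiment with the discrete maps listed above, the following is a minimal sketch. It assumes the common textbook forms of the logistic, Chebyshev, Skew Tent, and Henon maps, which may differ in notation or parameterization from the equations of this paper; the default parameter values (other than 3.9999 for the logistic map, which is the setting used in the experiments) and the helper `iterate` are illustrative assumptions. The continuous-time Lorenz system is omitted because it requires a numerical integrator.

```python
import math


def logistic(x, mu=3.9999):
    """Textbook logistic map: x -> mu * x * (1 - x), x in (0, 1)."""
    return mu * x * (1.0 - x)


def chebyshev(x, k=4):
    """Textbook Chebyshev map: x -> cos(k * arccos(x)), x in [-1, 1]."""
    return math.cos(k * math.acos(x))


def skew_tent(x, p=0.4):
    """Textbook skew tent map on (0, 1) with break point p."""
    return x / p if x < p else (1.0 - x) / (1.0 - p)


def henon(x, y, a=1.4, b=0.3):
    """Textbook Henon map: (x, y) -> (1 - a*x^2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x


def iterate(step, x0, n, burn_in=100):
    """Iterate a one-dimensional map, discarding burn_in transient values."""
    x = x0
    out = []
    for i in range(burn_in + n):
        x = step(x)
        if i >= burn_in:
            out.append(x)
    return out


sequence = iterate(logistic, 0.3, 1000)
```

Discarding an initial transient, as `iterate` does, is a common precaution so that the retained values depend less visibly on the initial condition.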
3.1.3. Pseudorandom Sequence Generation
After chaotic system processing, the generated sequence is a real-number sequence, which must still be converted into a pseudorandom sequence, namely, the chaotic sequence. There are three methods for generating the pseudorandom sequence: the threshold method, the binary sequence method, and the quantitative extraction method [24]. They are analyzed below.
(A) Threshold method: define a threshold function as Its complement is . is the set threshold. is the value of the chaotic sequence. Equation (11) is applied to the real-number sequence to obtain the pseudorandom sequence.
(B) Binary sequence method: the chaotic sequence value can be written as the binary form of can be expressed as So, a pseudorandom sequence is obtained.
(C) Quantitative extraction method: if the obtained chaotic sequence is not in the range [0, 1], it is normalized to the interval [0, 1] to obtain . In the binary representation of , the lowest or middle N bits are taken as required. The binary bits corresponding to each value are combined to obtain the key sequence used for encryption.
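The threshold and quantitative extraction methods can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of this paper: the threshold value 0.5, the scaling factor `scale`, and the choice of the lowest 8 bits are hypothetical settings chosen for the example.

```python
def threshold_bits(seq, t=0.5):
    """Threshold method: map each real value to 1 if it reaches t, else 0."""
    return [1 if x >= t else 0 for x in seq]


def normalize(seq):
    """Rescale to [0, 1] only when some value falls outside that interval."""
    lo, hi = min(seq), max(seq)
    if lo >= 0.0 and hi <= 1.0:
        return list(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(x - lo) / (hi - lo) for x in seq]


def quantize_bytes(seq, scale=10**6, n_bits=8):
    """Quantitative extraction: scale each value to an integer and keep
    its lowest n_bits bits (scale and n_bits are assumed settings)."""
    mask = (1 << n_bits) - 1
    return [int(x * scale) & mask for x in normalize(seq)]


bits = threshold_bits([0.1, 0.5, 0.9])
key_bytes = quantize_bytes([0.0, 0.5, 1.0])
```

Taking low-order bits discards the slowly varying high-order part of each value, which is why extraction schemes of this kind tend to prefer the lowest or middle bits.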
3.2. V-Net Convolutional Neural Networks Eliminating the Periodicity of Chaotic Sequences
A 3D V-Net fully convolutional neural network [25, 26] is used in this paper. A 3D convolutional neural network can convolve 32 layers of medical images at the same time. Besides learning image features, it can also learn how position information changes between different layers of the image. A 3D convolutional neural network is a model with a very large number of parameters. The overall flow of the V-Net CNN, designed to make the model perform better, is shown in Figure 3.

In the feature hierarchy, the high-level feature maps contain semantic category information, while the low-level feature maps retain image details. During downsampling, a convolutional neural network loses important category information, and as downsampling proceeds the image gradient gradually vanishes. To preserve the high-level semantic information, the high-level convolution results are sent to the upsampling path through the connection layer. However, a complete upsampling path increases the training difficulty. In this paper, the feature maps from the downsampling path are multiplied by weight values produced by the global average weight module before being connected to the upsampling path. Specifically, the feature maps output by the first four layers of the downsampling path are first average pooled, and the corresponding weight values are then calculated by the Softmax function. The formula for calculating the weight is as follows: where represents the convolution output of the ith layer. The purpose of global average pooling (GAP) is to eliminate the influence of different scales on the weight values during downsampling. The global average weight (GAW) module is adopted to make effective use of multiscale feature information and improve the learning efficiency of the network. The weight acquisition process is shown in Figure 4.
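One plausible reading of the weight computation described above is a Softmax over the globally averaged outputs of the four downsampling layers. The sketch below follows that reading; it is an assumption, not the confirmed formula of this paper. For brevity each feature map is represented as a flat list of activations, and the example values are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Reduce one feature map (flat list of activations) to its mean."""
    return sum(feature_map) / len(feature_map)


def gaw_weights(feature_maps):
    """Softmax over the pooled value of each layer's feature map."""
    pooled = [global_average_pool(fm) for fm in feature_maps]
    shift = max(pooled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(p - shift) for p in pooled]
    total = sum(exps)
    return [e / total for e in exps]


# Four hypothetical feature maps whose sizes shrink with depth.
maps = [[0.2] * 64, [0.5] * 32, [0.1] * 16, [0.9] * 8]
weights = gaw_weights(maps)
```

Because each map is pooled to a single scalar before the Softmax, maps of different spatial sizes contribute on the same scale, which matches the stated purpose of GAP.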

The level set (LS) loss is a loss function based on the level set method [27]; it is the first application of the level set method to the loss function of a deep learning network. The LS loss is denoted as where , , and is the fixed value parameter. is the whole image area. is the level set function. and represent the curve length and area regularization terms, respectively. is the pixel at (x, y) in the image. H is a differentiable step function, where is a hyperparameter used to improve the gradient of the function, which is set to 2.5 in the experiment:
The idea of the LS loss is to first use the step function to set the region inside the outer wall edge of the prediction and of the ground truth to 1 and the region outside it to 0. When calculating the loss, the prediction and the ground truth are multiplied and then summed, and the same operation is performed after inverting both. The purpose is to give enough weight to the edges. This loss function is suitable for segmenting objects with a single outer edge, but not for medical objects with both inner and outer edges. Based on the level set, we propose a regularized level set loss function (LSR loss), which optimizes the edge through the LS loss and constrains the internal details of the gastric wall through regularization, combining the respective advantages of the level set method and the deep learning method. The LSR loss is defined as where represents the entire image region, represents the pixel value in the ground truth, and represents the pixel value of the image predicted by the network. Here,
When the value predicted by the neural network at a given position of the image is more accurate, the values of and are closer to 1. The difference between the ground truth (GT) and the predicted value is then equivalent to taking the opposite, and it is close to 0 when multiplied by the corresponding to the predicted value. However, when there is an error at the image boundary, the loss value becomes very large. Therefore, and are added in this paper to constrain the size of the loss function and normalize it. The training procedure is as follows.
Step 1: input the plaintext images generated previously into the V-net convolutional neural network as training samples.
Step 2: in the first layer, apply the convolution operation, namely, a weighted sum, to the plaintext image.
Step 3: downsample the convolved plaintext image, that is, apply pooling.
Step 4: repeat Step 2 and Step 3 to extract the key features of the plaintext images and reduce the amount of data to process.
Step 5: enter the fully connected layer and connect all key features of the plaintext images together.
Step 6: output the training results, and judge whether the error between the convolution result and the actual output is less than the set threshold. If it is, the training of the V-net convolutional neural network is complete. If it is greater than the threshold, error backpropagation is used to adjust the thresholds and weights of each layer until training is complete.
3.3. Chaotic Sequence Image Diffusion
After the above processing, the periodicity of the chaotic sequence is eliminated. However, the pixel values have not changed, so certain security risks remain. The chaotic images therefore need to be diffused [28]. The diffusion is performed as follows.
Formula (19) is used to replace each component of the chaotic sequence: where FR, FG, and FB are the RGB components of the chaotic sequences, the matrices PX, PY, and PZ are pixel matrices, and is the XOR operation.
The substitution is then applied again to change the pixel values further. A diffusion function is introduced here, expressed as follows: where represents the ciphertext of the current pixel, is the plaintext of the current pixel and the ciphertext of the previous pixel, U represents the maximum pixel value, represents the XOR operation, and represents a random value.
The second substitution formula is as follows:
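As a rough illustration of the kind of chained XOR diffusion described in this section, and not the exact formulas of the proposed scheme, the sketch below combines each plaintext pixel with a key value modulo the number of gray levels and then XORs the result with the previous ciphertext pixel. The functions `diffuse` and `undiffuse`, the `seed` value, and the specific combination rule are assumptions made for the example.

```python
def diffuse(plain, key, seed=0, levels=256):
    """Chain each pixel to the previous ciphertext pixel via XOR."""
    cipher = []
    prev = seed
    for p, k in zip(plain, key):
        c = ((p + k) % levels) ^ prev
        cipher.append(c)
        prev = c
    return cipher


def undiffuse(cipher, key, seed=0, levels=256):
    """Invert diffuse() given the same key stream and seed."""
    plain = []
    prev = seed
    for c, k in zip(cipher, key):
        plain.append(((c ^ prev) - k) % levels)
        prev = c
    return plain


pixels = [12, 200, 37, 255, 0, 91]
key_stream = [5, 250, 17, 3, 128, 64]
encrypted = diffuse(pixels, key_stream, seed=99)
recovered = undiffuse(encrypted, key_stream, seed=99)
```

Because every ciphertext pixel feeds into the next one, a change in one plaintext pixel propagates to all later ciphertext pixels, which is the property diffusion is meant to provide.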
4. Experiments and Analysis
Two images, Lena and Skull, each of size 512 × 512 pixels, are selected for the simulation and analysis experiments. The logistic chaotic system is set with = 3.9999, and the two-dimensional logistic chaotic system with = 0.9, = 0.9, and = 0.1. The hardware environment has 64 GB of memory and runs the Windows 10 operating system. The software simulation platform is Matlab 2017a. To evaluate the overall effect of the encryption algorithm, its security performance is analyzed in terms of the histogram, information entropy, correlation coefficients, robustness, key space and sensitivity, and resistance to differential attack. Figure 5 shows the original images: Figures 5(a) and 5(b) are the Skull image and the Lena image, respectively. Figure 5 is from Vaseghi et al. 2021 (under the Creative Commons Attribution License/Public Domain) [29].

Figure 5: Original images: (a) Skull; (b) Lena.
4.1. Image Gray Histogram
Generally speaking, a relatively uniform histogram distribution effectively prevents attackers from obtaining plaintext information by analyzing the histogram. Figures 6(a)–6(d) show the plaintext image of Lena, the ciphertext image of Lena, the Skull plaintext image, and the Skull ciphertext image, respectively. The ciphertext histograms are observed to be uniformly distributed. Therefore, the new algorithm can resist histogram analysis attacks and conceal the statistical characteristics of plaintext images.

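Figure 6: (a) Lena plaintext image; (b) Lena ciphertext image; (c) Skull plaintext image; (d) Skull ciphertext image.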
4.2. Information Entropy
The information entropy is mainly used to measure the uncertainty or randomness of an information source, and chaos has a high entropy. For an image, a more uniform distribution of pixel gray values means a higher entropy value and stronger randomness. For 8-bit images, the entropy should be as close as possible to the ideal value of 8. The information entropy is calculated as follows: where is the occurrence frequency of pixel value in the ciphertext image R. Table 2 compares the global information entropy with other methods, including FRFT [29], ASFS [30], and CENN [31].
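The global information entropy can be computed directly from pixel frequencies. The sketch below uses the standard Shannon entropy in bits, which is the usual definition behind the ideal value of 8 for 8-bit images; the function name and the flat-list image representation are illustrative choices.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a flat sequence of pixel values."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A perfectly uniform 8-bit source reaches the ideal value.
uniform_entropy = shannon_entropy(list(range(256)))
# A constant image carries no information.
constant_entropy = shannon_entropy([7] * 100)
```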
However, the global information entropy has some deficiencies: it does not accurately measure the image before and after encryption. Therefore, based on the global information entropy, Lin et al. [3] proposed a more rigorous local information entropy test. The core idea is to randomly select nonoverlapping subblocks in the target image, represented as . Each subblock contains pixels, and the global information entropy of each subblock is then calculated. The local information entropy of the image is
Here k = 30 and = 1936 are selected to carry out the local information entropy test on the gray image. The algorithm is compared with other methods through this test. Table 3 shows that the proposed algorithm has a relatively high pass rate in the local information entropy test.
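A minimal sketch of the local entropy test follows. It assumes square 44 × 44 subblocks (44 × 44 = 1936 pixels) drawn without replacement from a regular grid, which guarantees that the blocks do not overlap; this grid-based selection, the seed, and the nested-list image representation are simplifying assumptions of the example.

```python
import math
import random
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a flat sequence of pixel values."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def local_entropy(image, k=30, side=44, seed=0):
    """Mean entropy of k randomly chosen non-overlapping side x side blocks."""
    h, w = len(image), len(image[0])
    cells = [(r, c)
             for r in range(0, h - side + 1, side)
             for c in range(0, w - side + 1, side)]
    if len(cells) < k:
        raise ValueError("image too small for k non-overlapping blocks")
    rng = random.Random(seed)
    total = 0.0
    for r, c in rng.sample(cells, k):
        block = [image[i][j]
                 for i in range(r, r + side)
                 for j in range(c, c + side)]
        total += shannon_entropy(block)
    return total / k
```

Because each block holds only 1936 samples spread over 256 gray levels, even an ideal random image scores somewhat below 8 on this test, so results are judged against a reference interval rather than against 8 itself.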
4.3. Correlation Coefficient Analysis
The correlation coefficient measures the correlation between adjacent image pixels. If its value is close to 0, the correlation between the pixels is weak. If its value is close to 1, the pixels are strongly correlated. A lower correlation coefficient means that an attacker is less able to obtain meaningful information from the ciphertext image [5].
The correlation coefficient is calculated by the following formula:
The test results of correlation coefficients between test images and ciphertext are shown in Tables 4 and 5. It can be observed that the adjacent pixels of the plaintext test image have a strong correlation, while the adjacent pixels of the ciphertext image basically have no correlation.
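The adjacent-pixel test can be sketched with the standard Pearson correlation coefficient. The example assumes the three directions are horizontal, vertical, and diagonal, and it uses every adjacent pair rather than a random sample of pairs; both are assumptions of this sketch, as are the function names.

```python
import math


def adjacent_pairs(image, direction):
    """All pairs of neighbouring pixels in the given direction."""
    dr, dc = {"horizontal": (0, 1),
              "vertical": (1, 0),
              "diagonal": (1, 1)}[direction]
    h, w = len(image), len(image[0])
    return [(image[r][c], image[r + dr][c + dc])
            for r in range(h - dr)
            for c in range(w - dc)]


def correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs.
    Undefined (division by zero) if either coordinate is constant."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)


# A smooth gradient stands in for a highly correlated plaintext image.
gradient = [[r + c for c in range(16)] for r in range(16)]
r_horizontal = correlation(adjacent_pairs(gradient, "horizontal"))
```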
Figures 7 and 8 show the adjacent-pixel distributions of the plaintext and ciphertext images of Lena and Skull in three directions. As can be seen from the figures, the relationship between adjacent pixels of the plaintext image is linear in each direction, while the adjacent pixels of the ciphertext image are scattered, with essentially no correlation, which indicates a good encryption effect.

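Figure 7: Plaintext and ciphertext images of Lena in three directions, panels (a)–(f).

Figure 8: Plaintext and ciphertext images of Skull in three directions, panels (a)–(f).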
4.4. Robustness Analysis
With the rapid development of computers and password-cracking technology, attackers can intercept ciphertext images and tamper with them, interfering with decryption. A new encryption algorithm should be robust: after the plaintext image is encrypted, it should resist various attacks and still decrypt successfully. Clipping attack, noise attack, and JPEG compression are carried out on the ciphertext image. Figures 9(a)–9(f) show 25% clipping, Gaussian noise with a mean square error of 50, JPEG compression, and the decryptions after 25% clipping, Gaussian noise, and JPEG compression, respectively. We also conduct data loss and noise attacks on the Skull image. Figure 10 shows a data cut of 64 × 64. Figure 11 shows 4% salt and pepper noise. Figures 12 and 13 are the decrypted images corresponding to Figures 10 and 11, respectively. The results show that the proposed encryption method has good robustness.

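Figure 9: Robustness tests on the ciphertext image and the corresponding decrypted images, panels (a)–(f).

Figures 10–13: Data loss (64 × 64 cut) and 4% salt and pepper noise attacks on the Skull image, with the corresponding decrypted images.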


4.5. Key Space and Sensitivity
A large key space effectively resists exhaustive key search. According to [5], only a key space greater than or equal to $2^{100}$ can provide a reliable security guarantee for the algorithm. The proposed algorithm uses seven groups of keys, and each group of keys has a floating-point precision of $10^{16}$. Therefore, the key space is $(10^{16})^{7} = 10^{112}$, which is larger than $2^{100}$, so the algorithm can resist brute-force attack.
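The key space comparison is simple enough to verify with exact integer arithmetic. The short check below takes seven key groups with sixteen significant decimal digits each and compares the result with the 100-bit threshold; the variable names are illustrative.

```python
import math

precision_per_key = 10 ** 16   # distinguishable values per key group
key_groups = 7

key_space = precision_per_key ** key_groups
threshold = 2 ** 100

# Equivalent key length in bits.
key_bits = math.log2(key_space)

print(key_space == 10 ** 112, key_space > threshold, round(key_bits, 1))
```

Expressed in bits, the key space is roughly 372 bits, well above the 100-bit threshold.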
For different keys, the decrypted image should not contain any information about the plaintext image, which requires the encryption algorithm to be sensitive to the key. The key sensitivity is tested below. A minor change is made to one of the 7 groups of keys while the other keys remain unchanged, and the result is compared with the plaintext image using the mean square error. R is the ciphertext image to be compared:
As shown in Figure 14 and Table 6, Figure 14(a) is the image decrypted with the unchanged key, and Figures 14(b)–14(h) are the images decrypted with a wrong key, including (b) , (c) , (d) , (e) , (f) , (g) , and (h) . Compared with the correctly decrypted image in Figure 14(a), the plaintext image cannot be recovered, and no information related to the plaintext can be obtained, after a minor change in the key. It can be observed from Table 6 that the mean square error values are above 0.8, which is almost the same as the mean square error between the ciphertext and plaintext images, and the entropy values are above 7.99 (close to 8). This shows that an image decrypted with a wrong key is very different from the plaintext image and further indicates that the new algorithm has high key sensitivity.

Figure 14: Key sensitivity test: (a) image decrypted with the correct key; (b)–(h) images decrypted with a wrong key.
4.6. Differential Attack Resistance
The more sensitive a method is to the plaintext, the more resistant it is to differential attacks. The sensitivity to plaintext can be measured by the indexes NPCR [32, 33] and UACI [34, 35]. When two plaintext images differ in only one pixel, the ciphertext images obtained after encryption should differ greatly. Let the pixels at point in the two ciphertext images be and ; then, NPCR and UACI are calculated as
N = 99.6094% and U = 33.4635% are the expected values of the two indicators. In this study, one hundred groups of Lena images are selected for testing, and each group contains an image with one bit value changed. The two indicators are averaged over the groups, and the test results are shown in Table 7. The NPCR and UACI obtained by the proposed algorithm are closer to the ideal values. The algorithm is therefore highly sensitive to plaintext and can effectively resist differential attack and chosen-plaintext attack.
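The two indicators can be sketched with their standard definitions: NPCR is the percentage of pixel positions whose values differ, and UACI is the mean absolute intensity difference relative to the maximum gray level. The helper functions `expected_npcr` and `expected_uaci` are illustrative additions that reproduce the expected values quoted above for two independent, uniformly random 8-bit images.

```python
def npcr(c1, c2):
    """Percentage of positions at which two ciphertext images differ."""
    if len(c1) != len(c2):
        raise ValueError("images must have the same number of pixels")
    changed = sum(1 for a, b in zip(c1, c2) if a != b)
    return 100.0 * changed / len(c1)


def uaci(c1, c2, max_value=255):
    """Mean absolute difference as a percentage of the maximum gray level."""
    if len(c1) != len(c2):
        raise ValueError("images must have the same number of pixels")
    total = sum(abs(a - b) for a, b in zip(c1, c2))
    return 100.0 * total / (max_value * len(c1))


def expected_npcr(levels=256):
    """Expected NPCR for two independent uniform random images."""
    return 100.0 * (1.0 - 1.0 / levels)


def expected_uaci(levels=256):
    """Expected UACI for two independent uniform random images."""
    total = sum(abs(a - b) for a in range(levels) for b in range(levels))
    return 100.0 * total / (levels * levels * (levels - 1))


print(round(expected_npcr(), 4), round(expected_uaci(), 4))
```

Images are passed as flat sequences of gray values, so a two-dimensional image would be flattened row by row before calling these functions.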
5. Conclusion
In summary, with the rapid development of computer networks, images are mostly presented as digital images, which are convenient to store and fast to transmit. At the same time, the openness of the network makes image information more likely to be leaked and stolen. General chaotic encryption algorithms exhibit a certain degree of periodicity, so a V-Net convolutional neural network is used here to improve and optimize them. Simulation results show that the proposed method improves the encryption effect.
Data Availability
The data that support the findings of this study can be obtained from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This project was supported by Taif University researchers, supporting project no. TURSP-2020/107, Taif University, Taif, Saudi Arabia. The authors would like to acknowledge the Scientific Research Funds of Education Department of Liaoning Province in 2021 (General Project) (LJKZ1311) for its support, which facilitated the publication of this paper.