Abstract

In computer vision, using deep learning to program and create NFT artwork is a challenging task. With the continuous development of deep learning technology, this task has become feasible. Generative adversarial networks can generate new images by extracting and analyzing the features of image data, and they have become an important tool for NFT artwork image generation. To better realize NFT artwork programming, this paper first analyzes the working principle of the traditional generative adversarial method and then uses the StyleGAN model to edit the higher-level attributes of the image, which allows effective control over the style of the generated NFT artwork image. To improve the quality of the generated images, this paper also introduces a channel attention mechanism and a spatial attention mechanism so that the generated images are more reasonable and realistic. Finally, extensive experiments show that the proposed deep-learning-based NFT artwork creation and programming algorithm can control the overall style of the generated image according to the needs of the creator, and that the generated images have good detail and high visual quality.

1. Introduction

Traditional artwork creation [1] is usually elegant or even noble, and its creative activities [2] are limited to the artist himself [3]. With the rise of digital art [4], many artists are still the ones primarily involved in creation [5], and they define the relationship between the works and the audience [6]. The public is more often the recipient, and it is difficult for the audience to interact with the works. The emergence of NFT artwork creation [7] has completely changed the way artwork is created and experienced, making the public a real participant in artwork creation [5, 8]. An audience with the ability to create artworks can change the expressive elements of artworks at any time, which enables the audience not only to have a dialogue with art through the artworks [9], but also to express specific emotional experiences [10] in specific artworks [11]. In general, the social participation and timely expression of NFT artworks break through the boundaries of the artist and the work itself, enabling the audience to express their emotions through the work and expanding the depth and breadth of artistic expression to an unprecedented degree. The collection value of NFT artworks is mainly based on the collector's personal preferences and fluctuates according to market laws.

With the continuous advancement of science and technology [12], various digital means [13] emerge in an endless stream, and new things continue to appear [14]. Human artistic creation has entered a new era of multi-source information exchange. The way humans create art is no longer limited to traditional paper media and passive acceptance [15], and more methods of creation and dissemination [16] based on digital platforms have emerged [17]. Communication between people has changed from the traditional language- and text-based approach to a combination of sounds [18], words [19], and images [20] presented through digital artworks [21]. This new and comprehensive way of communication has completely changed the ways in which people socialize and exchange information. Art trading is based on digital trading [22] and has unique rules [23]. Artwork itself has aesthetic value and commercial value [24], and its aesthetic value directly determines the level of its commercial value. Artwork itself is irreproducible and unique, which is also the basis for the value of traditional artworks. In the digital age, many ways of creating artworks are presented through digital platforms. In general, the creation of NFT artworks on digital platforms still relies on manual work, which depends strongly on personal knowledge and background and requires substantial human and material resources. How to realize intelligent NFT artwork creation with deep learning methods is still a challenging problem.

In order to realize the intelligent programming and creation of NFT artworks, this paper introduces deep learning technology. Using deep learning for the programming and creation of NFT artworks can effectively reduce the consumption of manpower and material resources, and at the same time the process is controlled by programming, so that the synthesis of a certain style can be precisely controlled and the desired style can be added. First, this paper analyzes how traditional generative adversarial methods work. Secondly, this paper adopts the StyleGAN model to edit the higher-level attributes of the image, which can effectively control the style of the generated NFT artwork images. Thirdly, in order to improve the quality of the generated images, this paper introduces a channel attention mechanism and a spatial attention mechanism so that the generated images are more reasonable and realistic. Finally, extensive experiments show that the proposed algorithm can control the overall style of image generation according to the needs of the creator, and that the generated images have good detail and high visual quality. The proposed algorithm greatly reduces the complexity of manual creation and improves the controllability of generated images. Because artistic creation takes a long time and creative styles vary greatly, it is difficult to improve the efficiency of painting by manual means; therefore, this paper adopts an artificial intelligence method.

2.1. NFT Artwork

NFT artwork has both economic value and cultural value. From the perspective of creation itself, an artwork is the crystallization of the artist's intellectual labor; it embodies the artist's style and cultural heritage, and its cultural value reflects the artist's creativity. At the same time, artists are required to change their creative process and to create on the basis of Internet thinking rather than on paper. This specificity gives rise to Internet art, and cultural value is born with it. Such art is usually regarded as art nurtured by Internet culture and as a product of the times under the development of contemporary technology.

Based on the combination of technology and art, current NFT artworks have the following characteristics. (1) Decentralization: storage relies on the blockchain, so artists and collectors can carry out peer-to-peer transactions without being constrained by middlemen. Compared with traditional collection methods, artists can connect directly with collectors, which not only increases the popularity and exposure of artists but also improves the transparency of the art market. (2) Uniqueness: the identification of each NFT artwork is unique, and its data are protected by tamper-proof technology. Although each work is independent, all data on the same blockchain are linked, so modifying one record affects the others. The data of works on the blockchain can be modified only with the consent of at least half of the collectors. This protects the collection value of NFT artworks and makes them naturally scarce and recognizable. Such reliable trading can effectively prevent unscrupulous acts such as selling fakes and embezzlement, and it protects the stability of the art market. (3) Resale right: this right was originally a measure to protect the interests of artists; that is, every time an artwork is sold, the artist receives a share of the benefit, which protects creative enthusiasm. In reality, however, this system is difficult to implement. Because NFT artworks are based on digital platforms, ownership and creation rights are recorded for all NFT artworks, and artists can obtain the relevant benefits from each circulation of an artwork. This method is more fair, just, and open and provides a good regulatory platform. (4) Division of ownership: although the NFT artwork itself cannot be divided, its ownership can be split in the form of tokens and distributed to different owners and collectors.
For collectors, purchasing tokens expands the scope and number of transactions; tokens can be used to purchase complete ownership of an artwork or to hold partial ownership of more artworks in order to reduce risk. For the art market, this lowers the threshold for collectors, reduces collectors' capital requirements and market risks, and makes artworks more convenient to circulate.

The economic value of NFT artworks is reflected in their financial value. When collecting NFT artworks, collectors may pay more attention to financial value than to artistic and cultural value. Collectors can pledge or sell NFT artworks through online trading platforms, which makes it possible to estimate the financial value of artworks clearly. In addition, collectors can obtain ownership of NFT artworks through virtual currency transactions, and if they cannot repay or are overdue, lenders can obtain the NFT artworks at low prices. In this way, the lending organization can obtain a large number of peripheral products at a lower cost and continuously increase their economic value. However, NFT art collection also has negative effects. The current NFT art collection market is mixed, and there is great uncertainty in the cultural and economic value of the art, which leads to serious business risks depending on the market. Most researchers take a cautious attitude: both virtual currency and NFT artworks are tools for making money in the business world, and caution is needed to avoid fraud. For NFT owners with physical art, there is a mechanism that breaks the usual rules of art. If the owner's NFT art is directly destroyed, the collector's NFT art undergoes similar changes, so that the losses caused by the collector's errors in value judgment outweigh the gains. Collectors acquire works of art in order to preserve and increase value in the future, and such art investment is meaningful. But NFT artwork, as a virtual item, is similar to virtual currency and is exchanged as an equivalent. This type of digital currency has a low value retention rate, is greatly affected by the market, and has large price fluctuations.
Moreover, NFT artworks lack an economic evaluation system, and the market has no works with profound cultural heritage that reflect the spirit of human civilization to serve as a reference. Setting prices through market adjustment and collectors' psychological speculation is highly uncertain, and the NFT art collection market is also unstable and chaotic. Deep learning can reduce manual intervention to a certain extent, but it is data-driven and requires a huge amount of data for training.

2.2. Image Style Transfer

Traditional image style transfer is mainly studied from two perspectives, namely, computer graphics and computer vision. In computer graphics, the input is a description of a virtual scene, usually a polygon array; each polygon consists of three vertices, and each vertex includes three-dimensional coordinates, texture coordinates, RGB color, etc. The output is an image, that is, a two-dimensional pixel array. In computer vision, the input is an image or a sequence of images, usually from a camera or video file, and the output is an understanding of the real world depicted in the image sequence. Image style transfer is in fact a form of non-photorealistic rendering in computer graphics. With the continuous development of deep learning, most current style transfer methods are based on neural networks. Texture modeling in these methods is mainly divided into two categories, namely, parametric texture modeling based on statistical distribution and nonparametric texture modeling based on Markov random fields (MRF). Style transfer is widely used and representative in visual image generation, and it can generate image content with a specific style.

Parametric methods based on statistical distribution mainly model texture as N-order statistics, while a classic routine of MRF-based methods is to use patch similarity matching for point-by-point synthesis. Texture modeling solves a very important problem in image style transfer, namely, the modeling and extraction of the style in the style image. The style and content are then mixed into a stylized result through image reconstruction. Image reconstruction is mainly divided into two categories: slow image reconstruction based on online optimization and fast image reconstruction based on offline model optimization. The slow, online method performs gradient descent in the image pixel space to minimize the objective function. This type of algorithm can be understood as using random noise as the starting image and then iteratively changing the pixel values of the image to approach the target. Since each reconstruction result must be iteratively optimized many times in the pixel space, this method is very time-consuming and occupies a large amount of computing resources.
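The online-optimization idea can be illustrated with a toy sketch. The objective used here, a plain squared error to a fixed target image, is an assumption made purely for illustration; real style transfer objectives combine content and style terms computed from network features. The function name and parameters are hypothetical.

```python
import numpy as np

def reconstruct_by_pixel_descent(target, steps=200, lr=0.1, seed=0):
    """Start from random noise and iteratively update pixel values to
    minimize 0.5 * ||image - target||^2 (a stand-in objective)."""
    rng = np.random.default_rng(seed)
    image = rng.normal(size=target.shape)      # random-noise starting image
    losses = []
    for _ in range(steps):
        residual = image - target
        losses.append(0.5 * float(np.sum(residual ** 2)))
        image -= lr * residual                 # gradient of the loss w.r.t. the pixels
    return image, losses

target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
result, losses = reconstruct_by_pixel_descent(target)
```

Every output image requires its own optimization loop, which is why this family of methods is slow compared with a trained feed-forward model.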

With the continuous development of deep learning technology, style transfer based on generative adversarial networks (GANs) has gradually become the mainstream. Strictly speaking, GAN-based style transfer falls within the scope of neural style transfer, because a GAN is in essence a neural network. Owing to its carefully designed loss function, the GAN is among the best methods in the field of image generation, and research on it has far-reaching significance. The similarities between GAN-based neural networks are obvious: the input is noise, the output is an image, and a neural network sits in between. The network structure of a GAN is mainly composed of a generator and a discriminator. The generator is responsible for producing an image from the input, and the result is sent to the discriminator together with the real data for judgment. The discriminator is responsible for distinguishing whether its input is a real image from the real dataset or an image produced by the generator. The generator tries to make the generated pictures fool the discriminator; naturally, the more realistic the generated pictures, the easier it is to deceive the discriminator. The discriminator tries to distinguish whether the input is real or generated, and its feedback guides the generator to produce more realistic pictures. This forms a dynamic game, and the result is that the output pictures are similar in content and style to the real dataset. Since the GAN has no explicit content and style constraints, the generation process is closer to the general process of artistic creation. In addition, owing to the carefully designed loss function, the results created by the model are very realistic. At present, there are many image generation methods, but most of the images they generate are noisy and unrealistic.
Among them, only the StyleGAN series produces better results.

3. Algorithm Research

The creation of NFT artworks is a very meaningful part of digital art, but it usually requires a lot of manpower and material resources to complete the creation and achieve high collection value. With the continuous development of deep learning, using deep learning algorithms to implement NFT artwork programming has become feasible. Deep learning has a wide range of applications, including image generation, text-to-image generation, text-to-scene generation, style transfer, and content generation. Deep learning algorithms can automatically learn the style and distribution of data and can infer the expected output from the input data. The resulting savings in human and material resources are the key basis for the programming and creation of NFT artworks. When the network model design, training procedure, or dataset differs, the generated NFT artwork usually differs as well.

Converting real-world or virtual-world images into NFT artworks is a very interesting and practical task, which has attracted extensive attention in the field of computer vision. NFT artwork creation here broadly refers to an artistic approach that mainly uses computers to stylize the features of the original image, so that the original content or style can be displayed and recognized in the new image. Because this NFT artwork programming and creation algorithm is relatively novel and allows the computer to learn content and style automatically, it has attracted the attention of a large number of researchers. At present, the creation of NFT artworks still mainly relies on manual work, which consumes a lot of manpower and material resources and has low efficiency.

3.1. Traditional Generation Algorithms

The use of deep convolutional neural networks and generative adversarial networks has become the norm in the research on image generation of NFT artworks. These network models can learn and extract a certain type of artistic style and add these artistic styles according to the input image to form a new target image, so that the generated NFT image has a specific artistic style. In the task of image transfer and generation, first, the CNN model is used to learn from images of a specific style, then the generative adversarial network (GAN) is used to add the extracted style to the input image, and finally a generated image with a specific artistic style is formed, such as oil painting, cartoon painting, animation, landscape painting, and other different art types.

The generative adversarial network (GAN) model is a commonly used network structure in the field of image generation. It does not need to express the probability distribution of the samples in the dataset explicitly; instead, it learns the inherent distribution of the data implicitly through adversarial learning, that is, through the continuous competition between the generator and the discriminator. When the generator and the discriminator reach a balanced state through this adversarial process, the images produced by the generator have the inherent characteristics and attributes of the original real data and can pass for real images.

The generative adversarial network (GAN) model usually consists of two parts, the generator G and the discriminator D. The convolutional neural network (CNN) module is usually used as the basic module in both the generator and the discriminator, since it can effectively extract the feature information of the input image. Because deep learning models have rich parameters and powerful learning ability, the CNN structure is usually embedded in the discriminator to improve its discriminative performance. As shown in the network structure of Figure 1, when using the GAN structure to generate image data, a random noise vector z is first input to the generator, and the output image is obtained through operations such as normalization layers, fully connected layers, and convolutional layers. Note that the number of stages in the network is mainly determined by the image size, and the output image of the generator usually has the same size as the real images. Then, the generated image and the label image are input into the discriminator D, which has a network structure similar to that of the generator. A label in {0, 1} is usually used to indicate whether the input image is real. For an image from the real samples, the output label of the discriminator is 1; otherwise, the output label of the discriminator D is 0. Throughout the adversarial process, the generator G strives to generate images with the same data distribution as the original data, making the generated images difficult to identify, while the discriminator D continuously maximizes its ability to distinguish the generated image from the original label image.
Therefore, in the whole training process, the generator G and the discriminator D are constantly in a game, and the generation ability of the former and the discriminative ability of the latter improve continuously as training proceeds.
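The game between G and D can be sketched as two per-batch losses. This is a minimal illustration rather than the paper's training code: the discriminator outputs are assumed to be probabilities in (0, 1), the generator loss uses the common non-saturating form (an assumption), and the example scores are hypothetical.

```python
import numpy as np

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    """D is rewarded for scoring real images as 1 and generated images as 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Non-saturating form: G is rewarded when D scores its images as 1."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs for a batch of four real and four generated images.
d_real = np.array([0.9, 0.8, 0.95, 0.7])
d_fake = np.array([0.1, 0.3, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

When the discriminator can no longer tell the two sources apart, it outputs 0.5 everywhere and its loss settles at 2 ln 2, which corresponds to the balanced state described above.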

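The learning objective of the GAN model can be expressed by the following formula:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$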

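A cycle consistency loss similar to that of CycleGAN is used in the loss function of the model. For image transformations G and F, there are

$$F(G(x)) \approx x, \qquad G(F(y)) \approx y,$$

where x and y correspond to the input image and target image, respectively. At the same time, the L1 loss function is used as the distance metric between the generated image and the original image:

$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right],$$

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda \mathcal{L}_{\text{cyc}}(G, F),$$

where $D_X$ and $D_Y$ refer to the discriminators of the original input image and the target image, respectively, and λ represents the balance parameter set according to experience, which is a hyperparameter in the experiment.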
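The cycle consistency term can be sketched in a few lines. The two mappings below are toy invertible functions standing in for trained generators (an assumption for illustration), and the mean absolute difference is used as the L1 distance.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 distance between each image and its round-trip reconstruction."""
    forward = np.mean(np.abs(F(G(x)) - x))    # x -> G(x) -> F(G(x)) should return to x
    backward = np.mean(np.abs(G(F(y)) - y))   # y -> F(y) -> G(F(y)) should return to y
    return float(forward + backward)

# Toy invertible mappings standing in for the two generators (hypothetical).
G = lambda img: 2.0 * img + 1.0
F = lambda img: (img - 1.0) / 2.0

x = np.random.default_rng(0).random((2, 3, 8, 8))
y = np.random.default_rng(1).random((2, 3, 8, 8))
loss = cycle_consistency_loss(x, y, G, F)
```

Because F exactly inverts G here, the loss is zero up to rounding; replacing F with a mapping that does not invert G makes the loss positive, which is the signal the balance parameter λ weights during training.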

3.2. StyleGAN Generation Method

The traditional image generation method feeds random noise to the generator G, normalizes it, and then generates images through convolution and upsampling. This method is simple, and its effect is mediocre. To obtain better image generation and to be able to edit specific attributes of the image, this paper uses StyleGAN as the backbone network and uses the mapping network and the generation network to jointly control the image generation process, so that global styles are precisely controlled. Adding random noise further refines the details of the generated high-fidelity images.

Attributes are often coupled in the process of image generation, and if the coupling between attributes is high, it is difficult for the model to capture the deeper relationships among hidden features. In practice, we want to control different subspaces or feature styles more precisely, and decoupled features are more helpful for data analysis and synthesis. In the method of this paper, we first normalize the hidden vector z and then pass it through the mapping network f, which consists of 8 fully connected layers, to obtain the intermediate variable w. The features of the hidden space can be effectively decoupled through the mapping network, as shown in Figure 2. The model learns a specific artistic style and then transfers this style to new works.
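A minimal sketch of such a mapping network is given below. Only the depth of 8 fully connected layers comes from the text; the latent dimension of 512, the leaky ReLU activation, the normalization formula, and the random initialization are assumptions chosen for illustration.

```python
import numpy as np

def pixel_norm(z, eps=1e-8):
    """Normalize each latent vector to unit mean squared magnitude."""
    return z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + eps)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def init_mapping(dim=512, layers=8, seed=0):
    """Randomly initialized (weight, bias) pairs for the fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 / dim), size=(dim, dim)), np.zeros(dim))
            for _ in range(layers)]

def mapping_network(z, params):
    """Map hidden vectors z to intermediate variables w."""
    w = pixel_norm(z)
    for weight, bias in params:
        w = leaky_relu(w @ weight + bias)
    return w

params = init_mapping()
z = np.random.default_rng(1).normal(size=(4, 512))
w = mapping_network(z, params)   # one intermediate vector per input, shape (4, 512)
```

The weights here are untrained, so the output carries no learned style; the sketch only shows the data flow from z to w.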

The image synthesis network G is similar in structure to the traditional GAN generator: it synthesizes from low-resolution to high-resolution features and adopts the traditional convolution module structure. An adaptive normalization layer is added in each block, and each block receives two additional signals, A and B. A represents the feature vector of the intermediate hidden variable after an affine transformation and is mainly used to control the style of the generated image. B represents random noise added after convolution to enrich the detailed features of the generated image. Each convolution module can adjust the overall style of the generated image according to the input feature vector A and at the same time can use B to adjust the details of the generated image. The lower-level features in the image generation network G control the overall outline and shape of the generated image. As the feature resolution increases, the generated image is continuously enriched with detail through A and B, and at the finest scale these signals control the microscopic features of the image. The adaptive normalization layer is expressed as

$$\mathrm{AdaIN}(x_i, y) = y_{s,i} \, \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$

AdaIN first normalizes each feature map x and then scales and biases it using the style feature y. The image generation network G refines the local features of the image by adding random noise B. The AdaIN layer receives the style information A from the mapping network, so it can control the style of the image, such as identity, pose, angle, and smile. This paper introduces StyleGAN to achieve controllability of the generated image by mapping the hidden vector z to a decoupled intermediate variable.
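The normalization and the two input signals can be sketched as follows. The tensor layout (batch, channel, height, width), the small constant added to the standard deviation, and the per-channel noise scaling are assumptions made for this illustration; the function names are hypothetical.

```python
import numpy as np

def adain(x, y_scale, y_bias, eps=1e-5):
    """x: (N, C, H, W) features; y_scale, y_bias: (N, C) style parameters (signal A)."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True)
    normalized = (x - mean) / (std + eps)      # per-feature-map normalization
    return y_scale[:, :, None, None] * normalized + y_bias[:, :, None, None]

def add_noise(x, noise_strength, rng):
    """Signal B: one noise map per image, scaled separately for each channel."""
    noise = rng.normal(size=(x.shape[0], 1, x.shape[2], x.shape[3]))
    return x + noise_strength[None, :, None, None] * noise

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
y_scale = np.full((2, 4), 3.0)
y_bias = np.full((2, 4), -1.0)
out = adain(add_noise(x, np.full(4, 0.1), rng), y_scale, y_bias)
```

After the layer, every feature map has the mean and standard deviation dictated by the style parameters, regardless of the statistics it had before. This is how the style signal overrides the statistics carried over from earlier layers.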

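In addition, StyleGAN introduces two measures of the disentanglement of the hidden space, namely, perceptual path length and linear separability, which encourage the generator to represent different factors of variation in a more linear and decoupled way. They are calculated as follows:

$$l_W = \mathbb{E}\left[\frac{1}{\epsilon^2}\, d\Big(G\big(\mathrm{lerp}(f(z_1), f(z_2); t)\big),\; G\big(\mathrm{lerp}(f(z_1), f(z_2); t + \epsilon)\big)\Big)\right],$$

$$\text{separability} = \exp\Big(\sum_i H(Y_i \mid X_i)\Big),$$

where $z_1, z_2 \sim P(z)$, $t \sim U(0, 1)$, lerp denotes linear interpolation, d is a perceptual distance between two images, ε is a small step, and $H(Y_i \mid X_i)$ is the conditional entropy of the attribute class $Y_i$ given the side $X_i$ of a linear separating hyperplane for attribute i.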
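A Monte Carlo estimate of the path length can be sketched with toy components. The linear "synthesis network" and the squared L2 distance below are hypothetical stand-ins for the trained generator and a perceptual distance; they are chosen so that the result can be checked by hand.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between latent codes a and b."""
    return a + (b - a) * t

def path_length(synthesis, distance, w_pairs, eps=1e-4, seed=0):
    """Average of d(G(w(t)), G(w(t + eps))) / eps^2 over the given latent pairs."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for w1, w2 in w_pairs:
        t = rng.uniform(0.0, 1.0)
        img_a = synthesis(lerp(w1, w2, t))
        img_b = synthesis(lerp(w1, w2, t + eps))
        total += distance(img_a, img_b) / eps ** 2
    return total / len(w_pairs)

# Toy stand-ins (hypothetical): a linear "synthesis network" and squared L2 distance.
rng = np.random.default_rng(1)
A = rng.normal(size=(16, 4))
synthesis = lambda w: A @ w
squared_l2 = lambda a, b: float(np.sum((a - b) ** 2))

w_pairs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(50)]
ppl = path_length(synthesis, squared_l2, w_pairs)
```

For a linear generator the estimate does not depend on t, since each small step changes the image by the same amount. A strongly curved or entangled mapping would produce larger and more variable step-to-step changes, and hence a larger path length.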

3.3. Attention Mechanism

In order to improve the quality of image generation and make the generated NFT artwork images more realistic and natural, this paper introduces the attention mechanism module. The attention mechanism module mainly relies on the principle of bionics, simulates the way humans observe objects, pays more attention to the local details of the image, and is widely used in the field of image generation.

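For a given feature map $F \in \mathbb{R}^{C \times H \times W}$, the convolutional attention mechanism computes a one-dimensional channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and a two-dimensional spatial attention map $M_s \in \mathbb{R}^{1 \times H \times W}$, and the calculation formula is as follows:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F',$$

where $\otimes$ denotes element-wise multiplication.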

For the input feature map, global max pooling and global average pooling are first used to compress the spatial dimensions, yielding a max-pooled feature and an average-pooled feature. Both are then passed through a shared network composed of a multi-layer perceptron, and a sigmoid activation function produces the channel attention weights. The structure of the attention mechanism is shown in Figure 3. The spatial attention mechanism differs in structure from the channel attention mechanism. It first obtains two two-dimensional feature maps through max pooling and average pooling along the channel dimension, then concatenates the two maps, applies a convolution operation, and finally passes the result through the sigmoid activation function to obtain the spatial attention weights. In this paper, the channel attention mechanism and the spatial attention mechanism are added after convolution, so that the model can attend to both channel feature information and spatial location information. By emphasizing meaningful features and suppressing irrelevant ones in the channel and spatial information, the generative model focuses more on the quality of image generation and thereby generates more realistic NFT artwork images.
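The two attention steps can be sketched for a single feature map as follows. The reduction ratio of the shared perceptron, the ReLU inside it, the 7×7 kernel of the spatial convolution, and the random weights are all assumptions made for illustration; the loop-based convolution is written for clarity, not speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """f: (C, H, W). A shared two-layer perceptron is applied to both pooled descriptors."""
    avg = f.mean(axis=(1, 2))                   # global average pooling over space
    mx = f.max(axis=(1, 2))                     # global max pooling over space
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    return sigmoid(mlp(avg) + mlp(mx))          # channel weights, shape (C,)

def spatial_attention(f, kernel):
    """kernel: (2, k, k), convolved over the stacked channel-wise average and max maps."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])    # shape (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = f.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                          # spatial weights, shape (H, W)

def attention_block(f, w1, w2, kernel):
    """Channel attention followed by spatial attention, each applied multiplicatively."""
    f = channel_attention(f, w1, w2)[:, None, None] * f
    return spatial_attention(f, kernel)[None, :, :] * f

rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2
f = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
kernel = rng.normal(size=(2, 7, 7)) * 0.1
refined = attention_block(f, w1, w2, kernel)
```

Both weight maps lie between 0 and 1, so the block can only attenuate features; emphasis is relative, with less relevant channels and positions being scaled down more strongly.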

4. Experimental Results

Figure 4 shows the effect of the synchronization scale factor. The positive value in the plane represents the multiple of expansion, the negative value represents the multiple of reduction, and the ordinate represents the combined effect of these two scales.

Figure 5 shows the effects of various characteristic factors. The figure contains three influencing factors, X, Y, and Z. These factors are coupled, with a certain degree of coupling between any two of them; that is, there is a problem of feature entanglement.

Figure 6 illustrates how image decoupling is achieved. The two different shapes in Figure 6 represent two different factor features, which are decoupled through a mapping network. Generally speaking, each feature factor controls one aspect of image generation, and decoupling reduces the degree of correlation between the factors.

Figure 7 represents the multi-label data used in training. Because multiple factors control the image generation process, the generated images have multiple labels of different types. The method in this paper tries to make each factor of the generated image correspond to one label.

Figure 8 shows the optimal solution of the network model during the training process. In the training process of the model, two factors of different scales work together, so the network model needs to be adjusted to achieve the optimal solution.

Figure 9 shows the confusion matrix for five types of generated styles using traditional methods. The five types correspond to contour, texture, pose, image quality, and expression, respectively. Comparison with the generation effect of traditional methods better illustrates the actual effect of the deep-learning-based NFT artwork programming method proposed in this paper.

Figure 10 shows the confusion matrix for ten types of generated styles. In addition to the five types presented above, five additional types are added, namely, appearance, artistic type, size, clarity, and emotion. The confusion matrix of the generated styles shows that the method proposed in this paper offers better controllability, which is a useful property for future NFT artwork generation.
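For readers unfamiliar with the construction, a confusion matrix of this kind can be computed as sketched below. The labels are made up solely to show the mechanics and are not results from the experiments; the function names are hypothetical.

```python
import numpy as np

STYLE_TYPES = ["contour", "texture", "pose", "image quality", "expression"]

def confusion_matrix(intended, recognized, num_classes):
    """Rows index the intended style type, columns the recognized one."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for i, r in zip(intended, recognized):
        matrix[i, r] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each row that falls on the diagonal."""
    return np.diag(matrix) / matrix.sum(axis=1)

# Made-up labels for illustration only; these are not results from the paper.
intended = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
recognized = [0, 1, 1, 1, 2, 0, 3, 3, 4, 3]
cm = confusion_matrix(intended, recognized, len(STYLE_TYPES))
acc = per_class_accuracy(cm)
```

A matrix concentrated on the diagonal indicates that each controlled style type is recognized as intended, which is the sense of controllability discussed above.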

5. Summary

To achieve better NFT artwork programming, this paper proposes a deep-learning-based NFT artwork creation and programming algorithm that achieves a better artwork creation effect. First, this paper analyzes how traditional generative adversarial methods work. Secondly, this paper adopts the StyleGAN model to edit the higher-level attributes of the image, which can effectively control the style of the generated NFT artwork images. Thirdly, in order to improve the quality of the generated images, this paper introduces a channel attention mechanism and a spatial attention mechanism so that the generated images are more reasonable and realistic. Finally, extensive experiments show that the proposed algorithm can control the overall style of image generation according to the needs of the creator, and that the generated images have good detail and high visual quality. Although the method in this paper is effective to a certain degree, the complex network structure and the large amount of computation and parameters remain key problems. Future work should pay more attention to efficient lightweight network design, which can improve the training and inference speed of the network model while greatly reducing the computational load.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declared that they have no conflicts of interest regarding this work.

Acknowledgments

This work was financially supported by the special funding of Guiyang Science and Technology Bureau and Guiyang University (GYU-KYZ(2019)MS-03).