Abstract

Hand drawing is an indispensable professional skill in environmental design, industrial design, architectural engineering, civil engineering, and other fields of engineering design education. Students usually imitate masterpieces to practice basic skills, which is an important step for a beginner. A digital management system for such education requires a function that automatically recommends examples of different brushwork skill expressions. The proposed brushwork classification method therefore combines hand-crafted features with features generated by a deep convolutional neural network (DCNN) and feeds the combined features into a tree-structured classification scheme. Compared with other deep learning models, the method is effective in distinguishing art ontology attributes.

1. Introduction

Hand painting, the purpose of which is to carry forward the originality of the engineering design, is a chief step of the design process. Sketching or design drawing skills and techniques are essential for successfully developing the next generation of environmental design (ED), industrial design (ID), architecture engineering (AE), and civil engineering (CE) [1]. The technical topics of sketching exercises include line weights, shading, and how to use pen and ink, colored pencils, and felt-tip markers to create architectural drawings with significant impact and aesthetic appeal. Students usually imitate masterpieces to practice basic skills, which is an important step for a beginner. There is also an emerging need to improve existing teaching methods to enhance the learning experiences of students [2]. In the domain of paintings, drawing skills have been analyzed to support applications such as brush-stroke detection [3], image recommendation, and annotation and retrieval [4–7]. These efforts first used handmade features (artificially designed feature extraction algorithms, which mainly extract color, texture, and geometric features) for classification and later applied deep convolutional neural network (DCNN) classification models.

However, these two methods have not been previously combined. This research aims to design a function that distinguishes the brushwork of sketching skills in design and automatically recommends sketch exercises to students. This recommendation system is a reliable self-teaching tool that offers a fast way to find the kind of skill that students want to learn for design drawings and leads students through digital techniques to enhance presentation drawings quickly and easily, as shown in Figure 1.

Understanding a painting ontology requires human artist experts. Meanwhile, DCNN has shown superiority in extracting useful image features automatically. Therefore, the approach combines human understanding and deep learning by using two feature extraction methods in the feature extraction process (Figure 2): feature engineering, which extracts suitable handcrafted features, and feature learning, which generates features using a mature DCNN model. The method uses a machine expert system, a multilevel decision approach, to classify skill styles. The experiments evaluated the classification performance of each layer and compared this method with DCNN classification models. Experimental results demonstrated that the proposed method achieves higher classification accuracy than the compared DCNN models on the collected dataset.

1.1. Related Work

Early painting digital management systems are only fit for the digital transformation of artworks and exhibitions. With the acceleration of digital conversion of paintings and designs, artists demanded digital artwork management tools that sort artworks into classes. Annotation and classification based on brushwork technologies are applied to artwork collection and retrieval. Style image classification has attracted study because artistic cognition is strongly subjective.

Several groups have investigated distinguishing painting styles. However, most of these efforts focused on comparing the painting styles of artists and classifying paintings by fine art genre. Hatano [8], Sablatnig et al. [9], Keren [10], and Li and Wang [11] focused on comparing the painting styles of artists. These methods attempted to determine the painter in question. Johnson et al. [12], Culjak et al. [13], and Arora and Elgammal [14] studied the classification task between fine art genres: Baroque, Impressionism, Cubism, Abstract, Expressionism, Realism, and Fauvism. Only a few researchers focused on drawing techniques. Meanwhile, the accuracy rate of attribute-based image classification is not as high as that of content-based image classification. Jialie [15] identified the artist by comparing the painting styles of artists using a statistical model, and the identification accuracy only reached 69.7%. Hughes et al. [16] aimed to recognize visual image styles such as moods (serene, melancholy), genres (vintage, romantic, and horror), and types of scenes (hazy, sunny). With such an approach, artistic images can be searched and ranked by style. These styles are not mutually exclusive, and per-class accuracies range from 72% to 94%. This spread reflects both the different attributes of style and the inaccuracy of a classification process that disregards the self-description of the artwork content and research on the properties of art style. Paintings should therefore be classified in the manner of the art image classification literature, where the art ontology rather than the image content is what is usually distinguished.

Distinguishing and indicating the style techniques of artistic images from the image database also caught the attention of certain research teams. Yelizaveta et al. [17] were the first to become involved in style-based annotation for artistic brushwork concepts in the painting domain, and they presented a framework for the annotation of paintings with brushwork classes based on domain-specific ontologies such as brushwork technique. In addition, the authors employed low level feature analysis and serial multiexpert framework with semisupervised clustering methods to perform the annotation of brushwork patterns. Jiang et al. [18] discussed the problem of Gongbi and Xieyi expression techniques regarding traditional Chinese paintings (TCP). Feature engineering contained color histogram, color coherence vectors (CCV), and edge-size histogram. The database included 1799 Gongbi paintings, 1889 Xieyi paintings, and 5827 non-TCP images. They reported an accuracy of 91.01% based on CCV and 90.3% based on color histogram. Kammerer et al. [19] focused on stroke analysis which is the determination of the drawing technique used to draft the painting, in which the features of the paintings are extracted using the texture and contour. Lu et al. [20] proposed the characteristics of a fundamental TCP style expression technique. The training set was constructed by collecting 148 TCP images from four art expression techniques, including Xieyi, Gongbi, Goule, and Shese, and 134 non-TCP images. The authors claimed to distinguish TCP from non-TCP and further classify TCP style based on expression technique with an accuracy greater than 85%.

These methods are based on handcrafted feature extraction for painting technique style. With the development of deep learning, researchers began to apply deep learning methods to classify artistic images. Gando et al. [21] used AlexNet [22] with Batch Normalization (BN) [23] to distinguish one kind of art style, identifying the illustration style expression technique from photographs and 3D graphics. Their classifier achieved an accuracy of 96.8%. However, the task becomes increasingly difficult with this method as more classes must be distinguished. Sheng and Li [24] proposed a convolutional neural network- (CNN-) based feature description, feature-weighted, and feature-prioritized algorithm to classify the artist of TCP; the accuracies range from 81% to 96%. Thus, classification methods for style that use deeply learned features achieve considerably better classification performance.

1.2. Brushwork Factor Analysis

Based on assessing paintings of expert artists, this study identifies color and texture spatial dependence features that could be useful for classification. Table 1 shows the example images and the factor of brushwork.

1.3. Color Features
1.3.1. Statistics of Major Colors

Color is an important visual attribute for both human perception and computer vision and one of the most widely used visual features in image classification and retrieval. When light strikes the surface of an opaque medium, a small portion is reflected at the surface, while most of it enters the interior of the medium and is absorbed and scattered to produce different colors. Theory relates the spectral reflectance of an object to the pigment concentration under certain conditions. That is, the colors of the object are determined by the components of the reflected light after the object selectively absorbs the incident light. If the spectral reflectance of the surfaces of two objects is the same, then their colors are essentially the same. Therefore, the reflectance in the visible range is used to represent the colors. The corresponding chromaticity information is predicted by obtaining the spectral reflectance of each point of the painting, and the number of colors of the entire image is then obtained by statistics. Feature analysis shows that a sketch contains far fewer main colors than the other colored drawing techniques. Therefore, the major color feature can better distinguish sketch from the other drawing techniques.

A multispectral imaging system is used to acquire the multichannel information on the image surface. The multispectral imaging system is assembled by connecting a standard illumination source, $M$ optical filters, and a 3-color CCD (Charge-Coupled Device) digital camera with the computer. Assuming that the photoelectric conversion function of the multispectral acquisition system is linear, the digital response output $d_i$ of the $i$th channel can be expressed by

$$d_i = \int_{\lambda} s(\lambda)\, l(\lambda)\, f_i(\lambda)\, r(\lambda)\, \mathrm{d}\lambda + n_i, \qquad (1)$$

where $s(\lambda)$ is the spectral sensitivity function of the CCD band, $l(\lambda)$ is the relative power distribution of the light source, $f_i(\lambda)$ is the spectral transmittance of the $i$th filter, $r(\lambda)$ is the spectral reflectance of the object, and $n_i$ is the camera noise. In the calculation, $\lambda$ is generally evenly divided into $N$ wavelength intervals, and the center of each wavelength interval is represented by a subscript $j$ ($j = 1, \ldots, N$). If noise is ignored, formula (1) can be expressed as

$$d_i = \sum_{j=1}^{N} s(\lambda_j)\, l(\lambda_j)\, f_i(\lambda_j)\, r(\lambda_j). \qquad (2)$$

For a pixel at a certain point on the image, the spectral reflectance of the point is reconstructed from its multichannel digital response output, which determines the exact color information of the point. The spectrum in the visible range of 380~780 nm is taken, and the spectral reflectance is sampled at an interval of 5 nm, so that the spectral reflectance of the surface of the object is an 81-dimensional vector. The responses and the reflectance are related by

$$\mathbf{d} = \mathbf{Q}\,\mathbf{r},$$

where $\mathbf{d}$ represents the digital response output of the channels, $\mathbf{r}$ represents the spectral reflectance of the object, and the transformation matrix $\mathbf{Q}$ is calculated from $s$, $l$, and $f_i$.
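The linear acquisition model described above can be sketched numerically. The following is a minimal illustration only: the number of filters, the random spectra, and the use of a pseudo-inverse for reconstruction are all assumptions introduced here, since the text does not state how the transformation is inverted.

```python
import numpy as np

# Wavelength grid from the text: 380-780 nm sampled every 5 nm.
wavelengths = np.arange(380, 781, 5)
N = wavelengths.size  # 81 samples

rng = np.random.default_rng(0)
M = 8  # hypothetical number of optical filters (channels)

# Placeholder spectra; real values would come from device calibration.
sensitivity = rng.uniform(0.5, 1.0, N)    # CCD spectral sensitivity
illuminant = rng.uniform(0.5, 1.0, N)     # light-source power distribution
filters = rng.uniform(0.0, 1.0, (M, N))   # transmittance of each filter

# Row i of Q holds sensitivity * illuminant * filter_i per wavelength,
# so the noise-free channel responses are Q @ reflectance.
Q = filters * sensitivity * illuminant

reflectance = rng.uniform(0.0, 1.0, N)    # unknown in practice
response = Q @ reflectance                # simulated channel outputs

# Assumed reconstruction: minimum-norm least-squares estimate.
estimate = np.linalg.pinv(Q) @ response
```

With fewer channels than wavelength samples the problem is underdetermined, so the estimate reproduces the measured responses but is not guaranteed to equal the true reflectance.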

When the frequency of a certain spectral reflectance is greater than the threshold $T$, this color is considered to be a primary color of the painting. It is called a major color, and the number of colors that meet this condition is counted. The major color count feature is extracted as

$$N_{\mathrm{major}} = \left|\{\, c : h(c) > T \,\}\right|,$$

where $h(c)$ is the frequency of color $c$ in the image.
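The counting rule can be illustrated with a short sketch. It substitutes quantized RGB colors for reconstructed spectral reflectance, which is a simplification made here for illustration; the function name, the quantization level, and the threshold value are hypothetical.

```python
import numpy as np

def major_color_count(image, levels=8, threshold=0.05):
    """Count quantized colors whose relative frequency exceeds threshold."""
    # Quantize each 8-bit channel into `levels` bins.
    q = (image.astype(np.int64) * levels) // 256
    # Combine the three channel bins into one color index per pixel.
    idx = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    freq = np.bincount(idx.ravel(), minlength=levels ** 3) / idx.size
    return int(np.sum(freq > threshold))

# Toy image: half black, half white, so two colors dominate.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[:, 5:] = 255
count = major_color_count(img)
```

A pen sketch would be expected to yield a small count under this rule, whereas marker or watercolor drawings would yield larger ones.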

1.3.2. Pixel Differences

Two features are based on pixel differences. The first is extracted with a special color difference histogram method. The color histogram describes the global distribution of pixels in an image. The main advantage of a color histogram is its insensitivity to variations in scale, rotation, and translation of an image. Since there are obvious differences between the marker technique and the colored pencil technique, the RGB difference histogram can better distinguish the two types of techniques. Extraction first computes the difference between each pair of RGB (Red, Green, Blue) channels and then builds the difference histogram $H_d$. Each channel is divided into 16 color ranges; $R_i$, $G_i$, and $B_i$ are the numbers of pixels of the R, G, and B channels in the $i$th color range, and $i$ is the bin of the histogram.
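One way to realize a channel difference histogram is sketched below. The exact formula used in the study is not recoverable from the text, so the absolute difference of normalized per-channel histograms is an assumption, as is the function name.

```python
import numpy as np

def rgb_difference_histogram(image, bins=16):
    """Return per-bin absolute differences between channel histograms."""
    n_pixels = image.shape[0] * image.shape[1]
    # One 16-bin histogram per channel, normalized by the pixel count.
    r, g, b = (
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0] / n_pixels
        for c in range(3)
    )
    # Differences for the three channel pairs, concatenated into one vector.
    return np.concatenate([np.abs(r - g), np.abs(g - b), np.abs(b - r)])

rng = np.random.default_rng(0)
# A gray image has identical channels, so every difference is zero.
gray = np.repeat(rng.integers(0, 256, (8, 8, 1), dtype=np.uint8), 3, axis=2)
color = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
gray_feature = rgb_difference_histogram(gray)
color_feature = rgb_difference_histogram(color)
```

The all-zero result for the gray image matches the behavior one would expect for monochrome pen drawings.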

The second feature is the pixel saturation ratio. To match human color perception, the image is transformed from RGB to HSV (Hue, Saturation, Value) space, and each color component is uniformly quantized. After the quantization of the HSV space, each image is divided into 256 color ranges.

The pixel saturation ratio between the numbers of highly saturated and unsaturated pixels in the image was introduced by Cutzu et al. [25], who used only one ratio, between the counts in the highest and lowest bins of the saturation histogram, to distinguish between photographs and paintings. In the experiment, the adaptability and anti-interference ability of a single ratio were poor. Hence, the method selects a combination of ratios as the saturation feature of a style image. The input images are transformed to HSV color space, and then the ratio between the value of each saturation bin and that of the minimum-saturation bin is calculated:

is the saturation level in the range , is the quantified saturation level, and denotes an estimate of the probability of occurrence of the saturation level. is the ratio between , , levels of saturated and highly unsaturated pixels in one image.
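A rough sketch of a multi-ratio saturation feature follows. The bin count, the small constant that guards against division by zero, and the choice to return one ratio per bin are assumptions made for illustration; the specific saturation levels used in the study are not recoverable from the text.

```python
import numpy as np

def saturation_ratios(image, bins=16, eps=1e-9):
    """Ratio of each saturation bin's probability to the lowest bin's."""
    rgb = image.astype(np.float64) / 255.0
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    # HSV saturation: (max - min) / max, defined as 0 for black pixels.
    sat = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    hist, _ = np.histogram(sat, bins=bins, range=(0.0, 1.0))
    p = hist / sat.size              # estimated probability of each level
    return p / (p[0] + eps)          # relative to the least saturated bin

gray = np.full((4, 4, 3), 128, dtype=np.uint8)
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
gray_ratios = saturation_ratios(gray)
red_ratios = saturation_ratios(red)
```

Fully desaturated images concentrate all mass in the first bin, while strongly colored images produce large ratios in the upper bins.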

1.4. Features Based on Texture

Brushwork can be captured by texture features. The gray level cooccurrence matrix (GLCM) [26] is an important feature for distinguishing painting styles. The GLCM, which is based on repetitions in tone settings and represents grayscale transitions in images, describes the spatial dependence of texture. It also makes full use of the gray level distribution properties of texture, produces second-order statistical characteristics, and describes a certain amount of texture features based on statistical methods. Lu et al. [20] have successfully used GLCM information for TCP classification. Inspired by the above work, the method uses GLCM to extract the texture of different sketching skills in design. The grayscale range of the gray level image $I$ is $[0, G-1]$ and the image size is $M \times N$. The GLCM is a square matrix $P$ with elements

$$P(i, j \mid d, \theta) = \#\left\{ \big((x_1, y_1), (x_2, y_2)\big) \;\middle|\; I(x_1, y_1) = i,\ I(x_2, y_2) = j \right\},$$

where the pixels $(x_1, y_1)$ and $(x_2, y_2)$ of each pair are separated by distance $d$ along direction $\theta$.

$P(i, j \mid d, \theta)$ is the frequency with which grayscale tones $i$ and $j$ occur together, and the size of the matrix $P$ is $G \times G$. Here $d$ is the distance between the two pixels of a pair, and the direction $\theta$ can be horizontal, vertical, primary diagonal, or secondary diagonal.
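The co-occurrence counting can be made concrete with a small sketch. The function below is an illustrative implementation written for this purpose, not the code used in the study; the offset convention (dx, dy) for encoding distance and direction is an assumption.

```python
import numpy as np

def glcm(gray, levels, dx, dy):
    """Count co-occurrences of tones (i, j) at pixel offset (dx, dy)."""
    h, w = gray.shape
    P = np.zeros((levels, levels), dtype=np.int64)
    # Source window and its shifted counterpart, both kept inside the image.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    src = gray[y0:y1, x0:x1]
    dst = gray[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    # add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(P, (src.ravel(), dst.ravel()), 1)
    return P

image = np.array([[0, 0, 1],
                  [1, 2, 2],
                  [2, 2, 0]])
# Distance 1 in the horizontal direction; (0, 1) would be vertical,
# and (1, 1) or (1, -1) the two diagonals under this convention.
P = glcm(image, levels=3, dx=1, dy=0)
```

Second-order texture statistics such as contrast or energy are then computed from the normalized matrix.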

2. Method

The method architecture of the proposed part-stacked feature box based on brushwork factor analysis is discussed in this section. Figure 2 illustrates that the proposed feature box architecture is decomposed into the feature engineering and feature learning processes.

The study combined handcrafted features with DCNN-generated features and then applied the integrated features to classification. The study adopted CaffeNet [27], a slightly modified version of the standard seven-layer AlexNet architecture (Figure 3), as the feature learning structure. The distinctive design of the architecture is the combination of handcrafted feature engineering with deep learning feature extraction.

2.1. Deeply Learned Feature

DCNN consists of layers of small computational units that process visual information hierarchically. Each layer of units is a collection of image filters and extracts a certain feature from the input image. Each layer consists of the correlations between the different filter responses over the spatial extent of the feature maps.

When the existing network model was used for training, validating, and testing on the dataset, the projections from each layer showed the hierarchical nature of the features in the network: the lower layers respond to image details such as edges or texture, whereas the higher layers respond to entire objects with significant content. Deeper networks could potentially lead to better recognition accuracy but may also result in low efficiency. Therefore, the study chose the CaffeNet framework [27] to obtain effective features for classification while saving time.

Different features are extracted from varying layers of DCNN. A given input image is represented as a set of filtered images at each processing stage in the DCNN, and the feature map of the filtered images shows the hierarchical characteristics of units per layer of the network. The study used DeconvNet to visualize the feature [28]. In ConvNet, layer 2 responds to corners and edge conjunctions. Layer 3 has more complex invariances, capturing similar textures, and layer 4 shows a significant variation.

2.2. Multiple Decision Hierarchy for Classification

Most of the early studies utilized a single classifier to assign labels in the style image classification task. This approach has proved fruitful in many applications [29]. Although deep learning has developed rapidly in image recognition in the past two years, relevant studies mainly focused on recognizing image content rather than painting techniques. To combine the advantages of deep learning in extracting image characteristics with those of manually designed artistic features, the current study constructed a classifier from a decision tree and the SVM method, with the objective of recognizing the characteristics of painting techniques. SVM is easy to train and generalizes well. Its principle is to find, through machine learning, the support vectors with sound classification capability so that the constructed classifier maximizes the margin between different classes, which gives it an advantage in solving nonlinear and high-dimensional classification problems. Several studies have shown that multihierarchy approaches can lead to higher accuracy than a single classifier [30, 31].

The multilevel analytics framework (MLAF) method achieves multiclass classification by combining several Support Vector Machine (SVM) [32] subclassifiers into a binary tree structure. The method assigns training samples of the pen drawing class (the most recognizable) to the positive category and training samples of the rest of the classes to the negative category and then trains the first subclassifier SVM. Similarly, it assigns training samples of the watercolor class to the positive category and the remaining training samples to the negative category and then trains the second subclassifier SVM. At each level, the remaining classes are thus divided into two groups, so each subclassifier solves a binary classification problem; by analogy, four categories produce three SVM subclassifiers. The decision process traverses the tree sequentially from top to bottom. At the classification stage, an unknown sample is passed to the first subclassifier SVM and then to each subsequent subclassifier until the decision value of a subclassifier is positive.

With the multilevel approach, the study progressively reduces the subset of classes to which a pattern might belong at each level of the decision hierarchy. The classification process of 4 classes is illustrated in Figure 4.
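The top-down traversal can be sketched as a cascade of one-versus-remaining binary nodes. To keep the example self-contained, a simple nearest-centroid rule stands in for each SVM subclassifier; that substitution, the class and function names, and the toy two-dimensional data are all assumptions made for illustration.

```python
import numpy as np

class CentroidClassifier:
    """Stand-in binary classifier; an SVM would take its place."""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self

    def decision_function(self, X):
        # Positive when a sample lies closer to the positive centroid.
        return (np.linalg.norm(X - self.neg, axis=1)
                - np.linalg.norm(X - self.pos, axis=1))

def train_cascade(X, labels, class_order):
    """Train one 'class vs. remaining classes' node per level."""
    nodes, keep = [], np.ones(len(labels), dtype=bool)
    for cls in class_order[:-1]:
        y = (labels[keep] == cls).astype(int)
        nodes.append((cls, CentroidClassifier().fit(X[keep], y)))
        keep &= labels != cls          # drop this class for deeper levels
    return nodes, class_order[-1]

def predict_cascade(nodes, fallback, x):
    """Walk the tree top-down and stop at the first positive decision."""
    for cls, clf in nodes:
        if clf.decision_function(x[None, :])[0] > 0:
            return cls
    return fallback

# Toy data: four well-separated clusters, one per brushwork class.
centers = {"pen": (0, 0), "watercolor": (10, 0),
           "pencil": (0, 10), "marker": (10, 10)}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in centers.values()])
labels = np.repeat(list(centers), 20)

nodes, fallback = train_cascade(X, labels, list(centers))
prediction = predict_cascade(nodes, fallback, np.array([0.0, 10.0]))
```

Four classes yield three trained nodes, and a sample that no node accepts falls through to the last class.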

2.3. Experiments
2.3.1. Datasets

The image dataset contains four categories of drawing technique styles. The project collected many teaching cases of architecture and interior design drawing from universities. In addition, for more effective results, data were obtained from online communities. In total, 1000 original design sketches were collected. After cropping, they formed a total of 4000 sketches in the drawing image dataset. The size of each image item in the dataset is . Experiments used 60% of data items in the dataset for training and the remaining 40% for evaluating performance.

2.3.2. Feature Extracted

The study evaluated the method on the dataset and built the structure with two streams (Figure 2). The handcrafted features extracted in the feature engineering process were concatenated with the features learned by the CaffeNet model. Deac et al. [33] concluded that the texture information related to brush strokes is the most discriminating feature. According to the characteristics of the brush strokes, the study chose the 3rd ConvNet (conv3) and 4th ConvNet (conv4) layers as the sources of the extracted hierarchical features. The conv3 layer and the conv4 layer each convolve 384 different units. To reduce the amount of feature computation, principal component analysis was applied first: for each image, the 384-dimensional ConvNet layer output was projected to a one-dimensional feature map, and sequential forward selection (SFS) [34] was used to choose the feature subset. conv3_1 and conv4_1 denote the resulting low-rank projections of the model output.
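The projection and concatenation steps might look like the sketch below. The random arrays are placeholders for real layer descriptors and handcrafted features, the function name is hypothetical, and how each image's 384-dimensional descriptor is pooled from the feature maps is not specified in the text, so it is left out.

```python
import numpy as np

def pca_project(features, n_components=1):
    """Project rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_images = 20
conv3 = rng.normal(size=(n_images, 384))       # placeholder conv3 descriptors
conv4 = rng.normal(size=(n_images, 384))       # placeholder conv4 descriptors
handcrafted = rng.normal(size=(n_images, 5))   # placeholder handcrafted features

conv3_1 = pca_project(conv3)                   # one value per image
conv4_1 = pca_project(conv4)
combined = np.hstack([handcrafted, conv3_1, conv4_1])
```

Sequential forward selection would then operate on the columns of the combined matrix; it is omitted here for brevity.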

2.3.3. Normalization of Sample Characteristics

Data normalization [35] is important in traditional pattern classification and can even influence the entire system. The study used a probability distribution normalization method to normalize the probability distribution of characteristic data within . The normalization formula is shown in where . The normalized data, which lie within [0, 1], are then passed to the BT-SVM hierarchical classifier for training and evaluation.
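Since the normalization formula itself is not recoverable from the text, the sketch below uses plain min-max scaling as a stand-in that also maps every feature into [0, 1]. It should not be read as the probability distribution normalization used in the study; the function name and the guard constant are hypothetical.

```python
import numpy as np

def min_max_normalize(features, eps=1e-12):
    """Scale each feature column into [0, 1] (stand-in normalization)."""
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # eps keeps constant columns from causing a division by zero.
    return (features - lo) / np.maximum(hi - lo, eps)

raw = np.array([[1.0, 200.0, 5.0],
                [3.0, 400.0, 5.0],
                [2.0, 300.0, 5.0]])
scaled = min_max_normalize(raw)
```

Scaling the features to a common range keeps any single large-valued feature from dominating the SVM decision.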

3. Results and Discussions

The experiments evaluated the classification performance of the method. The results were measured in terms of precision, recall, and accuracy according to Equations (9) and (10). Each of the results is presented below.
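For reference, the three measures can be computed as in the sketch below, which uses the standard definitions for a binary decision. Whether these match the exact form of Equations (9) and (10) cannot be confirmed from the text, and the function name is hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))      # true positives
    tn = int(np.sum(~y_true & ~y_pred))    # true negatives
    fp = int(np.sum(~y_true & y_pred))     # false positives
    fn = int(np.sum(y_true & ~y_pred))     # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / y_true.size
    return precision, recall, accuracy

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, accuracy = binary_metrics(y_true, y_pred)
```

Each decision hierarchy is a binary problem, so the same function applies to every node of the tree.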

Table 2 shows the classification accuracy in different decision hierarchies. As a comparison, the experiments also tested the performance of single handcrafted features. In decision hierarchy SVM1, GLCM performed worse than the other methods, indicating that the traditional texture feature is poorer than the color feature in distinguishing pen-and-ink brushwork. Using the main color feature produces slightly better results. The study also used a special color difference histogram method to extract a color feature. Unlike other categories, the color difference histogram values of pen brushwork tend to be zero. In the feature learning section, the study used the conv3_1 feature, obtained from the ConvNet layer 3 feature map of the CaffeNet DCNN model. Although the color feature produced obvious results in the first decision hierarchy, its classification accuracy is still lower than that of the proposed method.

Figure 5 shows that the proposed method, which combines ConvNet feature maps with feature engineering, produces good results. SVM1 has the highest response values, in line with the common perception that pen-and-ink brushwork is easy to identify. In decision hierarchy SVM2, the experiments evaluated the effectiveness of the texture and color features for distinguishing watercolor brushwork images from other types. In this hierarchy, the texture feature was more accurate than the color feature. For feature learning in this hierarchy, the study chose two ConvNet feature maps, layer 3 and layer 4; after dimension reduction, the method obtained the feature vectors conv3_1 and conv4_1. In decision hierarchy SVM3, the saturation ratio feature proved useful for classifying between the pencil skill and felt-tip markers, achieving an accuracy of approximately 72.7%, and GLCM produced better results than the RGB histogram. For feature learning in this hierarchy, the method extracted the feature vector from the ConvNet layer 4 feature map. Table 2 shows that the method achieves better classification accuracy in each decision hierarchy.

To keep the comparison consistent, the method was not compared with approaches designed for other painting classification tasks (i.e., recognizing the author of a painting, distinguishing illustrations from photos, or distinguishing different types of TCP). Instead, several classic deep learning models were selected: AlexNet, AlexNet-OWT [36], AlexNet-OWT-BN, and VGG16. The comparative experiments applied the same data sample and used stochastic gradient descent (SGD) with a batch size of 64 examples, momentum of 0.9, and weight decay of 0.0005.
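The role of the momentum and weight decay settings can be seen in a single update step. The sketch below follows a common formulation of SGD with momentum and L2 weight decay; the learning rate is not given in the text and is a hypothetical value, as is the function name.

```python
import numpy as np

def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay."""
    # Weight decay adds the L2 penalty gradient to the data gradient.
    v_new = momentum * v - lr * (grad + weight_decay * w)
    return w + v_new, v_new

w = np.array([1.0])
v = np.zeros_like(w)
grad = np.array([0.5])            # gradient averaged over a mini-batch
w1, v1 = sgd_step(w, v, grad, lr=0.1)   # lr is a hypothetical value
w2, v2 = sgd_step(w1, v1, grad, lr=0.1)
```

In the second step the velocity carries over 90% of the previous update, which is what smooths the trajectory across mini-batches.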

Table 3 summarizes the validation time and identification accuracy of the DCNN models and the proposed method on the dataset. The proposed method is more effective than the other DCNN model frameworks. DCNN models have an advantage in image content analysis, but the brushwork technique classification task disregards the content and attaches importance to ontology. Applying the feature maps of deep learning to a traditional classifier improves the classification results in these experiments. The method is thus novel and effective in distinguishing art ontology attributes.

4. Conclusion

This study reported a classification framework for brushwork that is useful in the digital management system of engineering design education. The function classifies the ontological attribute of art according to the hand drawing techniques of engineering design images, removes the interference of image content from the classification, and effectively extends the style-based image retrieval function of the management system. To perform categorization, the study combined feature engineering methods with feature learning methods to extract features and used the BT-SVM classifier. The framework is an effective way of constructing the dataset, as it has high accuracy and reasonable timing for implementing brushwork recommendation in ED, AE, and CE. The proposed method is applicable to the popular drawing techniques in modern design art. To verify the extensibility of this method, the study tested images of several main design categories, e.g., painting, clothing design, and stage design works; the results are consistent with the performance on environmental works.

The framework also has benefits for learning the characteristics of brushwork ontology. It enables students to quickly obtain many examples of the same style when they independently copy and practice a brushwork expression and improves the accuracy of that expression in students' works. At the same time, it lets teachers manage students' works more conveniently and quickly and gathers the hand-painted works of the same expression type in one place.

Data Availability

The data used to support the findings of this study are included in the article.

Conflicts of Interest

The author declares no conflicts of interest.