Abstract

The development paths of the Chinese women's liberation movement and of modern Chinese women's literature have so far been fundamentally different from those of the West. Therefore, the study of Chinese women's literature cannot rely solely on the essentialism of western feminist literary theory but must also return to the social and cultural reality of China. Based on social media data mining, this study uses the Word2vec model to map text content to a more abstract word vector space, improves the original Text Rank algorithm in three respects (semantic association between words, word frequency, and word directionality), carries out feature extraction, and applies the improved algorithm to the generation of user tags. The feasibility and superiority of the model are verified by comparative experiments on LFR benchmark networks. The research provides a reference for the analysis of users' interests and behaviors and has theoretical significance and application value.

1. Introduction

With the vigorous development of the western feminist movement, feminism first arose in the political and economic fields and gradually turned to the cultural and ideological fields. In the process, the understanding of feminism moved from the call for equal political and economic status deep into ideology and culture, and from the demand for equality to an emphasis on women's differences, gender awareness, and diversified feminism, which shows the pluralistic, open, and changing theoretical form of feminism [1]. In the field of women's literature research, some important theoretical concepts, including women's literature itself, have always been the focus of academic discussion. After long discussion, these concepts have gradually become the theoretical terms of localized female literary criticism [2]. Gender-based literary criticism has become an interdisciplinary theory that combines philosophy, economics, sociology, psychology, and other disciplines on a localized basis and turns its vision to all fields of people's lives.

Another distinctive feature of feminist literary theory is its gender coloring, which is reflected not only in its name but also in the distinct gender gap, and the resulting opposition, in the process of its dissemination and acceptance [3, 4]. Ordinola et al. believe that a literary work is the combination of its literary characteristics, imagination, and language, and that its significance relates not only to its grammatical and logical structure but also to its associations [5]. However, the meanings evoked by words and sentences in literary works are relatively stable because grammatical rules and logic are relatively stable. Huber et al. made a careful distinction between the two common Chinese renderings of “feminism,” nüquan zhuyi (literally, “women's rights-ism”) and nüxing zhuyi (literally, “female-ism”) [6]. Obviously, the difference between the two terms lies in “power” and “gender.” Nedelea has repeatedly stated: “I take a feminist position first of all from my own female life experience, as well as the skin-piercing pain that some women can experience” [7]. Sari et al. expressed it as “starting from a strong life experience” [8]. Yuan holds that feminist thought unilaterally exaggerates the distance between the sexes, and rejects the view that a huge distance always existed, was merely covered up, and is only now being revealed by feminism [9]. The resulting view may not really be close to feminism, but what is important is that it represents a will, namely the will of men and women to get to know each other. I believe that, in the end, the healthy future of human beings depends neither on male hegemony nor on feminism, but only on the efforts of men and women to get to know each other.

On social media, simple keyword extraction can no longer meet the needs of users. Because of the inherent characteristics of social media data, many meaningless words often appear in the results of feature word extraction, such as junk information or information inconsistent with the theme. A traditional technique for this problem is text classification, which can effectively separate spam from useful information by topic. At the same time, the short texts of social media challenge traditional text classification technology. This study scrapes comment data from social media and applies text mining techniques to two new tasks on Weibo. Aiming at the high noise and sparsity of Weibo data, it adapts traditional text mining technology to Weibo data and proposes a new solution.

The innovations of this study are as follows:
(1) An improved method based on Word2vec and Text Rank is proposed to extract features of short social media texts. Word2vec is used to map the text content to a more abstract word vector space, and then the Text Rank algorithm is improved in three respects: semantic association between words, word frequency, and directionality between words. The algorithm is applied to user portrait analysis.
(2) An improved QLFM_LP overlapping community discovery algorithm is proposed, which partitions quickly and is suitable for link prediction. It can effectively reduce the number of negative samples in the training set and improve the training speed of the model while ensuring good prediction performance.

The content of this study is divided into five parts, organized as follows.

Section 1 introduces the research background and significance and then introduces the main work of this study. Section 2 mainly introduces the related technologies of social media data mining. Section 3 puts forward the specific methods and implementation of this research. Section 4 verifies the superiority and feasibility of this research model. Section 5 is the summary and prospect of the full text.

2.1. Research Status of Feminist Literary Theory

Zhang pointed out that becoming a slave on the production assembly line is no liberation from slavery at the kitchen sink [10]. Housework creates use value, and its important function is to create surplus value; the family is the pillar of the capitalist organization of labor. Han Yu combines the Marxist theoretical interest in literary creation and ideology with feminist thinking on women's works [11]. Others oppose discrimination in the private sphere and propose that housework be regarded as a place that sustains women's culture. Zhang believes that the reference for defining and distinguishing women is men, and that men fix women (the Other) as the incarnation of evil by establishing a theory of opposition between good and evil. Only when women choose to live as themselves and design their own future creatively, as transcendental subjects, can they be liberated or fulfilled [12]. Minnullin advocates that the destruction of idols must start from within: women must reject and drive out any internalized consciousness of otherness and refuse objectification [13]. Lawtoo discusses how gender is replaced, through a social contract, by a difference closely related to power, language, and meaning [14]. Although the intermediary strategy of deconstructionism will not abolish the hierarchical thinking of binary opposition or power as ordinarily understood, it can prompt us to rethink power and its characteristics and to recognize the unfitness and discontinuity of power.

2.2. Overview of Text Data Mining

With the advent of the era of big data, text mining has become a new research field. It is mainly a process of discovering potential data patterns, internal relations, laws, and development trends in large amounts of unstructured text, extracting knowledge that is effective, novel, useful, and understandable but scattered across text files, and using this knowledge to better organize information. Stopar et al. believe that text mining refers to the process of extracting interesting and non-trivial information or knowledge from unstructured text collections [15]. In the general process of text mining proposed by Zhao et al., text data mining is divided into the following stages: text data preprocessing, feature extraction, structure analysis, text summarization, text classification, text clustering, and association analysis [16]. Lee et al. explored how learners' learning outcomes change across stages by collecting and analyzing text data on their learning experiences and found a correlation between learning behaviors and learning outcomes [17]. Yang et al. use interactive graphics of text data to evaluate multidimensional state characteristics of learners, such as knowledge structure, cognitive ability, and emotional attitude, which helps teachers' teaching decisions and learners' self-monitoring [18]. Bo et al. used a similarity-based method to automatically extract demographic data (age, status, and social category) from the profile descriptions of Twitter users [19]. The notions of confirmation bias and social differentiation proposed by Iqbal et al. make it possible to detect, at an early stage, topics that are used for misleading purposes [20]. Mosa proposed a model for detecting clickbait content and, based on this model, provided an analysis of the topic, title, and influence of fake news [21].

2.3. Review of Research

However, the traditional text feature representation methods above, which are based on the vector space model, struggle with the short texts of social media: the content has little clear purpose, the text provides few effective semantic features, and the text vectors are sparse. Word vector representations based on deep learning reflect the grammatical and semantic relationships of words well, but word order information is lost during training, so the influence of word order on text meaning is not considered. The main purpose of this study is the automatic generation of social media user tags, which is similar to the task of feature word extraction. As a classic task in natural language processing, feature word extraction has been studied by many scholars, and the extracted feature words can also reflect users' interests and behavioral characteristics well.

3. Methodology

3.1. Deconstruction and Feminist Literature

There are many similarities between deconstruction and feminism. Deconstructionism holds that the presence of “special,” “truth,” and “existence” depends on another fixed absence, and that they obtain their definition only within the relative relationship and the signifier chain to which they belong, so that the original meaning of “existence” is replaced by endless deferral or play in absence. One then cannot help but ask: without a unified subject with a solid a priori sense of history and gender, what is left of feminist theory and practice to talk about? Many Americans believe that French feminism's emphasis on “natural language” cannot weaken the essentialism of metaphysical strategy; on the contrary, it will fall back into the mysterious features and binary opposition that deconstructionism attempts to crush or deconstruct.

Deconstruction feminism has perfected its own feminist theory. Absorbing the fluidity and mystique of “subject presence,” the word “female” is released into the future with multiple meanings, and it is liberated from the restricted power field, making it a place that can carry unexpected meanings. Deconstruction feminism insists that the criticism of the subject is not a denial or abandonment, but a way to question its composition as a predetermined or fundamentalist premise.

For example, in “body language,” body and subject can replace each other: I am my body. The “body” is not just an object of external cognition but the capacity for perceptual experience, which can make everything in the world better reveal its hidden mystery. Therefore, feminist criticism takes it as a theoretical resource for producing explanatory works and constructing female subjects. It is different from the purely material body described by ordinary men.

Feminism exists because women are dissatisfied with their historical and present existence. So what do women want after criticizing the patriarchal society? This is a question put to feminism by many people who disapprove of it, and also a question feminists ask themselves. Where do I come from and where do I want to go? This is an ultimate proposition of mankind, and women cannot be excluded from this ultimate inquiry.

3.2. Feature Extraction of Short Text for Social Media

In the field of information extraction, the traditional task of event extraction is aimed at news text. However, news text and Weibo text are quite different in text form and content, and some features of Weibo text bring more challenges to the task of event extraction. The most important event elements in an event are the theme elements and timing elements of the event.

In this study, a multiview clustering algorithm is proposed, which can simultaneously consider the subject information of the text and the time-series information of the Weibo. Because the most important thing of an event is its semantic features and temporal features, the combination of the two types of information is the core of this method.

A multiview word clustering model is proposed, which uses both topic and time-series information to complete event extraction. As shown in Figure 1, our proposed method consists of three parts.

First, the Weibo text is preprocessed, including part-of-speech tagging, word segmentation, and low-frequency word filtering. After that, several topic keywords are extracted through the topic model as candidate event feature words. Finally, two similarity matrices (semantic and temporal) are used as the input of a multiview clustering algorithm, and the resulting keyword clusters are the output events.

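From the topic model, we obtain the distribution matrix of words over topics, namely the parameter $\varphi$, so that keyword $w_i$ can be expressed as the topic vector $\varphi_i$. Cosine similarity is used to calculate the similarity between keywords:

$$S^{\text{sem}}_{ij} = \frac{\varphi_i \cdot \varphi_j}{\lVert \varphi_i \rVert \, \lVert \varphi_j \rVert}$$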

Let us assume that our Weibo data covers $T$ days and take each day as a time node. Therefore, the signal for a word $w$ can be listed as a sequence of length $T$ as follows:

$$s_w = [f_w(1), f_w(2), \ldots, f_w(T)],$$

where $f_w(t)$ represents the frequency of occurrence of keyword $w$ in time period $t$.

For convenience, simplify the distance formula of the samples to the similarity formula. Therefore, the time-series similarity of keyword is shown as

Similar to the construction of the semantic similarity matrix, we can also construct the temporal similarity matrix of keywords. The element $S^{\text{time}}_{ij}$ of this matrix represents the temporal similarity of keywords $w_i$ and $w_j$.
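The two matrices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and because the temporal similarity formula is not recoverable from the text, the sketch assumes a similarity of 1 / (1 + Euclidean distance) between daily frequency sequences.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_similarity(s, t):
    """ASSUMPTION: distance-to-similarity conversion 1 / (1 + Euclidean distance).

    The study's own formula is not available; this is one common choice.
    """
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))
    return 1.0 / (1.0 + distance)


def similarity_matrix(vectors, sim):
    """Pairwise similarity matrix for a list of keyword vectors."""
    n = len(vectors)
    return [[sim(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy data: three keywords, two topics, three days.
topic_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
daily_counts = [[5, 0, 0], [4, 1, 0], [0, 0, 5]]

semantic = similarity_matrix(topic_vectors, cosine_similarity)
temporal = similarity_matrix(daily_counts, temporal_similarity)
```

Both matrices are symmetric with ones on the diagonal, so either can be passed to a clustering routine that expects an affinity matrix.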

In this study, given the characteristics of short social media texts, such as short length, high noise, poor standardization, and serious sparsity, the training texts are first preprocessed. The preprocessed dataset is then processed in two parts. On this basis, the Text Rank algorithm is improved according to the semantic relation between words, the frequency with which words appear, and the directionality between words, and the I_Text Rank feature extraction algorithm is proposed. The overall algorithm flow is shown in Figure 2:

The Word2vec model can learn high-quality and multiangle word vectors from large-scale corpus in a short time so that the semantic similarity between words can be conveniently calculated. In addition, the word vector trained by Word2vec is calculated according to the context in which the words are located, which fully captures the semantic information of the context, and it is easy to calculate the similarity between two words through it, which reflects their semantic relationship in the text and enriches their semantic information.

The process is divided into three steps:
(1) Preprocessing: first, preprocess the extracted short text data of social media, including text segmentation and stop word removal (including punctuation, numbers, single words, and other meaningless words).
(2) Word2vec training: train on the preprocessed data to obtain word vectors, keeping only words whose frequency is greater than or equal to 5, so that words that occur too rarely are excluded and the word vectors of the remaining words are output.
(3) Get candidate feature words and their word vectors: duplicate the dataset preprocessed in the first step and restrict it by part of speech, leaving only the parts of speech suitable for feature words.
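Steps (1) and (2) above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the regular-expression tokenizer stands in for a real segmenter such as Jieba, the stop word list is hypothetical, and the part-of-speech restriction of step (3) is omitted because it needs a tagger.

```python
import re
from collections import Counter

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and"}


def preprocess(texts, stop_words=STOP_WORDS, min_count=5):
    """Tokenize, drop noise tokens, and keep words with frequency >= min_count."""
    tokenized = []
    for text in texts:
        # Runs of letters only: punctuation and digits are discarded here.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        # Remove stop words and single-character tokens.
        tokens = [t for t in tokens if len(t) > 1 and t not in stop_words]
        tokenized.append(tokens)

    counts = Counter(t for tokens in tokenized for t in tokens)
    kept = [[t for t in tokens if counts[t] >= min_count] for tokens in tokenized]
    return kept, counts


kept, counts = preprocess(["photo photo photo", "photo photo 123 a", "rare word"])
```

The `min_count` threshold plays the same role as the minimum frequency of 5 described in step (2); a Word2vec trainer would then receive `kept` as its corpus.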

The more times a word appears, the more important it is. We obtain

$$W_f(v_i, v_j) = \frac{f(v_j)}{\sum_{v_k \in \mathrm{Out}(v_i)} f(v_k)},$$

where $f(v_j)$ represents the frequency of occurrence of the word $v_j$, $f(v_k)$ represents the frequency of occurrence of each word $v_k$ in the out-degree of $v_i$, and $W_f(v_i, v_j)$ represents the ratio of the frequency of $v_j$ to the sum of the frequencies of the words to which $v_i$ transmits, thus indicating the influence of word frequency.

If the word A points to another word B, the importance of A is transferred to B. In addition, the more words point to word B, that is, the more words are associated with it, the more important word B is.

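In this regard, we keep Page Rank's treatment of this problem and give the following formula:

$$W_d(v_i, v_j) = \frac{1}{\lvert \mathrm{Out}(v_i) \rvert},$$

where $W_d(v_i, v_j)$ indicates the influence of $v_i$ on $v_j$ and $\lvert \mathrm{Out}(v_i) \rvert$ indicates the number of other words that $v_i$ points to, so that the influence of $v_i$ is evenly distributed among those words.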

Substitute it into Text Rank formula, call it I_Text Rank algorithm, and update the formula as follows:where is a vector with all components of 1 and dimension of . When the difference between the calculation results of two adjacent iterations is small, stop the iterative calculation, that is, the iterative calculation results have converged.

After convergence, the weights of all vocabulary nodes are arranged in descending order, and the first $k$ words are selected and output as the features of the document.
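The iteration described in this subsection can be sketched as below. The way the three influences are combined is not recoverable from the text, so this sketch assumes an equal-weight sum of semantic similarity, frequency weight, and direction weight, normalized over each word's outgoing edges. The function name, the window size, and the damping factor of 0.85 are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def i_text_rank(tokens, vectors, window=2, damping=0.85,
                tol=1e-6, max_iter=100, top_k=3):
    """Rank words on a directed co-occurrence graph (illustrative sketch)."""
    freq = Counter(tokens)
    # Directed edges: each word points to the words following it in the window.
    out_edges = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                out_edges[w].add(v)

    nodes = sorted(freq)
    weights = {}
    for w in nodes:
        targets = out_edges.get(w, set())
        if not targets:
            continue  # a word with no outgoing edges passes on no importance
        freq_total = sum(freq[v] for v in targets)
        raw = {}
        for v in targets:
            semantic = max(cosine(vectors[w], vectors[v]), 0.0)
            frequency = freq[v] / freq_total   # share of target frequencies
            direction = 1.0 / len(targets)     # Page Rank style even split
            raw[v] = semantic + frequency + direction  # ASSUMPTION: equal-weight sum
        total = sum(raw.values())
        weights[w] = {v: x / total for v, x in raw.items()}

    scores = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(max_iter):
        new = {w: (1 - damping) / len(nodes) for w in nodes}
        for w, targets in weights.items():
            for v, p in targets.items():
                new[v] += damping * scores[w] * p
        delta = sum(abs(new[w] - scores[w]) for w in nodes)
        scores = new
        if delta < tol:  # adjacent iterations differ very little: converged
            break

    ranked = sorted(nodes, key=lambda w: scores[w], reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy input; real word vectors would come from Word2vec.
tokens = "weibo users post photos weibo users share photos".split()
vectors = {w: [1.0, 0.1 * i] for i, w in enumerate(sorted(set(tokens)))}
top_words, scores = i_text_rank(tokens, vectors)
```

Swapping the equal-weight sum for tuned coefficients only changes the line that builds `raw[v]`; the iteration and the convergence test stay the same.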

3.3. Link Prediction in Social Networks

Social networks have become the main platform for users to share information such as short messages, news, games, and applications. These abundant data sources have gradually formed an ecosystem of user-centered services and tools. Usually, the only data that link prediction relies on are the existing links observed in the system, and a hidden variable model is needed to explain the basic structural features of the network data. In fact, many pairs of nodes have no edge between them only because data are missing or because their relationship is still latent, and such edges may be established in the near future.

Overlapping community discovery algorithms generally have high time complexity. This section analyzes the implementation of the LFM overlapping community discovery algorithm, makes three improvements to it, and proposes an improved QLFM_LP (Quick LFM for Link Prediction) overlapping community discovery algorithm that partitions quickly and is suitable for link prediction.

Applying the exchangeable array theory to social networks, we can reasonably use the auxiliary variable to model the hidden variable . The empirical approach within the framework of Gaussian process regression is to assign a Gaussian process prior to :where is the covariance matrix defined above . That is, a Gaussian process with a mean value of 0 and a covariance matrix of , in which the th element of is calculated by Gaussian function above :
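The symbols of this passage are missing from the text. As a hedged sketch only, a zero-mean Gaussian process prior with a Gaussian (squared-exponential) covariance is conventionally written as follows; the names $f$ for the hidden variable, $x_i$ for the auxiliary variable of node $i$, $K$ for the covariance matrix, and $\sigma$ for the length scale are all assumptions rather than the study's notation.

```latex
f \sim \mathcal{GP}(0, K), \qquad
K_{ij} = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)
```

Under this form, nodes whose auxiliary variables are close receive strongly correlated hidden values, which is what lets the prior express that similar nodes tend to form links.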

Take a graph sequence from the social network graph by time step. In each snapshot graph, we calculate separately and then use the idea of moving average line of nodes to calculate the average degree of nodes in the whole network and extract the corresponding subnet. The average degree of nodes in the extracted subnet is

Finally, the calculated is substituted into the definition formula of the node guiding force, and a new formula of the node guiding force combined with the node time attribute is obtained:

In the process of community expansion, a “visited” tag is added to each node that has joined the community. This solves the problem that LFM may repeatedly add the same node to a community and remove it again during expansion, which can cause the algorithm to fall into an endless loop.

This study introduces a second-level coding, which not only has a unique coding for nodes but also codes the generated communities when each community is generated and adds the community number attribute to the nodes in the communities so that the information of the communities to which the nodes belong can be obtained while visiting the nodes.

Introducing the “central node” addresses the high time complexity of the LFM algorithm, which recalculates the fitness of all nodes in the community every time a new node is added during community expansion.

The so-called central node refers to the node only connected with the current community . Then, the fitness function of the central node to the community is

It can be seen from the above formula that the fitness value of the central node with respect to the current community must be positive. Therefore, the central node can be added to the community directly, and it is not necessary to recalculate the fitness of the nodes already in the community after it joins.
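The central node shortcut can be illustrated with the community fitness used in the original LFM algorithm, f = k_in / (k_in + k_out)^alpha, where k_in is twice the number of internal links and k_out is the number of links leaving the community. The sketch below assumes that the study keeps this definition; the function names and the toy graph are hypothetical.

```python
def community_fitness(graph, community, alpha=1.0):
    """LFM community fitness: k_in / (k_in + k_out) ** alpha."""
    k_in = 0   # each internal link is counted from both of its endpoints
    k_out = 0  # links from community members to outside nodes
    for node in community:
        for neighbor in graph[node]:
            if neighbor in community:
                k_in += 1
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def node_fitness(graph, community, node, alpha=1.0):
    """Fitness of a node: community fitness with the node minus without it."""
    with_node = community_fitness(graph, community | {node}, alpha)
    without_node = community_fitness(graph, community - {node}, alpha)
    return with_node - without_node


def is_central(graph, community, node):
    """A central node lies outside the community and links only into it."""
    neighbors = graph[node]
    return node not in community and bool(neighbors) and neighbors <= community


# Hypothetical toy graph as adjacency sets.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c", "f"},
    "f": {"e"},
}
community = {"a", "b", "c"}
gain = node_fitness(graph, community, "d")
```

Here node `d` links only to members of the community, so adding it turns outgoing links into internal ones and the fitness rises from 6/9 to 10/11. Node `e` also links to `f`, so it is not central and would still need the ordinary fitness check.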

The flow of the improved QLFM_LP overlapping community discovery algorithm is shown in Figure 3.

4. Results Analysis and Discussion

The datasets in this study are obtained from Wikipedia and from posts published by users on Sina Weibo. First, more than 200,000 pieces of data are crawled from Sina Weibo; then, 20 long documents are downloaded from Wikipedia. We selected 20,000 pieces of data for the experiment. Table 1 is a descriptive comparison of the different datasets.

In this study, the Jieba word segmentation tool in Python is used for preprocessing operations such as word segmentation and noise removal. The dataset obtained by preprocessing is processed in two steps:
(1) The dataset is trained with the Skip-gram model in Word2vec to obtain its word vectors.
(2) The dataset is duplicated and restricted by part of speech, and unsuitable words are removed.

Different numbers of feature words are extracted based on different datasets and different algorithms, and experimental evaluations and discussions are carried out. Figure 4 is a comparison of Wikipedia's long texts. Figure 5 shows the experimental results of Sina Weibo's data under different extraction quantities of feature words.

It can be seen from the data that the experimental effects of the three algorithms on long texts are basically the same. In five long texts, the I_Text Rank algorithm proposed in this section shows the best results in two of them. Generally speaking, in the experiment of long text, the algorithm in this section is not much different from Text Rank algorithm.

It can be seen from the figure that, no matter how many feature words are extracted, the accuracy of the I_Text Rank algorithm proposed in this section is always higher than that of the other two feature word extraction algorithms. In addition, the accuracy of the I_Text Rank algorithm is best when the number of extracted feature words is 5.

The performance of the improved QLFM_LP algorithm is evaluated by comparing it with the LFM algorithm in execution time and NMI index on different LFR benchmark networks generated with the parameters given in Table 2.

The execution time of QLFM_LP algorithm and LFM algorithm for community discovery under the same dataset is shown in Figure 6.

On the network of 2500 nodes, the execution speed of the improved QLFM_LP overlapping community discovery algorithm is much better than that of the LFM algorithm, because the central node and the “visited” tag introduced in this study greatly reduce the complexity of the QLFM_LP algorithm and improve its execution speed. On the whole, the QLFM_LP overlapping community discovery algorithm has great advantages over the original LFM community discovery algorithm in execution speed.

In this experiment, the NMI index values of the two algorithms are compared with the LFR parameter “on” (the number of overlapping nodes) set to 5, 10, 15, 20, 25, and 30, respectively, and the results are shown in Figure 7.

It can be seen that the NMI index of both algorithms decreases as the number of overlapping nodes increases. This is because, as overlapping nodes increase, the structure of overlapping communities becomes more blurred, which makes community discovery more difficult, so the corresponding NMI values decrease. On the whole, however, the QLFM_LP algorithm is superior to the original LFM algorithm in the NMI index values of its overlapping community division results. The improved algorithm also keeps the discovered community structure from changing to a certain extent, reducing the possibility of repeated community reconstruction.
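For readers unfamiliar with the index, the sketch below computes the standard normalized mutual information between two disjoint partitions, NMI = 2 I(X; Y) / (H(X) + H(Y)). It is an illustration only: evaluating overlapping communities, as in this experiment, requires an overlapping extension of NMI that is not reproduced here, and the function name is hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two disjoint partitions."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual_info = sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual_info / (h_a + h_b)


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same split, labels renamed
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # unrelated split
```

A value of 1 means the detected communities match the ground truth exactly, and a value of 0 means the two partitions share no information.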

Each user's feature words are compared with the downloaded tag word bank; if a word is included in the tag word bank, it is classified into the corresponding category. The more feature words fall into a category, the better that category reflects the user's interests. This study uses the previously extracted user feature words to analyze users' hobbies, as shown in Figure 8.
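The matching step described above amounts to a dictionary lookup followed by a count per category. In the minimal sketch below, the tag word bank, its categories, and the function name are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical tag word bank: category -> words belonging to it.
TAG_BANK = {
    "photography": {"camera", "lens", "portrait"},
    "mobile games": {"level", "arena"},
    "daily": {"mood", "breakfast", "weather"},
}


def tag_user(feature_words, tag_bank):
    """Count how many of a user's feature words fall into each tag category."""
    counts = Counter()
    for word in feature_words:
        for category, words in tag_bank.items():
            if word in words:
                counts[category] += 1
    total = sum(counts.values())
    shares = {c: n / total for c, n in counts.items()} if total else {}
    return counts, shares


counts, shares = tag_user(["camera", "lens", "mood", "unknown"], TAG_BANK)
```

Words that are absent from the word bank, such as `unknown` here, are simply ignored, and the resulting shares are the kind of percentages a tag distribution chart reports.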

It can be seen that “daily” accounts for 39% of the user tags, the largest share. That is to say, Sina Weibo users usually prefer to post about their daily moods. Users also pay considerable attention to “photography,” “mobile games,” and “opera.” This reflects the interests and concerns of Weibo users and can be used as a basis for recommendation and other work.

What contemporary female writers have in common is that they are good at telling stories: most of them have good stories to tell and the ability to tell them delicately. Their novels break with traditional narrative and adopt modern techniques. Unlike postmodernism, which abandons the position of the subject, feminist theorists choose and rebuild their theoretical subject through a series of exclusions, and the establishment of subjectivity is the main starting point of women's political movements and of feminism. Even after gender oppression is eliminated, literature examined purely from the gender perspective will still show differences in content and expression between men and women, so gender literature with its own characteristics will always accompany mankind.

The theoretical network of gender poetics is constructed from the changes of environment, the abundance of material, the operation of capital rules, the duplication and variation of art, the limitations of system, the existing state of men and women, the division of body and soul, and the ultimate existence of freedom. Only then can we see that the study of Chinese women's literature cannot rely solely on the essentialism of western feminist literary theory but must also return to China's social and cultural reality, understanding the road of Chinese women's liberation, and understanding and explaining Chinese women's literary works, on that basis. On this basis, the love relationship is a relationship between two independent individuals, and the resulting marriage is equal at least at the initial stage, rather than the male's possession and oppression of women.

Against this background, what women's literature and its criticism can do is only to remind women that, within a still strong and deep-rooted patriarchal culture, they should keep a clear consciousness, so as to correct the gender bias of real culture with a more active and healthy attitude and to realize the complete liberation of both sexes in the course of human modernization. Perhaps this is the right way for the development of Chinese feminist literary theory: to recognize the current embarrassing situation without losing confidence in the future, and to make down-to-earth efforts on the basis of the existing unsatisfactory situation.

5. Conclusions

Feminism is based on natural human rights, with equal rights between men and women as its core idea. It has guided practice, namely the feminist movement, has been tested, criticized, and developed in practice, and has finally become a worldwide political, social, and cultural trend of thought. This study adds community attribute information to nodes, so that the community information of a node can be obtained while visiting it and repeated traversal of the whole network is avoided. Experiments on two different datasets achieved better feature extraction results, and on this basis the user portrait is analyzed. It is verified that the execution time of the QLFM_LP algorithm is reduced by half compared with the LFM algorithm on the same network dataset. The theoretical construction of feminist literature needs to deal with multiple opponents from history and reality, constantly answer the questions raised by the cultural system, and constantly learn from and critically absorb the achievements of new disciplines. This is the secret of keeping feminist theory (including literary theory) vital.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work was supported by Fujian Social Science Planning Project: Research on the Ability Development of Fujian Private Entrepreneurs in the New Era (FJ2018B058).