Mobile Information Systems

Research Article

Off-Topic Detection of Business English Essay Based on Deep Learning Model

Algorithm 1

Off-topic detection algorithm for business English composition based on deep learning.

	Input: topic set G and article set O. The number of clusters K is set to 2.
	Output: the clusters into which the articles are partitioned.
(a)	Preprocess the original text: word segmentation, removal of punctuation, removal of blank characters, removal of stop words, etc.
(b)	Input the short text and the corresponding article separately. Word vectors are obtained from the trained word2vec model.
(c)	The LDA topic model is used to obtain the subject words of the topic and of the article separately. For the topic, the 5 topic words with the highest topic probability are selected; for the article, the 15 topic words with the highest topic probability are selected. A subject-word matrix is obtained by combining the subject words with their word vectors.
(d)	The topic-word matrix is used as the input of the CNN to calculate the similarity between the test article and the article-set vectors.
(e)	When the similarity is greater than the set threshold, the essay under test is considered to meet the requirements of the question, and no correlation calculation is required.
(f)	When the relative similarity is greater than its set threshold, the coupling spatial model is used to calculate the correlation. For the keywords selected in step (e), all relevance degrees are set to 1, and the pan-semantic matrix is obtained. The rows of the matrix represent the article subject words, and the columns represent the subject headings.
(g)	Take the maximum value of each column as one element of the vector representation of the article, which yields a new vector.
(h)	Repeat the above 7 steps until all articles to be tested are represented as distributed vectors.
(i)	Input the article vectors and output the article clusters using the k-means clustering method.
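
Steps (g) and (i) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (column_max, kmeans), the toy 2 x 2 matrices, and the choice of seeding the centroids with the first K vectors are all assumptions made for illustration. In the algorithm above the matrix would have one row per article subject word and one column per subject heading.

```python
from typing import List, Sequence

Vector = List[float]


def column_max(matrix: Sequence[Sequence[float]]) -> Vector:
    """Step (g): keep the largest value found in each column of the matrix."""
    return [max(col) for col in zip(*matrix)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Sequence[Vector], k: int = 2, iterations: int = 100) -> List[int]:
    """Step (i): plain Lloyd-style k-means; returns one cluster label per vector.

    Assumption: the first k vectors seed the centroids, so runs are deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical relatedness matrices, one per article
# (rows: article subject words, columns: subject headings).
matrices = [
    [[0.9, 0.2], [0.4, 0.8]],
    [[0.1, 0.2], [0.05, 0.1]],
    [[1.0, 0.7], [0.3, 0.9]],
    [[0.2, 0.1], [0.1, 0.15]],
]
article_vectors = [column_max(m) for m in matrices]  # step (g) for every article
labels = kmeans(article_vectors, k=2)                # step (i) with K = 2
print(labels)
```

With K = 2, the two resulting clusters separate articles whose pooled vectors have high values from those with low values. Which cluster corresponds to off-topic essays still has to be decided separately, for example by inspecting the cluster centroids.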