High-Dimensional Text Clustering by Dimensionality Reduction and Improved Density Peak
Algorithm 1: The DPC-K-means.
Input: the text feature vectors and the minimum size of the target dimension.
Output: the clustering results.
Begin:
Step1: determine whether is greater than calculated according to Formula (3). If is greater than , use the SRP dimension reduction framework in Step2. If is less than or equal to , random projection is used in Step2.
Step2: use the SRP dimension reduction framework to reduce the dimensionality of the text feature vectors layer by layer until the reduced matrix is obtained, or apply random projection directly to obtain the reduced matrix.
Step3: calculate the ρ value and the δ value of each point in the reduced matrix according to Equations (6) and (7), and plot the decision graph with ρ and δ as axes.
Step4: calculate the γ value of each point according to Equation (10) to verify the clustering centers and the number of clusters.
Step5: perform K-means clustering, using the clustering centers obtained in Step4 as the initial cluster centers and the number of clusters as the K value.
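The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Only the direct random-projection branch of Step2 is covered, because the SRP framework and Formula (3) are not reproduced here. The density and distance follow the classic density peak definitions (Gaussian kernel with a cutoff distance `dc`) rather than the paper's Equations (6) and (7). The score γ is taken as the product ρ·δ, and the number of centers is passed in by hand instead of being derived from Equation (10). All function and parameter names (`random_projection`, `density_peaks`, `kmeans`, `dpc_kmeans`, `dc`, `n_centers`) are hypothetical.

```python
import math
import random

def random_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions with a Gaussian random matrix."""
    d = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(k)] for x in X]

def density_peaks(X, dc):
    """Classic DPC quantities (assumed, not the paper's improved formulas)."""
    n = len(X)
    dist = [[math.dist(a, b) for b in X] for a in X]
    # rho: Gaussian-kernel local density with cutoff distance dc
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # delta: distance to the nearest point that ranks higher in density;
    # the densest point gets its largest distance to any other point
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta

def kmeans(X, centers, iters=50):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
                  for x in X]
        new_centers = []
        for c in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def dpc_kmeans(X, k, n_centers, dc, seed=0):
    rng = random.Random(seed)
    Z = random_projection(X, k, rng)                  # Step2 (direct branch only)
    rho, delta = density_peaks(Z, dc)                 # Step3
    gamma = [r * d for r, d in zip(rho, delta)]       # Step4 (assumed gamma = rho * delta)
    top = sorted(range(len(Z)), key=lambda i: -gamma[i])[:n_centers]
    return kmeans(Z, [Z[i] for i in top])             # Step5

# Toy usage: two well-separated groups of 20-dimensional points.
data_rng = random.Random(1)
blob_a = [[data_rng.gauss(0.0, 0.1) for _ in range(20)] for _ in range(15)]
blob_b = [[data_rng.gauss(10.0, 0.1) for _ in range(20)] for _ in range(15)]
labels, centers = dpc_kmeans(blob_a + blob_b, k=5, n_centers=2, dc=1.0)
```

Seeding K-means with the density peaks removes the dependence on random initialization, which is the motivation for Step5. The all-pairs distance matrix makes this sketch quadratic in the number of points, so it is suitable only for small examples.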