Research Article

Design and Implementation of an Improved K-Means Clustering Algorithm

Algorithm 1

The idea of the improved K-means algorithm is as follows:
(1)Use the elbow rule to determine the number of clusters k
(2)Use minimum-variance optimization to select the k initial cluster centers C1(0), …, Ck(0)
(The iteration counter t = 0, 1, 2, ... advances until the objective function converges)
(3)repeat
(4)Calculate the pairwise distances between the k centroids, and use a hash table to store, for each centroid Ci, the distance d(Ci, Cj) to its nearest other centroid Cj.
(5)for each data point x do
  if x is already assigned to the cluster with centroid Ci then
    if 2d(Ci, x) ≤ d(Ci, Cj) then
      keep the assignment of x unchanged
    else
      calculate the distance from x to each of the k centroids and assign x to the cluster with the closest centroid
    end if
  else (x is not yet assigned to any cluster)
    for i from 1 to k do
      if 2d(Ci, x) ≤ d(Ci, Cj) then
        assign x to the cluster with centroid Ci
        exit the for loop
      end if
    end for
  end if
end for
(6)Recalculate the centroid.
(7)until no centroid moves and the sum of squared errors (SSE) converges.
(8)Output the cluster centers and the sample sets of the k clusters {z1, …, zk}
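
The test 2d(Ci, x) ≤ d(Ci, Cj) is safe because of the triangle inequality: for any other centroid Cl, d(x, Cl) ≥ d(Ci, Cl) − d(Ci, x) ≥ d(Ci, Cj) − d(Ci, x) ≥ d(Ci, x), so Ci is already the closest centroid to x and the remaining distance calculations can be skipped. Steps (3) to (7) could be sketched in Python roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (`nearest_centroid_gaps`, `assign_point`, `pruned_kmeans`), the parameters `max_iter` and `tol`, the fall-back to a full search when no centroid passes the test for an unassigned point, and the handling of empty clusters are all assumptions. The initial centers from steps (1) and (2) are assumed to be supplied by the caller.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def nearest_centroid_gaps(centroids: List[List[float]]) -> Dict[int, float]:
    """Step (4): map each centroid index i to d(Ci, Cj), Cj its nearest other centroid."""
    gaps: Dict[int, float] = {}
    for i, ci in enumerate(centroids):
        others = [math.dist(ci, cj) for j, cj in enumerate(centroids) if j != i]
        gaps[i] = min(others) if others else math.inf
    return gaps


def assign_point(x: Point, current: Optional[int],
                 centroids: List[List[float]], gaps: Dict[int, float]) -> int:
    """Step (5): assign x, skipping the full search when the pruning test holds."""
    k = len(centroids)
    if current is not None:
        # Already assigned: keep the cluster if 2 d(Ci, x) <= d(Ci, Cj).
        if 2 * math.dist(centroids[current], x) <= gaps[current]:
            return current
    else:
        # Unassigned: take the first centroid that passes the test.
        for i in range(k):
            if 2 * math.dist(centroids[i], x) <= gaps[i]:
                return i
    # Assumption: otherwise fall back to an ordinary nearest-centroid search.
    return min(range(k), key=lambda i: math.dist(centroids[i], x))


def pruned_kmeans(points: List[Point], init_centroids: List[Point],
                  max_iter: int = 100, tol: float = 1e-9
                  ) -> Tuple[List[List[float]], List[Optional[int]], float]:
    """Steps (3)-(7): iterate until centroids stop moving and the SSE converges."""
    centroids = [list(c) for c in init_centroids]
    labels: List[Optional[int]] = [None] * len(points)
    prev_sse = math.inf
    sse = math.inf
    for _ in range(max_iter):
        gaps = nearest_centroid_gaps(centroids)
        labels = [assign_point(x, lab, centroids, gaps)
                  for x, lab in zip(points, labels)]
        # Step (6): recompute each centroid as the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        sse = sum(math.dist(x, centroids[lab]) ** 2 for x, lab in zip(points, labels))
        # Step (7): stop when no centroid moved and the SSE no longer changes.
        if shift <= tol and abs(prev_sse - sse) <= tol:
            break
        prev_sse = sse
    return centroids, labels, sse


# Usage on two well-separated groups of points (made-up data).
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, labels, sse = pruned_kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Storing only the distance to the nearest other centroid keeps the table at k entries, and a point that passes the test costs one distance evaluation instead of k.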