Mobile Information Systems

Research Article

Design and Implementation of an Improved K-Means Clustering Algorithm

The idea of the improved K-means algorithm is as follows:

(1)	Use the elbow rule to determine the number of clusters k.
(2)	Use minimum variance optimization to select k cluster centers C₁(0),…, C_k(0)
	(The iteration counter t = 0, 1, 2, … increases until the objective function converges.)
(3)	repeat
(4)	Calculate the pairwise distances between the k centroids, and use a hash table to save, for each centroid C_i, the distance d(C_i, C_j) to its nearest other centroid C_j.
(5)	for each data point x do
		if x has already been assigned to the cluster with centroid C_i then
			if 2d(C_i, x) ≤ d(C_i, C_j) then
				the assignment of x does not need to change;
			else
				calculate the distance from x to all k centroids and assign x to the cluster with the closest centroid.
			end if
		else (x is not yet assigned to any cluster)
			for i from 1 to k do
				if 2d(C_i, x) ≤ d(C_i, C_j) then
					assign x to the cluster with centroid C_i;
					exit the for loop.
				end if
			end for
		end if
	end for
(6)	Recalculate the centroid of each cluster.
(7)	until no centroid moves and the sum of squared errors (SSE) converges.
(8)	Output the cluster centers and the sample sets of the k clusters {z₁, …, z_k}.
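
The skip test in step (5) rests on the triangle inequality: for any other centroid C_m, d(C_i, C_m) ≥ d(C_i, C_j) ≥ 2d(C_i, x), so d(x, C_m) ≥ d(C_i, C_m) − d(C_i, x) ≥ d(C_i, x), and C_i remains a nearest centroid of x. Steps (3) to (8) can be sketched in Python with NumPy as shown next. This is an illustrative sketch rather than the authors' implementation: the names `nearest_centroid_gaps` and `pruned_kmeans`, the parameters `max_iter` and `tol`, and the `skipped` counter are hypothetical, a NumPy array stands in for the hash table, and the initial centroids are assumed to be supplied by steps (1) and (2). One further assumption is the fallback to a full nearest-centroid search when an unassigned point passes the test for no centroid, a case the listing above leaves open.

```python
import numpy as np


def nearest_centroid_gaps(centroids):
    """Step (4): for each centroid C_i, the distance d(C_i, C_j) to its nearest other centroid."""
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # ignore d(C_i, C_i) = 0
    return dists.min(axis=1)


def pruned_kmeans(X, init_centroids, max_iter=100, tol=1e-9):
    """Steps (3)-(8); init_centroids is assumed to come from steps (1)-(2)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    k = len(centroids)
    labels = np.full(len(X), -1)  # -1 marks "not assigned to any cluster"
    skipped = 0                   # number of full searches avoided by the test
    prev_sse = np.inf
    sse = np.inf

    for _ in range(max_iter):                      # step (3): repeat
        gaps = nearest_centroid_gaps(centroids)    # step (4)

        for n, x in enumerate(X):                  # step (5)
            i = labels[n]
            if i >= 0:
                # Already assigned: keep the assignment if 2 d(C_i, x) <= d(C_i, C_j).
                if 2.0 * np.linalg.norm(x - centroids[i]) <= gaps[i]:
                    skipped += 1
                    continue
                # Otherwise search all k centroids for the closest one.
                labels[n] = np.argmin(np.linalg.norm(centroids - x, axis=1))
            else:
                # Unassigned: take the first centroid that passes the test.
                d = np.linalg.norm(centroids - x, axis=1)
                hits = np.flatnonzero(2.0 * d <= gaps)
                # Assumption: fall back to the closest centroid if none passes.
                labels[n] = hits[0] if hits.size else np.argmin(d)

        new_centroids = centroids.copy()           # step (6)
        for i in range(k):
            members = X[labels == i]
            if len(members):                       # leave an empty cluster's centroid in place
                new_centroids[i] = members.mean(axis=0)

        sse = float(((X - new_centroids[labels]) ** 2).sum())
        moved = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if moved <= tol and abs(prev_sse - sse) <= tol:   # step (7)
            break
        prev_sse = sse

    return centroids, labels, sse, skipped         # step (8)
```

A call such as `pruned_kmeans(X, init_centroids)` returns the final centroids, the cluster label of every sample, the SSE, and the number of points whose full distance search was skipped. The saving grows as clusters stabilize, because points close to their own centroid then pass the test with a single distance computation instead of k.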