A Clustering-based Method for Business Hall Efficiency Analysis

Huang, Tianlin; Wang, Ning

doi:https://doi.org/10.1155/2021/7622576

Scientific Programming


Special Issue

Theory, Algorithms, and Applications for the Multiclass Classification Problem


Research Article | Open Access

Volume 2021 | Article ID 7622576 | https://doi.org/10.1155/2021/7622576


Tianlin Huang¹ and Ning Wang¹

Academic Editor: Antonio J. Peña

Received 20 May 2021

Revised 11 Aug 2021

Accepted 07 Sept 2021

Published 01 Oct 2021

Abstract

Excessive or insufficient business hall resources may result in unreasonable resource allocation, adversely affecting the value of an entity business hall. Therefore, proper characteristic parameters are the key factors for analyzing the business hall, and they strongly affect the final analysis results. In this study, a characteristic analysis method for the economic operation of a business hall is developed and the feature engineering is established. Because of its simplicity and versatility, the k-means algorithm has been widely used since it was first proposed around 50 years ago. However, the classical k-means algorithm has poor stability and accuracy. In particular, it is difficult to achieve a suitable balance between the centroid initialization and the clustering number $k$. We propose a new initialization (LSH-k-means) algorithm for k-means clustering. This algorithm is mainly based on locality-sensitive hashing (LSH) as an index for computing the initial cluster centroids, and it reduces the range of the clustering number. Furthermore, an empirical study is conducted. According to the load intensity and time change of the business hall, an index system reflecting the optimization analysis of the business hall is established, and the LSH-k-means algorithm is used to analyze the economic operation of the business hall. The results of the empirical study show that the LSH-k-means clustering method outperforms the direct prediction method, provides expected analysis results as well as decision optimization recommendations for the business hall, and serves as a basis for the optimal layout of the business hall.

1. Introduction

An entity business hall is where a company directly conducts specific business activities, such as commodity trade, business handling, and service. However, owing to rapid urbanization and economic development, unreasonable resource allocation is becoming increasingly prevalent. For example, the number of entity business halls is excessive in some places and insufficient in others. Hence, the deployment of new commercial outlets (halls) or resource allocation optimization for existing retail outlets often needs to be performed manually. Therefore, how to evaluate the efficiency of business halls has emerged as a major concern for many enterprises.

To this end, many researchers have attempted to overcome the disadvantages of human judgment, which is highly subjective. Brandeau and Chiu [1] considered the transportation cost and the distance between the warehouse and the customer and used a gradient-like algorithm to study the location issue. Wang et al. [2] used nearest-neighbor clustering and Ripley's K function [3] to analyze the layout of commercial outlets and suggested that business type, land price, and traffic accessibility are the critical factors. Gerard [4] analyzed the service needs and waiting demand of customers for bank halls and attempted to shorten the perceived waiting time of customers on the basis of the customers’ business types, thereby improving customer satisfaction. Anderson et al. [5] used the queuing model to optimize the queuing service system of banks. They determined the optimal number of service windows by acquiring and presenting a large amount of data. Lin et al. [6] studied the relationship between retail stores and street centrality and pointed out that, besides the transport network, which has a strong impact on the retailer’s location, street centrality influences the type of retail store. Kang [7] analyzed the shift of warehouses from central urban areas to the urban periphery over time and studied the main factors affecting warehouse location. Hui [8] used data mining to establish a channel analysis model for an electricity business hall and optimized the resource allocation. Based on the statistics of customer queuing time, business processing time, customer satisfaction, and so on, Yan et al. [9] established an intelligent access platform for the business data and improved the service efficiency. However, there is no unified standard for the business hall index system.

Clustering is a key technique in data mining, and its applications include pattern recognition [10, 11], image processing [12], and recommendation [13]. Clustering aims to partition data into different categories based on a measure of similarity. The k-means algorithm is widely used owing to its simplicity and effectiveness. However, the different settings of the parameters and the random selection of the initial clustering centers make the classical k-means algorithm unstable.

The classical clustering algorithm involves two problems. The first problem is to classify a given dataset on the basis of the prespecified cluster number $k$; hence, the problem of determining the “correct cluster number” has attracted considerable interest. Although several methods have been developed for estimating the number of data clusters [14–17], it is difficult to use them in practical applications. Therefore, determining the correct number of clusters has long been an important research topic in cluster analysis. The second problem is to determine the initial clustering centers, which have a significant impact on the clustering effect. Studies conducted thus far have explored several initialization methods for the k-means algorithm. For example, the k-means++ algorithm [18] has been proposed to avoid this issue. This algorithm randomly selects the first centroid, and each subsequent centroid is drawn with probability proportional to its squared distance from the nearest centroid already selected, so that the centroids tend to be far apart. However, random selection is still widely used in practice [19]. Erisoglu et al. [20] proposed an incremental approach for computing the initial clustering centers. In this approach, the reduced dataset is partitioned until the number of clusters equals the predefined number of clusters. However, the number of clusters must be known in advance. The compressed k-means (CKM) algorithm [21] is initialized by locality-sensitive hashing (LSH) [22], and the distance is calculated using the Hamming distance between binary codes. The LSH link [23] can rapidly find a nearby cluster to be connected through the LSH algorithm. David et al. [24] proposed a new LSH scheme adapted to the distance for approximate nearest neighbors (ANN) search in high-dimensional spaces.

In summary, there is no unified standard for the index system of business halls at present. Therefore, we establish an index system for analyzing the efficiency of a business hall. To address the initialization sensitivity of k-means as well as the difficulty in determining the number of clusters, we initialize the k-means centroids on the basis of LSH. Accordingly, we implement the relevant algorithms and present the optimal allocation scheme for the business hall.

The main contributions of this study are as follows:
(1) According to the average waiting time, ticketing time, and business type of a business hall, we analyze the average load rate of the business hall and use the relevant characteristic variables to describe the load of the business hall. Finally, we propose a general business hall index system.
(2) By combining the characteristics of k-means and LSH, we propose a new initialization (LSH-k-means) algorithm for k-means clustering. The model obtains the load classification of each business hall from the relevant index variables, which supports the optimization of business hall distribution.
(3) The results of our empirical analysis verify the validity of the proposed LSH-k-means approach. Thus, LSH-k-means can be efficiently used for the operational analysis of a business hall.

The remainder of this paper is organized as follows: Section 2 introduces the required preliminaries, definitions, and models. Section 3 describes the proposed initialization methodology. Section 4 presents, compares, and discusses the experimental results. Finally, Section 5 concludes the paper.

2. Preliminaries

2.1. k-means Algorithm

The notations used in this paper are defined in Table 1. The k-means [25] method is the most well-known clustering method because of its simplicity. It has been identified as one of the top 10 algorithms in data mining [26]. Given a dataset $X = \{x_1, x_2, \dots, x_n\}$, k-means aims to partition it into $k$ different clusters $C = \{C_1, C_2, \dots, C_k\}$, where $k$ is a predefined number. The objective of the k-means clustering algorithm is to minimize the sum of squared errors (SSE) [27] over all clusters. The SSE is defined as follows:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2, \tag{1}$$

where $\mu_j$ denotes the $j$-th cluster centroid, which is computed as the mean of the points in $C_j$, and $x_i$ is a data object in the $j$-th cluster:

$$\mu_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i, \tag{2}$$

where $n_j$ denotes the number of data points in the $j$-th cluster.

To solve equation (1), an expectation–maximization (EM)-like optimization method is adopted, which updates either the cluster assignments $C$ or the centroids $\mu$ while keeping the other fixed [28]. In general, the clustering procedure involves three steps: (1) initialize the cluster centroids; (2) assign each sample to its closest centroid; and (3) recompute the cluster centroids with the assignments produced in Step 2 and go back to Step 2 until convergence. This is known as the Lloyd iteration procedure [29]. Such an iterative optimization approach has several drawbacks. First, it is sensitive to the initialization, and poorly initialized centroids $\mu$ may lead to an inferior result. Many methods have been proposed to obtain a stable solution, including the k-means++ algorithm [18]. Second, finding the optimal solution to k-means is an NP-hard problem. Some variants of k-means have been proposed, such as various parametric k-means, including fuzzy k-means [30, 31]. Third, k-means cannot handle newly arriving data, because it requires the entire dataset to be observed. The complexity is $O(tnkd)$, where $t$, $n$, $k$, and $d$ denote the number of iterations, size of the dataset, number of clusters, and dimensionality, respectively. This complexity is considerably higher than that of other well-known clustering algorithms such as DBSCAN [32] and mean shift [33].
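The three-step Lloyd procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `lloyd_kmeans`, its arguments, and the choice to keep the old centroid when a cluster becomes empty are assumptions made for this example.

```python
import math


def lloyd_kmeans(points, centroids, max_iter=100):
    """Run Lloyd iterations from the given initial centroids.

    Returns (centroids, labels, sse). All names here are illustrative.
    """
    centroids = [list(c) for c in centroids]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each sample to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Sum of squared errors of the final partition.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels, sse = lloyd_kmeans(data, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because the result depends on the initial `centroids` argument, a different starting pair can end in a different local optimum, which is the sensitivity the paragraph refers to.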

2.2. LSH

LSH is a well-known solution for the approximate nearest neighbor problem in high-dimensional spaces. LSH was first introduced for the Hamming metric by Indyk and Motwani [34]. Data points are assigned to individual hash buckets by each hash function. The idea of LSH is that closer data points are mapped to the same hash bucket with high probability. LSH has been shown to be effective even for high-dimensional data, both theoretically and experimentally [35]. Let $H = \{h : S \to U\}$ be a family of hash functions. Each hash function must satisfy the LSH property $\Pr[h(x) = h(y)] = \mathrm{sim}(x, y)$, where $\mathrm{sim}(x, y)$ is the similarity between $x$ and $y$. These hash functions must meet the following two conditions:
(1) If $d(x, y) \le r_1$, then $\Pr[h(x) = h(y)] \ge p_1$.
(2) If $d(x, y) \ge r_2$, then $\Pr[h(x) = h(y)] \le p_2$.

Here, $d(x, y)$ represents the distance measure between $x$ and $y$, $r_1 < r_2$, and $p_1 > p_2$. The definition implies that if $x$ and $y$ are close to each other, they are hashed into the same bucket in the projection with a high probability of at least $p_1$; if they are far from each other, they are hashed into the same bucket only with a low probability of at most $p_2$. An $(r_1, r_2, p_1, p_2)$-sensitive family of hash functions is useful when the collision probabilities $p_1$ and $p_2$ satisfy $p_1 > p_2$. Figure 1 shows an example of hashing key space.
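To make the collision property concrete, the sketch below estimates the collision probability empirically for a nearby pair and a distant pair of points. It uses a random-projection hash of the form floor((a·x + b) / width), which is one common LSH family for Euclidean distance; the text does not state which family the authors used, so the hash family and the names `make_hash` and `collision_rate` are assumptions of this example.

```python
import math
import random


def make_hash(dim, width, rng):
    """Return one hash function h(x) = floor((a . x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, width)                    # random offset in [0, width)

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def collision_rate(x, y, dim, width, trials=2000, seed=0):
    """Estimate Pr[h(x) = h(y)] over randomly drawn hash functions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(dim, width, rng)
        hits += h(x) == h(y)  # True counts as 1
    return hits / trials


near = collision_rate((0.0, 0.0), (0.1, 0.0), dim=2, width=4.0)
far = collision_rate((0.0, 0.0), (10.0, 0.0), dim=2, width=4.0)
```

With these settings the nearby pair should collide far more often than the distant pair, which is the behavior the two conditions formalize.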

3. Proposed LSH-Based Initialization Algorithm

The proposed framework involves three steps: (1) an index system for the efficiency analysis of a business hall is established in Section 3.1; (2) to overcome the poor stability and low accuracy of the classical algorithm, a boosted k-means algorithm based on LSH initialization is proposed in Section 3.2; and (3) the k-means algorithm is implemented to obtain the clustering results. The details of these three steps are illustrated in Figure 2.

3.1. Establishment of the Index System

Through the load analysis of the business hall, we can determine the high and low loads and optimize the business hall. The average utilization rate of each business hall is analyzed according to multiple indicators (including average waiting time, ticketing time for business, and business type). Thus, we can use the relevant characteristic variables to describe the load of the business hall. By applying the clustering algorithms, we can obtain the load categories of different business halls, which provides a basis for planning the locations of the business halls. First, the following two essential features are extracted: the maximum load of the business hall ($M$) and the ratio of the actual daily load to the maximum load ($R$).
(1) The maximum load of the business hall is given by

$$M = \frac{N\,T}{\sum_{i=1}^{m} p_i t_i}, \tag{3}$$

where $p_i$ is the proportion of a specific business, $t_i$ is the average time for the clerk to handle the business, $m$ is the number of business types, $T$ is the working time of the clerk, and $N$ is the number of clerks in the business hall. The variables are taken from the peak period. This value represents the maximum business volume that a business hall can withstand during the peak period. The peak period can be obtained by measuring the historical data of each business hall.
(2) The ratio of the actual daily load to the maximum load is given by

$$R = \frac{L}{M}, \tag{4}$$

where $L$ represents the actual daily load of the business hall and $M$ is the maximum load in one day.

By combining the essential characteristics of the business hall and based on the analysis of historical data, we can obtain the calculation indicators of the business hall to prepare for the subsequent model input. Therefore, the feature engineering for the business hall efficiency analysis is established, and the critical indexes extracted are as follows:
(1) The ratio of the average load to the maximum load is given by

$$\bar{R} = \frac{1}{D} \sum_{d=1}^{D} \frac{L_d}{M}, \tag{5}$$

where $D$ is the number of days, and $L$ and $M$ are the same as above. This index denotes the ratio of the average actual load to the maximum load over a period of time.
(2) The actual load trend is given by

$$\hat{y}_i = a + b\,t_i, \quad i = 1, \dots, n, \tag{6}$$

where $\hat{y}$ denotes the fitted actual load curve, $a$ is a constant, $b$ is the regression coefficient, $t$ is the time (the independent variable), and $n$ is the number of statistical data. This index indicates whether the load trend of the business hall will be rising, flat, or declining over a period of time. A fitting curve can be used to characterize the load trend of the business hall, and the slope $b$ represents the trend state. Our method includes a commercial center, residential center, new urban area, and other factors.
(3) The proportion of high-value business is given by

$$P_h = \frac{V_h}{V}, \tag{7}$$

where $V_h$ is the high-value business volume and $V$ is the total business volume. Thus, this index denotes the proportion of high-value business to total business in the peak period.
(4) The high-frequency load is given by

$$F_h = \sum_{d=1}^{D} \mathbb{1}[L_d > \theta_h], \tag{8}$$

where $\theta_h$ is a high threshold and $F_h$ represents the number of times the load exceeds $\theta_h$ within a period. Furthermore, $\theta_h$ can be obtained by statistical analysis of the historical data of the business hall.
(5) The low-frequency load is given by

$$F_l = \sum_{d=1}^{D} \mathbb{1}[L_d < \theta_l], \tag{9}$$

where $\theta_l$ is a low threshold and $F_l$ denotes the number of times the load is less than $\theta_l$ within a period. Furthermore, $\theta_l$ can be obtained by statistical analysis of the historical data of the business hall.
(6) The latest high-load interval is given by

$$\Delta t_h = t_c - t_h, \tag{10}$$

where $t_c$ represents the current time, $\theta_h$ denotes a high threshold, and $t_h$ refers to the latest time at which the load was greater than $\theta_h$. Thus, $\Delta t_h$ denotes the interval from $t_h$ to $t_c$.
(7) The latest low-load interval is given by

$$\Delta t_l = t_c - t_l, \tag{11}$$

where $t_c$ represents the current time, $\theta_l$ denotes a low threshold, and $t_l$ refers to the latest time at which the load was less than $\theta_l$. Thus, $\Delta t_l$ denotes the interval from $t_l$ to $t_c$.
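As an illustration of how the seven indexes could be derived from a daily load series, the following sketch computes them for one business hall. It is a reading of the verbal definitions above, under several assumptions: the trend is taken as the least-squares slope of load against the day index, the "current time" is the last day of the series, and the function name, argument names, and sample numbers are all hypothetical.

```python
def hall_features(loads, max_load, high_value, total_volume, theta_h, theta_l):
    """Compute the seven index values for one hall (illustrative names)."""
    n = len(loads)
    days = range(n)
    # (1) Ratio of the average actual load to the maximum load.
    avg_ratio = sum(loads) / (n * max_load)
    # (2) Load trend: least-squares slope of load against the day index.
    t_mean = sum(days) / n
    y_mean = sum(loads) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(days, loads))
             / sum((t - t_mean) ** 2 for t in days))
    # (3) Proportion of high-value business.
    high_value_share = high_value / total_volume
    # (4), (5) Days on which the load is above / below the thresholds.
    high_days = [t for t in days if loads[t] > theta_h]
    low_days = [t for t in days if loads[t] < theta_l]
    # (6), (7) Days elapsed since the latest high / low load (None if never).
    now = n - 1  # assumption: the last observation is "the current time"
    return {
        "avg_ratio": avg_ratio,
        "trend": slope,
        "high_value_share": high_value_share,
        "high_freq": len(high_days),
        "low_freq": len(low_days),
        "since_high": now - high_days[-1] if high_days else None,
        "since_low": now - low_days[-1] if low_days else None,
    }


# Hypothetical five-day series for a hall whose maximum load is 100.
features = hall_features([40, 50, 60, 90, 70], max_load=100,
                         high_value=30, total_volume=120,
                         theta_h=80, theta_l=45)
```

The resulting dictionary is the kind of feature vector that would be passed to the clustering step; real inputs would need at least two observations for the slope to be defined.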

3.2. LSH-k-means

The main purpose of clustering is to divide data into clusters in which objects in the same cluster are close to one another, whereas objects in different clusters are far from one another. Two factors affect the quality of k-means clustering: before applying the algorithm, we need to specify the number of clusters and select the initial cluster centroids. Selecting appropriate initial cluster centroids can improve the quality of clustering. To this end, a critical study was conducted by Vassilvitskii et al. [18, 36]. If the initial cluster centroids are selected carefully, the k-means algorithm converges to a better local optimal solution. Furthermore, careful selection of the initial cluster centroids makes the k-means iteration converge faster [18]. However, to make the initial centroids adapt to the data distribution, it is necessary to scan the data for $k$ rounds. Although the number of scanning rounds in [36] has been reduced to a small value, the additional computing cost is still inevitable. Our algorithm exploits LSH. The algorithm minimizes the path by adding the nearest neighbor, and LSH can effectively search for the nearest group features in the path. The average time complexity of the hash-based search is $O(1)$. LSH scans the data records and finds the nearest points; the average values are computed after the nearest points are classified as a category. Algorithm 1 describes the process of obtaining the initialization centroids in our proposed LSH-k-means scheme. The main steps are as follows:
(1) Suppose that we have a set of points $X$ obtained via the index system in Section 3.1. We use LSH to index the feature vectors extracted from the dataset to reduce the search time for the nearest neighbor of each query. This is based on the hash mapping function, hash functions, and hash table [37]. Constructing an effective LSH index structure for approximate nearest neighbor search depends on the number of hash tables and the number of bits of the hash codes.
(2) To facilitate the statistics of nonclustered data points, in Algorithm 1, we copy a dataset $X'$ from $X$. Randomly select one data point $x$ from $X'$ as the centroid. Then, $x$ is merged into the set $C_i$ and removed from the dataset $X'$, where $C_i$ is the $i$-th cluster. After obtaining the point $x$, query the corresponding bucket number according to the hash table in Step 1 and take out the data in that bucket. Calculate the similarity or distance between $x$ and the data points in the bucket and return the nearest neighbor data $NN$.
(3) Take a data point $y$ from $NN$ whose distance to $x$ does not exceed the maximum distance. Merge $y$ into $C_i$, that is, $C_i = C_i \cup \{y\}$, and remove it from the dataset $X'$.
(4) Repeat Step 3 until the other data points in $NN$ reach a certain threshold; the threshold is given by equation (12).
(5) Repeat Steps 2-3 until the length of the dataset $X'$ is less than the threshold, as shown in Algorithm 1.
(6) The arithmetic mean values of the final $k$ sets of samples are computed; then, we can obtain the clustering centers for all the categories in this way:

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x, \quad i = 1, \dots, k. \tag{13}$$

Therefore, based on the aforementioned steps, we have two algorithms to choose from: “best” movement and “fast” movement [38]. For the “best” movement, we use the centers from equation (13) and the value $k$ from Algorithm 1 as the initial clustering centers and cluster number of the classical k-means algorithm and run the algorithm; its output is the final result. For the “fast” movement, the divided categories can be regarded as approximate clustering results and directly used as the classification results. Because the initial clustering centers are determined and the initial categories are obtained, the result of the algorithm is more stable and accurate, and it requires a relatively short running time.

Required: Training dataset $X$; the size $n$ of dataset $X$; the minimum number of data points in one cluster; the maximum distance $d_{\max}$ in one cluster; the closest set $NN$, a two-dimensional array; $k$ is the number of clusters.
Output: The final clustering result.
(1)	Initialize LSH.
(2)	Index dataset $X$ via LSH.
(3)	Let $i = 0$.
(4)	Let $X' = X$. // copy
(5)	while the length of $X'$ is not less than the threshold do
(6)	  $i = i + 1$.
(7)	  Randomly select one point $x$ from dataset $X'$.
(8)	  $C_i = \{x\}$.
(9)	  $x$ is removed from $X'$.
(10)	  $NN$ = query($x$).
(11)	  Let $m = |NN|$.
(12)	  for $j = 1$ to $m$ do
(13)	    $y = NN_j$.
(14)	    if the threshold of equation (12) is reached then
(15)	      break;
(16)	    end if
(17)	    if $d(x, y) \le d_{\max}$ then
(18)	      $y$ is removed from $X'$.
(19)	      $C_i = C_i \cup \{y\}$.
(20)	    end if
(21)	  end for
(22)	end while
(23)	Compute the centroids $\mu$ via equation (13).
(24)	Function k-means($X$, $k$, $\mu$).
(25)	The final clustering result.
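The initialization loop of Algorithm 1 can be sketched as follows. This is one possible reading, not the authors' code, and it rests on several assumptions: a single random-projection hash table is used as the LSH index, the outer loop stops when fewer than `n_min` points remain, candidates in the seed's bucket are visited nearest first, and clusters are kept even if they end up smaller than `n_min`. The names `lsh_init_centroids`, `d_max`, `n_min`, and `width` are hypothetical.

```python
import math
import random


def lsh_init_centroids(points, d_max, n_min, width, seed=0):
    """Return initial centroids found by greedy grouping inside LSH buckets."""
    rng = random.Random(seed)
    dim = len(points[0])
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # projection direction
    b = rng.uniform(0.0, width)                    # random offset

    def bucket_of(p):
        return math.floor((sum(ai * pi for ai, pi in zip(a, p)) + b) / width)

    # Index the dataset: bucket id -> list of point indices.
    table = {}
    for idx, p in enumerate(points):
        table.setdefault(bucket_of(p), []).append(idx)

    remaining = set(range(len(points)))  # working copy of the dataset
    clusters = []
    while len(remaining) >= n_min:
        seed_idx = rng.choice(sorted(remaining))   # random seed point
        remaining.discard(seed_idx)
        members = [seed_idx]
        # Unassigned points in the seed's bucket, nearest first.
        candidates = sorted(
            (i for i in table[bucket_of(points[seed_idx])] if i in remaining),
            key=lambda i: math.dist(points[seed_idx], points[i]),
        )
        for i in candidates:
            if math.dist(points[seed_idx], points[i]) > d_max:
                break  # every later candidate is farther away
            remaining.discard(i)
            members.append(i)
        clusters.append(members)
    # Centroid of each group: arithmetic mean of its members.
    return [
        [sum(col) / len(m) for col in zip(*(points[i] for i in m))]
        for m in clusters
    ]


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# A very large width puts all points in one bucket, as in a tiny dataset.
init = sorted(lsh_init_centroids(pts, d_max=3.0, n_min=2, width=1e9, seed=1))
```

The number of centroids returned plays the role of the cluster number, and the centroids could then seed a standard k-means run (the “best” movement) or be used directly as an approximate partition (the “fast” movement).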

4. Experimental Results

First, we use the UCI datasets (https://archive-beta.ics.uci.edu/ml/datasets) [39] to verify the performance of the proposed algorithm, and we state the verification criteria. In addition, we use the Mall-Customers dataset (https://www.kaggle.com/shwetabh123/mall-customers) to examine the value range of the number of clusters produced by the proposed LSH-k-means model. Our experimental results demonstrate the effectiveness and superiority of the proposed LSH-k-means. Then, we apply it to an actual business hall dataset and present an example of optimizing the business hall operation.

4.1. Experimental Design

To verify the aforementioned points and evaluate the effectiveness of the proposed LSH-k-means model, numerous experiments were conducted on the UCI datasets, which consist of Balance, Wine, Breast, Diabet, Iris, Hayes-roth, Tic-tac-toe, and Bupa. We followed the experiments conducted in a previous study [40]. We briefly review the existing baselines as follows:
(1) k-means [25] is the classical k-means algorithm.
(2) Enhanced k-means [38] enhances the classical k-means algorithm. The initial cluster centers are determined in advance instead of being selected randomly.
(3) The AC algorithm [41] for clustering treats each sample as a pattern; by computing the similarity between patterns, the more similar patterns are grouped into one class, and the less similar patterns are classified into different classes. The difference between two patterns in AC clustering is usually measured by a distance function, such as the Euclidean distance or the Hamming distance. In the experiment, the AC algorithm is implemented by the KnowledgeMiner Software [41].

There are 200 samples in the Mall-Customers dataset. It includes gender, customer ID, age, annual income, and expenditure scores, and it is used to collect insights about customers and group them according to their behaviors. The elbow method [42] is a well-known method for determining the optimal value of $k$. As shown in Figure 3(a), the optimum number of clusters of the Mall-Customers dataset is 5. According to Algorithm 1, we set the minimum number and the maximum distance . Owing to the small amount of data, we set the number of buckets to 1. After 10 LSH-based initializations, we get the value of $k$ between . Figure 3(b) shows the results of LSH-k-means clustering. The black dots represent the centroids.

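Figure 3: (a) Elbow method on the Mall-Customers dataset. (b) Results of LSH-k-means clustering; the black dots represent the centroids.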

There are 525 samples in the Balance dataset. For the classical k-means algorithm, the number of clustering categories that match the real categories is 271, and the matching rate is 51.62%. The corresponding values of the LSH-k-means algorithm are 288 and 54.87%, respectively. Similarly, the results for the other UCI datasets are listed in Table 2. To determine whether there are significant differences between algorithms, we use the Wilcoxon signed-rank test [43], a nonparametric statistical test that has been widely used in many fields, especially in algorithm comparison and analysis [40]. It is expressed as follows:

$$R^{+} = \sum_{d_i > 0} \operatorname{rank}(|d_i|), \qquad R^{-} = \sum_{d_i < 0} \operatorname{rank}(|d_i|),$$

where $d_i$ is the difference in clustering performance between the two algorithms on the $i$-th dataset, and the absolute values of the differences are ranked in ascending order. If several differences share the same rank, we take the average rank. $R^{+}$ is the sum of ranks for the datasets on which the first algorithm is better than the other, and $R^{-}$ is the sum of ranks for the opposite case.

The calculations for the eight aforementioned datasets are presented below.

Let ; we get . According to the critical value table of the Wilcoxon test, we can judge that the difference between algorithms is significant under the condition . Furthermore, as shown in Table 2, the LSH-based k-means is better than the enhanced k-means on five datasets; thus, in terms of quantity, the LSH-based k-means algorithm outperforms the enhanced k-means algorithm. Therefore, we can judge that the advantage of the LSH-based k-means is significant.
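The rank sums used by the Wilcoxon signed-rank test can be computed as in the sketch below. It assumes the common convention of discarding zero differences and assigning average ranks to ties; the function name and the sample scores are hypothetical and are not the values from Table 2.

```python
def wilcoxon_rank_sums(scores_a, scores_b):
    """Return (R_plus, R_minus, T) for paired scores of two algorithms."""
    # Assumption: datasets with a zero difference are dropped.
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of tied absolute differences starting at pos.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1..end+1
        for i in order[pos:end + 1]:
            ranks[i] = avg_rank
        pos = end + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus, min(r_plus, r_minus)


# Hypothetical scores of two algorithms on five datasets.
r_plus, r_minus, t_stat = wilcoxon_rank_sums([5, 7, 3, 9, 4], [3, 8, 3, 5, 6])
```

The smaller rank sum `t_stat` is the statistic that is compared with the critical value table for the chosen significance level and number of nonzero differences.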

In addition to the comparison with actual categories, we further distinguish the clustering effects of k-means clustering and the AC algorithm. A compactness and separation indicator is used to evaluate the clustering results [44], which is defined as follows:

$$\mathrm{Comp} = \frac{1}{|X|} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2, \qquad \mathrm{Sep} = \min_{i \ne j} \lVert c_i - c_j \rVert^2,$$

where $c_i$ and $c_j$ denote the cluster centers, $x$ is any sample in the dataset, $k$ is the number of clusters, and $X$ is the sample set. The Xie–Beni (XB) index [45, 46] is based on intracluster and intercluster distances; it is formulated in terms of the cluster compactness and the separation between the clusters. We use the XB index for the evaluation of the cluster effects, and it is defined as follows:

$$\mathrm{XB} = \frac{\mathrm{Comp}}{\mathrm{Sep}},$$

where $\mathrm{XB}$ is the ratio of the average distance between data objects and their corresponding clustering centers to the minimum distance between the cluster centers. The smaller the value of $\mathrm{XB}$, the higher the clustering quality. The results are summarized in Table 3.
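For a hard (non-fuzzy) partition, the XB index described above can be computed as in this sketch. It assumes squared Euclidean distances for both the compactness and the separation terms, which is the usual form of the index; the function name and the toy data are hypothetical.

```python
import math


def xie_beni(points, labels, centroids):
    """Xie-Beni index of a hard partition: compactness / separation."""
    # Average squared distance of each point to its own cluster center.
    compactness = sum(
        math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    ) / len(points)
    # Smallest squared distance between two distinct cluster centers.
    separation = min(
        math.dist(ci, cj) ** 2
        for i, ci in enumerate(centroids)
        for cj in centroids[i + 1:]
    )
    return compactness / separation


toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_centers = [(0.0, 1.0), (10.0, 1.0)]
xb = xie_beni(toy_points, toy_labels, toy_centers)
```

Tight clusters with well-separated centers give a small value, so a lower XB indicates a better partition.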

From the XB values in Table 3, we can conclude that the difference between the algorithms is significant. The XB value of the AC algorithm is the largest, while that of the LSH-based k-means algorithm is the smallest, which implies that the LSH-k-means algorithm outperforms the other algorithms in the experiment. Thus, the experimental results verify the effectiveness and superiority of the proposed method, and it can be applied to the empirical analysis. In the next section, we describe the application of LSH-k-means to business hall analysis.

4.2. Business Hall Analysis

In reality, business hall resource allocation may be unreasonable. For example, some business halls may be busy, while others may be idle. This may be caused by overlapping user coverage in different business halls, unreasonable location of the business halls, and a large proportion of low-value businesses. In this section, we experimentally verify the effectiveness of our index system and analyze the results of the proposed LSH-k-means model.

4.2.1. Business Hall Clustering

When the index system is established as described in Section 3.1, we get the characteristic information of the business hall. After data preprocessing, the number of clusters is determined subjectively by considering the load intensity and time change information for the business hall. The load intensity can be categorized into high, normal, and low, and the load trend can be categorized into rising, flat, and declining. Thus, a nine-square grid map (Figure 4(b)) can be obtained. At the same time, by referring to the knowledge of field experts, the number of clusters can be defined as 9 for the subjective clustering methods. After the clusters are determined, the extracted feature indicators are taken as the input, and the clustering model is implemented. For the LSH-k-means algorithm, the distance parameter was selected as the Euclidean distance, the maximum number of iterations was set to 500, the number of seeds was set to 10, and the number of clusters was set to 9. The outcomes are shown in Table 4 and Figure 4(a).

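Figure 4: (a) Clustering outcomes of the LSH-k-means algorithm. (b) Nine-square grid map of load intensity and load trend.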

Meanwhile, unlike the subjective clustering methods, which require a predetermined cluster number, the AC algorithm determined the clustering number automatically on the basis of the similarity between the samples. Here, the similarity was set at 95%, and the algorithm was run on the same data. The result was exactly consistent with that of the LSH-k-means algorithm. The details are presented in Table 5. For example, the first and sixth samples, the second sample, and the third sample were clustered into the same category.

The final classification results obtained from the model can provide the load grades and decision-making suggestions, which can serve as a basis for site planning optimization of the business halls. In addition, the results of the two algorithms were consistent, which indicates that the LSH-k-means algorithm is effective and yields a stable result. Accordingly, further optimization action can be implemented.

4.2.2. Optimization Analysis

As shown in Figure 5(a), the 16th and 17th business halls are both in the same class. This category indicates that the current load is low and the load trend remains unchanged, implying that the business hall resources are redundant in this area. The business halls in this class are idle, and the site may be unreasonable. Therefore, the merger of business halls, relocation, and reduction of resource input in this area should be considered.

Figure 5: panels (a) and (b).

By contrast, for the class shown in red in Figure 5(b), the current load is high and the load trend is rising, which implies that the business volumes of the business halls are large and still increasing. Currently, the first and sixth business halls belong to this category. The future trend is still likely to be growing, and the business volumes will keep increasing. Therefore, more business hall resources need to be invested in this area, and the optimal site planning of business halls should be considered accordingly. We can define the objective function of the optimal site for the input business hall resources as follows:

$$\min_{p} \; F(p) = \sum_{i=1}^{n} w_i \, d(q_i, p),$$

where $q_i$ is the quantitative value of the $i$-th factor that affects the rationality of the business hall location, $n$ is the number of factors, $d(\cdot, \cdot)$ is the distance function, $w_i$ is the weight, and $p$ is the target point to be solved. Thus, according to the objective function and the relevant coordinate information of key units in the area, we can compute the optimal planning location of the business hall using the optimization algorithm. Here, the optimal location of the business hall was computed as , and the optimal solution was 527.0368. The details are shown in Figure 5.
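A weighted-distance location objective of this kind can be minimized numerically. The sketch below assumes that the objective is a weighted sum of Euclidean distances from the target point to the coordinates of the key units, and it uses Weiszfeld's iteration as the solver; the text does not name the optimization algorithm that was actually used, so both choices, as well as the function names and sample coordinates, are assumptions of this example.

```python
import math


def weighted_cost(p, sites, weights):
    """Weighted sum of Euclidean distances from p to every site."""
    return sum(w * math.dist(p, s) for s, w in zip(sites, weights))


def weiszfeld(sites, weights, iters=200, eps=1e-12):
    """Approximate the point that minimizes weighted_cost."""
    dim = len(sites[0])
    total = sum(weights)
    # Start from the weighted centroid of the sites.
    p = [sum(w * s[k] for s, w in zip(sites, weights)) / total
         for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for s, w in zip(sites, weights):
            d = math.dist(p, s)
            if d < eps:
                # Simplification: stop if the iterate lands on a site.
                return list(s)
            for k in range(dim):
                num[k] += w * s[k] / d
            den += w / d
        p = [v / den for v in num]
    return p


# Hypothetical key units at the corners of a square, equally weighted.
units = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
unit_weights = [1.0, 1.0, 1.0, 1.0]
best = weiszfeld(units, unit_weights)
best_cost = weighted_cost(best, units, unit_weights)
```

For this symmetric example the minimizer is the center of the square; with unequal weights the solution is pulled toward the more heavily weighted units.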

The 9th and 12th business halls belong to a class in which the current load and load trend are both normal and the status is stable. Therefore, the business halls in this class are not the current focus of optimization. The other classes are similar to this category and are also not the current focus. The main objects are the two classes discussed above, the idle class and the overloaded class; that is, excessive or insufficient business hall resources are mainly concentrated in these two classes, which are the focus of our optimization analysis.

5. Conclusion

Excessive or insufficient business hall resources may result in unreasonable resource allocation, which adversely affects the value of an entity business hall. Therefore, proper characteristic parameters are the key factors for analyzing the business hall, which strongly affect the final analysis results. According to the time change and load trend, multiple variables such as average load rate, actual load trend, and high-frequency load are extracted as the characteristic indexes of the business hall. In this study, a characteristic analysis method for the economic operation of a business hall was developed, and the specific calculation process was presented; accordingly, the feature engineering was established. Moreover, based on the load intensity and time change information of business halls, we built an index system and performed further optimization analysis. The key characteristic indicators extracted were the average waiting time, ticket handling time, and business type, and a model for evaluating business hall efficiency was established. The model obtained the load grading of each business hall by the relevant variable input, which provided a basis for optimal site planning of the business halls.

An empirical study showed that the LSH-k-means clustering method outperforms the direct prediction method, provides expected analysis results and decision optimization suggestions for business halls, and serves as a basis for the optimal layout of business halls. In addition, by considering the load intensity and time change information, the cluster number was determined according to the characteristic analysis results, which has a certain theoretical and practical significance. In the future, we will explore and develop a general method to automatically determine the parameters and use it in practical applications.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Copyright

Copyright © 2021 Tianlin Huang and Ning Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
