Automatic dating of documents and temporal text classification
Given an initial set of k means m The algorithm is often presented as assigning objects to the nearest cluster by distance.
Using a different distance function other than (squared) Euclidean distance may stop the algorithm from converging.
The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean to be the centroid of the cluster's randomly assigned points.), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = so as to minimize the within-cluster sum of squares (WCSS) (i.e. Formally, the objective is to find: The most common algorithm uses an iterative refinement technique.Due to its ubiquity, it is often called the k-means algorithm; it is also referred to as Lloyd's algorithm, particularly in the computer science community.Lloyd's algorithm is therefore often considered to be of "linear" complexity in practice, although it is in the worst case superpolynomial when performed until convergence.Lloyd's algorithm is the standard approach for this problem.