Clustering is a basic building block of data analysis. Because real-world data is complex, each clustering algorithm has its own strengths and weaknesses.

Process of clustering

  1. Feature extraction and selection: extract and select the most representative features from the original data set;
  2. Clustering algorithm design: design the clustering algorithm according to the characteristics of the problem;
  3. Result evaluation: evaluate the clustering result and judge the validity of the algorithm;
  4. Result explanation: give a practical explanation for the clustering result.
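As a rough illustration of the first step, the sketch below keeps the features with the largest variance. The notes do not name a selection criterion, so variance is only an assumed stand-in here, and the helper names `select_top_variance_features` and `project` are hypothetical.

```python
from statistics import pvariance


def select_top_variance_features(rows, k):
    """Return the (sorted) indices of the k columns with the largest variance.

    Variance is an assumed selection criterion, used here only for illustration.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, columns):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in columns] for row in rows]


# Toy data set: the third column is almost constant, so it carries little information.
data = [
    [1.0, 10.0, 0.5],
    [1.1, 20.0, 0.5],
    [0.9, 30.0, 0.5],
    [1.0, 40.0, 0.6],
]
kept = select_top_variance_features(data, k=2)
reduced = project(data, kept)
print(kept, reduced)
```

In practice features are usually rescaled first, since raw variance depends on the unit of each column.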

Common similarity and distance measurements

Common measurements fall into two groups: distance metrics and similarity functions.
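The notes do not list specific measures, so the following is a minimal sketch of two widely used distance metrics (Euclidean and Manhattan) and one similarity function (cosine similarity). The choice of these three is an assumption made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(manhattan_distance([0, 0], [3, 4]))   # 7
print(cosine_similarity([1, 0], [0, 1]))    # 0.0
```

A distance grows as points become less alike, while a similarity grows as they become more alike, so the two kinds of measure are read in opposite directions.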

Types of Clustering

Several approaches to clustering exist. For an exhaustive list, see Xu, D. & Tian, Y., "A Comprehensive Survey of Clustering Algorithms", Ann. Data. Sci. (2015) 2: 165. Each approach is best suited to a particular data distribution. Below is a short discussion of four common approaches, focusing on centroid-based clustering using k-means.
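To make centroid-based clustering concrete, here is a minimal pure-Python sketch of k-means in its usual form (Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignment stops changing. The function name `k_means`, the seeded random initialization, and the toy data are assumptions for illustration, not something the notes specify.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, k, max_iterations=100, seed=0):
    """Return (labels, centroids) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels, centroids


points = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

The result depends on the starting centroids, which is why practical implementations typically run several initializations and keep the best one.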

Result evaluation

Evaluation indicators
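The notes do not say which indicators are meant. As one commonly used internal indicator, the sketch below computes the mean silhouette coefficient, which compares each point's average distance to its own cluster (a) with its average distance to the nearest other cluster (b) as (b - a) / max(a, b). Treating this as the indicator of interest is an assumption, and `silhouette_score` is a hypothetical helper name.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the two groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

Values near 1 suggest compact, well-separated clusters, while values near or below 0 suggest that points sit between clusters or are assigned to the wrong one.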

Reference List

  1. https://developers.google.com/machine-learning/clustering/clustering-algorithms