Clustering is a basic building block of data analysis. Because of the complexity of the data, each clustering algorithm has its own strengths and weaknesses.
Process of Clustering
- Feature extraction and selection: extract and select the most representative features from the original data set;
- Clustering algorithm design: design the clustering algorithm according to the characteristics of the problem;
- Result evaluation: evaluate the clustering result and judge the validity of the algorithm;
- Result explanation: give a practical explanation for the clustering result.
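
The four steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions, not a recommended pipeline. The function names, the variance-based feature selection, the "leader" threshold clustering, and the toy data are all hypothetical choices made here for brevity; none of them come from the notes.

```python
import math
from statistics import mean, pvariance

def select_features(rows, keep):
    """Step 1 (assumed heuristic): keep the `keep` columns with the highest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([r[j] for r in rows]) for j in range(n_cols)]
    best = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)[:keep]
    cols = sorted(best)
    return [[r[j] for j in cols] for r in rows], cols

def leader_cluster(points, threshold):
    """Step 2 (assumed algorithm): join the first cluster whose leader is
    within `threshold`; otherwise the point becomes a new leader."""
    leaders, labels = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(i)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels

def members_of(points, labels, cluster_id):
    return [p for p, l in zip(points, labels) if l == cluster_id]

def evaluate(points, labels):
    """Step 3: within-cluster sum of squared distances to the cluster mean
    (lower means tighter clusters)."""
    total = 0.0
    for c in set(labels):
        members = members_of(points, labels, c)
        center = [mean(col) for col in zip(*members)]
        total += sum(math.dist(p, center) ** 2 for p in members)
    return total

def explain(points, labels):
    """Step 4: summarize each cluster by its size and mean point."""
    summary = {}
    for c in sorted(set(labels)):
        members = members_of(points, labels, c)
        summary[c] = {"size": len(members),
                      "center": [mean(col) for col in zip(*members)]}
    return summary

# Toy data: the middle column is constant, so it carries no information.
rows = [[1.0, 5.0, 0.1], [1.2, 5.0, 0.2], [8.0, 5.0, 0.1], [8.3, 5.0, 0.3]]
points, cols = select_features(rows, keep=2)
labels = leader_cluster(points, threshold=1.0)
score = evaluate(points, labels)
summary = explain(points, labels)
print(cols, labels, score, summary)
```

In practice each step would be swapped for something suited to the problem at hand; the point of the sketch is only the order of the steps and what each one hands to the next.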
Common Similarity and Distance Measurements
- Distance metrics
- Similarity functions
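
The notes do not say which measures are meant here, so the following sketch simply picks a few widely used ones as examples: Euclidean and Manhattan distance as distance metrics, and cosine and Jaccard similarity as similarity functions. The function names are illustrative, and the code assumes equal-length numeric vectors (or, for Jaccard, non-empty collections).

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.
    Assumes neither vector is all zeros (otherwise this divides by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_similarity(s, t):
    """Size of the intersection divided by size of the union.
    Assumes at least one of the two collections is non-empty."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

a, b = [0.0, 3.0], [4.0, 0.0]
d_euclid = euclidean(a, b)           # legs 3 and 4 give hypotenuse 5
d_manhattan = manhattan(a, b)        # 4 + 3
sim_cosine = cosine_similarity(a, b) # orthogonal vectors
sim_jaccard = jaccard_similarity({1, 2, 3}, {2, 3, 4})
print(d_euclid, d_manhattan, sim_cosine, sim_jaccard)
```

A distance grows as items become less alike, while a similarity grows as they become more alike, so an algorithm written for one usually needs the other converted before it can be plugged in.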
Types of Clustering
Several approaches to clustering exist. For an exhaustive list, see Xu, D. & Tian, Y., "A Comprehensive Survey of Clustering Algorithms", Ann. Data. Sci. (2015) 2: 165. Each approach is best suited to a particular data distribution. Below is a short discussion of four common approaches, focusing on centroid-based (partition-based) clustering using k-means.
- Partition-based Clustering
- Hierarchical Clustering
- Density-based Clustering
- Distribution-based Clustering
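
Since the discussion focuses on k-means, a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) may help. It makes some simplifying assumptions that are not in the notes: the first k points are used as initial centroids so the result is deterministic, distances are Euclidean, and the data is a small hypothetical toy set. Real implementations usually choose initial centroids randomly or with k-means++.

```python
import math

def kmeans(points, k, max_iter=100):
    """Minimal k-means sketch. Assumes len(points) >= k."""
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two loose groups near (1, 1) and (8.5, 9).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0), (1.0, 0.5), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Both initial centroids start inside the same group here, yet the update step pulls one of them over to the second group within a couple of iterations. With less favourable data the result can depend heavily on the starting centroids, which is why the seeding rule above is flagged as an assumption.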