Hierarchical Clustering

Hasn’t specifying the number of clusters in KNN and K-mediods been a pain, no worries because now we have Hierarchical clustering to save us from the mess. Another added advantage for this is ability to visualize the construction of clusters diagrammatically.

Let us first introduce tree based representation Dendograms which make things beautiful for Hierarchical clustering.

  1. Conclusions about the proximity of two observations should not be implied by their relationship on the horizontal axis nor by the vertical connections. Rather, the height of the branch between an observation and the clusters of observations below them indicate the distance between the observation and that cluster it is joined to. For example, consider observation 9 & 2 in figure below. They appear close on the dendrogram (right) but, in fact, their closeness on the dendrogram imply they are approximately the same distance measure from the cluster that they are fused to (observations 5, 7, & 8). It by no means implies that observation 9 & 2 are close to one another.

2. To obtain number of clusters at end, we draw a horizontal axis through y axis. The height of the cut to the dendrogram controls the number of clusters obtained. The best choice of the no. of clusters is the no. of vertical lines in the dendrogram cut by a horizontal line that can transverse the maximum distance vertically without intersecting a cluster(node).

In the below example, the best choice of no. of clusters will be 4 as the red horizontal line in the dendrogram below covers maximum vertical distance AB.

3. And cherry on the cake, you can get correlation(“Baker” or “Cophenetic” correlation matrix) between different dendograms(different may be due to linkage method) of same data.

Representation I found done on R

Hierarchical clustering can be divided into two main types:

  1. Agglomerative clustering: AGNES (AGglomerative NESting) works in a bottom-up manner. That is, each observation is initially considered as a single-element cluster (leaf). At each step of the algorithm, the two clusters that are the most similar(closer) are combined into a new bigger cluster (nodes). This procedure is iterated until all points are a member of just one single big cluster (root). The result is a tree which can be displayed using a dendrogram.

Note that agglomerative clustering is good at identifying small clusters. Divisive hierarchical clustering, on the other hand, is better at identifying large clusters.

Similar to k-means, we measure the (dis)similarity of observations using distance measures (e.g., Euclidean distance, Manhattan distance, etc.); the Euclidean distance is most commonly the default. However, a fundamental question in hierarchical clustering is: How do we measure the dissimilarity between two clusters of observations? A number of different cluster agglomeration methods (i.e., linkage methods) have been developed to answer this question. The most common methods are:

  • Maximum or complete linkage clustering: Computes all pairwise dissimilarities between the elements in cluster 1 and the elements in cluster 2, and considers the largest value of these dissimilarities as the distance between the two clusters. It tends to produce more compact clusters.

Complete linkage and Ward’s method are often preferred for AGNES clustering and mean or average linkage clustering method for DIANA.

Checkout different dendograms for each Linkage method

Points to Note

  1. Don’t forget to standardize variables if you want to avoid blunders.


  1. https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/

Problem Solver, Data Science, Actuarial Science, Knowledge Sharer, Hardcore Googler