Activity (Process)

Nearest Neighbor Chain Algorithm

The nearest-neighbor chain algorithm is an efficient method for performing agglomerative hierarchical clustering. It optimizes the standard agglomerative clustering process, reducing the naive implementation's runtime complexity from O(n3)O(n^3) to O(n2)O(n^2).

Algorithm Steps:

  1. Initialize each data point as its own single-element cluster.
  2. Initialize an empty stack SS.
  3. While there is more than one cluster remaining: a. If SS is empty, push an arbitrary remaining cluster onto SS. b. Let CtopC_{top} be the cluster at the top of SS. Find its nearest neighbor cluster, CnextC_{next}. c. If CnextC_{next} is already in SS, it must be the cluster immediately below CtopC_{top} on the stack (meaning they are mutual nearest neighbors). Pop both CtopC_{top} and CnextC_{next} from SS, merge them into a new cluster, and update the pairwise distances. d. If CnextC_{next} is not in SS, push CnextC_{next} onto SS.

Note: Refer to "Modern hierarchical, agglomerative clustering methods" (Müllner, 2011) for detailed analysis and correctness proofs.

0

1

Updated 2026-06-07

Tags

Data Science