
Overview

AgglomerativeClustering performs hierarchical clustering using a bottom-up approach. It starts with each sample as its own cluster and progressively merges the closest pairs of clusters until reaching the desired number of clusters.

Constructor

new AgglomerativeClustering(options?: AgglomerativeClusteringOptions)

Parameters

options (AgglomerativeClusteringOptions, default: {}): Configuration options for agglomerative clustering.

Methods

fit

Fit the hierarchical clustering model to the training data.
fit(X: Matrix): this
X (Matrix, required): Training data matrix where rows are samples and columns are features. Must be non-empty with consistent row sizes and finite values.
Returns: The fitted AgglomerativeClustering instance (for method chaining).
Throws: Error if nClusters exceeds sample count or data validation fails.

fitPredict

Fit the model and return cluster labels for training data.
fitPredict(X: Matrix): Vector
X (Matrix, required): Training data matrix to fit and predict on.
Returns: Vector of cluster labels (integers from 0 to nClusters - 1).

Properties

labels_ (Vector | null): Cluster labels for each training sample.
children_ (number[][] | null): Merge history. Each row [left, right] records the two clusters merged at that step.
distances_ (Vector | null): Distance at each merge step.
nConnectedComponents_ (number, default: 1): Number of connected components in the graph.
nLeaves_ (number | null): Number of leaves in the hierarchical tree.
nClusters_ (number | null): Number of clusters found during fitting.
nFeaturesIn_ (number | null): Number of features seen during fitting.

nMerges_

Getter property that returns the number of merge operations performed.
get nMerges_(): number
Returns: Number of merges in the hierarchical tree.
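The merge count follows directly from the cluster counts: each merge combines two clusters into one, so fitting n samples down to k clusters always performs n - k merges. A quick arithmetic check (the sample and cluster counts below are hypothetical):

```typescript
// Each merge reduces the number of clusters by exactly one.
// Starting from nSamples singleton clusters and stopping at nClusters,
// the number of merges is nSamples - nClusters.
const nSamples = 6;   // hypothetical sample count
const nClusters = 2;  // hypothetical target cluster count
const nMerges = nSamples - nClusters;
console.log(nMerges); // 4
```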

Examples

Basic Hierarchical Clustering

import { AgglomerativeClustering } from 'bun-scikit';

const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0],
  [1.0, 0.6],
  [9.0, 11.0]
];

// Create and fit model with Ward linkage
const clustering = new AgglomerativeClustering({
  nClusters: 2,
  linkage: 'ward'
});

clustering.fit(X);

console.log('Labels:', clustering.labels_);
console.log('Number of merges:', clustering.nMerges_);
console.log('Merge distances:', clustering.distances_);

Comparing Linkage Methods

import { AgglomerativeClustering } from 'bun-scikit';

const X = [
  [1, 2], [1.5, 1.8], [5, 8],
  [8, 8], [1, 0.6], [9, 11]
];

const linkages = ['ward', 'complete', 'average', 'single'] as const;

for (const linkage of linkages) {
  const model = new AgglomerativeClustering({
    nClusters: 2,
    linkage
  });
  
  model.fit(X);
  console.log(`${linkage} linkage labels:`, model.labels_);
}

Hierarchical Tree Analysis

import { AgglomerativeClustering } from 'bun-scikit';

const data = [
  [0, 0], [1, 1], [2, 2],
  [10, 10], [11, 11], [12, 12]
];

const clustering = new AgglomerativeClustering({
  nClusters: 2,
  linkage: 'complete'
});

clustering.fit(data);

console.log('Merge history:');
clustering.children_!.forEach((merge, i) => {
  console.log(`Step ${i + 1}: Merge ${merge[0]} and ${merge[1]} at distance ${clustering.distances_![i].toFixed(3)}`);
});

Variable Number of Clusters

import { AgglomerativeClustering } from 'bun-scikit';

const X = [
  [1, 2], [1.5, 1.8], [1.2, 2.1],
  [5, 8], [5.5, 8.2], [5.2, 7.9],
  [9, 11], [9.2, 11.3], [8.8, 10.9]
];

// Try different numbers of clusters
for (const k of [2, 3, 4]) {
  const model = new AgglomerativeClustering({
    nClusters: k,
    linkage: 'ward'
  });
  
  model.fit(X);
  console.log(`k=${k} labels:`, model.labels_);
}

Dendrogram Visualization Data

import { AgglomerativeClustering } from 'bun-scikit';

const samples = [
  [0, 0], [2, 2], [1, 1],
  [10, 10], [12, 12], [11, 11]
];

const clustering = new AgglomerativeClustering({
  nClusters: 2,
  linkage: 'average'
});

clustering.fit(samples);

// Extract data for dendrogram plotting
const dendrogramData = {
  children: clustering.children_,
  distances: clustering.distances_,
  nSamples: samples.length,
  labels: clustering.labels_
};

console.log('Dendrogram data:', dendrogramData);

Algorithm Details

The algorithm works as follows:
  1. Initialize: Each sample starts as its own cluster
  2. Compute distances: Calculate distance between all cluster pairs
  3. Merge: Combine the two closest clusters
  4. Update distances: Recalculate distances for the new cluster
  5. Repeat: Continue until desired number of clusters reached
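The five steps above can be sketched in plain TypeScript. This is a minimal single-linkage implementation for illustration only, independent of bun-scikit and of its internal algorithm:

```typescript
type Point = number[];

// Euclidean distance between two points.
const dist = (a: Point, b: Point) =>
  Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));

// Naive O(n^3) agglomerative clustering with single linkage.
function cluster(X: Point[], nClusters: number): number[] {
  // 1. Initialize: each sample is its own cluster (lists of sample indices).
  let clusters: number[][] = X.map((_, i) => [i]);

  while (clusters.length > nClusters) {
    // 2. Compute distances: scan all cluster pairs for the closest one.
    let best: [number, number] = [0, 1];
    let bestD = Infinity;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        // Single linkage: minimum distance over all member pairs.
        for (const a of clusters[i]) {
          for (const b of clusters[j]) {
            const d = dist(X[a], X[b]);
            if (d < bestD) { bestD = d; best = [i, j]; }
          }
        }
      }
    }
    // 3-5. Merge the closest pair and repeat until nClusters remain.
    const [i, j] = best;
    const merged = clusters[i].concat(clusters[j]);
    clusters = clusters.filter((_, k) => k !== i && k !== j);
    clusters.push(merged);
  }

  // Assign each sample the index of its final cluster.
  const labels = new Array<number>(X.length);
  clusters.forEach((members, label) => members.forEach(m => (labels[m] = label)));
  return labels;
}

const labels = cluster([[0, 0], [0.5, 0.5], [10, 10], [10.5, 10]], 2);
console.log(labels); // [1, 1, 0, 0]
```

A production implementation would recompute only the distances involving the newly merged cluster (step 4) instead of rescanning every pair.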

Linkage Methods

Ward: Minimizes the variance within clusters (recommended for most cases)
distance = sqrt((nA * nB) / (nA + nB)) * ||centroidA - centroidB||
Complete: Maximum distance between any two members
distance = max(dist(a, b)) for a in A, b in B
Average: Average distance between all pairs of members
distance = mean(dist(a, b)) for a in A, b in B
Single: Minimum distance between any two members
distance = min(dist(a, b)) for a in A, b in B
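The four formulas can be written out directly. A self-contained sketch in plain TypeScript (independent of bun-scikit, using Euclidean distance, with two hypothetical clusters A and B):

```typescript
type Point = number[];

// Euclidean distance between two points.
const d = (a: Point, b: Point) =>
  Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));

// All pairwise distances between members of A and members of B.
const pairs = (A: Point[], B: Point[]) =>
  A.flatMap(a => B.map(b => d(a, b)));

// Single: minimum pairwise distance.
const single = (A: Point[], B: Point[]) => Math.min(...pairs(A, B));

// Complete: maximum pairwise distance.
const complete = (A: Point[], B: Point[]) => Math.max(...pairs(A, B));

// Average: mean pairwise distance.
const average = (A: Point[], B: Point[]) => {
  const p = pairs(A, B);
  return p.reduce((s, v) => s + v, 0) / p.length;
};

// Ward: sqrt(nA*nB / (nA+nB)) * distance between centroids.
const ward = (A: Point[], B: Point[]) => {
  const centroid = (C: Point[]) =>
    C[0].map((_, i) => C.reduce((s, p) => s + p[i], 0) / C.length);
  return Math.sqrt((A.length * B.length) / (A.length + B.length)) *
    d(centroid(A), centroid(B));
};

const A: Point[] = [[0, 0], [0, 2]]; // hypothetical cluster A
const B: Point[] = [[4, 0], [4, 2]]; // hypothetical cluster B
console.log(single(A, B));   // 4
console.log(complete(A, B)); // ~4.4721 (sqrt of 20)
```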

Advantages

  • The full merge hierarchy is retained, so the dendrogram can be cut at any level after a single fit
  • Produces hierarchical relationships between clusters
  • Works with many distance metrics (Ward linkage assumes Euclidean distance)
  • Deterministic results
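Cutting the dendrogram at a level can be expressed with the recorded merge distances: each merge below the threshold reduces the cluster count by one, so the result is nSamples minus the number of such merges. A sketch (the distance values below are hypothetical, and the formula assumes merge distances are nondecreasing, as they are for the standard linkages):

```typescript
// Number of clusters obtained by cutting the dendrogram at threshold t:
// every merge performed at a distance below t removes one cluster.
function clustersAtThreshold(nSamples: number, distances: number[], t: number): number {
  return nSamples - distances.filter(d => d < t).length;
}

// Hypothetical merge distances from fitting 6 samples (5 merges total).
const mergeDistances = [0.5, 0.7, 1.1, 6.0, 8.0];
console.log(clustersAtThreshold(6, mergeDistances, 2.0)); // 3
```

In practice, the `distances_` property populated by fit supplies these merge distances.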

Considerations

  • Time complexity: O(n³) in this implementation
  • Space complexity: O(n²)
  • Not suitable for very large datasets
  • Cannot undo previous merges
  • Sensitive to outliers (especially with single linkage)

Linkage Selection Guide

  • Ward: Best for equal-sized, compact clusters
  • Complete: Good for avoiding chain-like clusters
  • Average: Balanced approach, less sensitive to outliers
  • Single: Can find elongated clusters but sensitive to noise
