Overview

SpectralClustering embeds the data in a lower-dimensional space built from the eigenvectors of the normalized affinity matrix, then clusters the embedded points with KMeans. It works well for clusters that are not convex or compact, such as rings or elongated shapes.

Constructor

new SpectralClustering(options?: SpectralClusteringOptions)

Parameters

options
SpectralClusteringOptions
default: {}
Configuration options for spectral clustering

Methods

fit

Fit the spectral clustering model to the training data.
fit(X: Matrix): this
X
Matrix
required
Training data matrix where rows are samples and columns are features. If affinity is “precomputed”, X should be a square affinity matrix.
Returns: The fitted SpectralClustering instance (for method chaining).
Throws: Error if nClusters exceeds sample count, affinity validation fails, or data validation fails.

fitPredict

Fit the model and return cluster labels for training data.
fitPredict(X: Matrix): Vector
X
Matrix
required
Training data matrix to fit and predict on.
Returns: Vector of cluster labels (integers from 0 to nClusters-1).

Properties

labels_
Vector | null
Cluster labels for each training sample.
affinityMatrix_
Matrix | null
Affinity matrix constructed from the input data.
embedding_
Matrix | null
Spectral embedding of the training data (row-normalized eigenvectors).
nFeaturesIn_
number | null
Number of features seen during fitting.

Examples

Basic Spectral Clustering

import { SpectralClustering } from 'bun-scikit';

const X = [
  [1, 1], [1.5, 1.8], [1.2, 1.1],
  [8, 8], [8.2, 8.5], [8.5, 8.1],
  [2, 10], [2.5, 10.2], [2.2, 10.5]
];

// Create and fit model
const spectral = new SpectralClustering({
  nClusters: 3,
  affinity: 'rbf',
  gamma: 1.0,
  randomState: 42
});

spectral.fit(X);

console.log('Cluster labels:', spectral.labels_);
console.log('Embedding shape:', spectral.embedding_?.length, 'x', spectral.embedding_?.[0].length);

Using Nearest Neighbors Affinity

import { SpectralClustering } from 'bun-scikit';

// Two well-separated groups with different shapes
const data = [
  // Cluster 1: compact, centered on the origin
  [0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
  // Cluster 2: elongated
  [10, 0], [11, 0], [12, 0], [13, 0], [14, 0]
];

const spectral = new SpectralClustering({
  nClusters: 2,
  affinity: 'nearest_neighbors',
  nNeighbors: 3,
  randomState: 42
});

spectral.fit(data);
console.log('Labels:', spectral.labels_);

Tuning Gamma Parameter

import { SpectralClustering } from 'bun-scikit';

const X = [
  [1, 1], [1.5, 1.8], [5, 5], [5.5, 5.2]
];

// Try different gamma values
for (const gamma of [0.1, 1.0, 10.0]) {
  const model = new SpectralClustering({
    nClusters: 2,
    affinity: 'rbf',
    gamma,
    randomState: 42
  });
  
  model.fit(X);
  console.log(`gamma=${gamma} labels:`, model.labels_);
}

// Lower gamma: broader influence (more connected)
// Higher gamma: tighter influence (more separated)
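
To make the effect of gamma concrete, the following sketch evaluates the RBF formula from Algorithm Details, exp(-gamma * ||xi - xj||²), for one nearby pair and one distant pair of points from the example above. The helper names squaredDistance and rbfAffinity are illustrative and are not part of bun-scikit.

```typescript
// Squared Euclidean distance between two points of equal length.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// RBF affinity between two points: exp(-gamma * ||a - b||^2).
// Illustrative helper, not a bun-scikit function.
function rbfAffinity(a: number[], b: number[], gamma: number): number {
  return Math.exp(-gamma * squaredDistance(a, b));
}

const pointA = [1, 1];
const nearPoint = [1.5, 1.8]; // same group as pointA
const farPoint = [5, 5];      // other group

for (const gamma of [0.1, 1.0, 10.0]) {
  const within = rbfAffinity(pointA, nearPoint, gamma);
  const between = rbfAffinity(pointA, farPoint, gamma);
  console.log(
    `gamma=${gamma} within=${within.toExponential(3)} between=${between.toExponential(3)}`
  );
}
```

Both affinities shrink as gamma grows, but the distant pair shrinks much faster. At very large gamma even nearby points become almost disconnected, which is why the parameter usually needs tuning rather than simply being maximized.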

Precomputed Affinity Matrix

import { SpectralClustering } from 'bun-scikit';

// Custom affinity matrix (must be symmetric, non-negative)
const affinityMatrix = [
  [1.0, 0.9, 0.1, 0.05],
  [0.9, 1.0, 0.15, 0.1],
  [0.1, 0.15, 1.0, 0.85],
  [0.05, 0.1, 0.85, 1.0]
];

const spectral = new SpectralClustering({
  nClusters: 2,
  affinity: 'precomputed',
  randomState: 42
});

spectral.fit(affinityMatrix);
console.log('Cluster labels:', spectral.labels_);
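
Because a precomputed matrix must be square, symmetric, and non-negative, it can help to check it before calling fit. The sketch below is a hypothetical standalone helper (checkAffinity is not a bun-scikit function, and the library's own validation may differ) that lists any problems it finds.

```typescript
// Hypothetical helper, not part of bun-scikit: returns a list of problems
// found in a candidate affinity matrix. An empty list means it passed.
function checkAffinity(A: number[][], tol = 1e-12): string[] {
  const problems: string[] = [];
  const n = A.length;
  if (n === 0) return ["matrix is empty"];

  // Every row must have exactly n entries.
  for (let i = 0; i < n; i++) {
    if (A[i].length !== n) {
      problems.push(`row ${i} has ${A[i].length} entries, expected ${n}`);
      return problems; // the remaining checks need a square matrix
    }
  }

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      // Entries must be finite and non-negative.
      if (!Number.isFinite(A[i][j]) || A[i][j] < 0) {
        problems.push(`entry [${i}][${j}] is negative or not finite`);
      }
      // Compare each pair above the diagonal with its mirror entry.
      if (j > i && Math.abs(A[i][j] - A[j][i]) > tol) {
        problems.push(`entries [${i}][${j}] and [${j}][${i}] differ`);
      }
    }
  }
  return problems;
}

const candidate = [
  [1.0, 0.9, 0.1, 0.05],
  [0.9, 1.0, 0.15, 0.1],
  [0.1, 0.15, 1.0, 0.85],
  [0.05, 0.1, 0.85, 1.0]
];
console.log("Problems found:", checkAffinity(candidate));
```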

Non-Convex Cluster Detection

import { SpectralClustering } from 'bun-scikit';

// Two concentric circles (challenging for KMeans)
const innerCircle = Array.from({ length: 20 }, (_, i) => {
  const angle = (i / 20) * 2 * Math.PI;
  return [Math.cos(angle), Math.sin(angle)];
});

const outerCircle = Array.from({ length: 40 }, (_, i) => {
  const angle = (i / 40) * 2 * Math.PI;
  return [3 * Math.cos(angle), 3 * Math.sin(angle)];
});

const data = [...innerCircle, ...outerCircle];

const spectral = new SpectralClustering({
  nClusters: 2,
  affinity: 'nearest_neighbors',
  nNeighbors: 5,
  randomState: 42
});

spectral.fit(data);

// Should separate inner and outer circles
console.log('Inner circle labels:', spectral.labels_!.slice(0, 20));
console.log('Outer circle labels:', spectral.labels_!.slice(20));
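
Cluster ids are arbitrary, so the inner circle may come back as 0 or as 1. A check that ignores the specific ids is therefore more reliable than comparing against fixed values. The sketch below is a hypothetical helper (separatesGroups is not a bun-scikit function) that assumes the labels are available as a plain number array ordered group by group.

```typescript
// Hypothetical helper: true when every sample in a group shares one label
// and no two groups share a label. The label ids themselves do not matter.
function separatesGroups(labels: number[], groupSizes: number[]): boolean {
  const seen = new Set<number>();
  let start = 0;
  for (const size of groupSizes) {
    const group = labels.slice(start, start + size);
    if (size === 0 || group.length !== size) return false; // sizes do not match labels
    const first = group[0];
    if (!group.every((label) => label === first)) return false; // group was split
    if (seen.has(first)) return false; // two groups were merged
    seen.add(first);
    start += size;
  }
  return start === labels.length;
}

// Example labelings for 2 + 3 samples (made-up values for illustration).
console.log(separatesGroups([1, 1, 0, 0, 0], [2, 3])); // true
console.log(separatesGroups([0, 1, 0, 0, 0], [2, 3])); // false
```

For the concentric-circle data above, a call such as separatesGroups(Array.from(spectral.labels_!), [20, 40]) would report whether the two rings were recovered, assuming labels_ is array-like.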

Analyzing the Embedding

import { SpectralClustering } from 'bun-scikit';

const X = [
  [1, 2], [1.5, 1.8], [5, 8], [8, 8]
];

const spectral = new SpectralClustering({
  nClusters: 2,
  affinity: 'rbf',
  gamma: 1.0,
  randomState: 42
});

spectral.fit(X);

console.log('Original data shape:', X.length, 'x', X[0].length);
console.log('Embedding shape:', spectral.embedding_!.length, 'x', spectral.embedding_![0].length);
console.log('Spectral embedding:', spectral.embedding_);
// Each row of the embedding is the spectral representation of one sample.
// Clusters are typically easier to separate there than in the original space.

Algorithm Details

Spectral clustering works in several steps:
  1. Affinity Matrix Construction:
    • RBF: A[i,j] = exp(-gamma * ||xi - xj||²)
    • Nearest Neighbors: A[i,j] = 1 if j in k-NN of i, else 0
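  2. Normalized Affinity:
    • Compute degree matrix D, where D[i,i] is the sum of row i of A
    • Normalize: M = D^(-1/2) * A * D^(-1/2) (the normalized Laplacian is I - M)
  3. Eigenvector Computation:
    • Find the top k eigenvectors of M (those with the largest eigenvalues)
    • Stack the eigenvectors as columns to form the embedding matrix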
  4. Row Normalization:
    • Normalize each row of embedding to unit length
  5. KMeans Clustering:
    • Apply KMeans to normalized embedding
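
The two normalization steps (2 and 4) can be sketched in a few lines of plain TypeScript. This is a minimal illustration under stated assumptions, not the library's implementation: the names NumericMatrix, normalizeAffinity, and normalizeRows are hypothetical, and the eigenvector and KMeans steps are left out.

```typescript
type NumericMatrix = number[][]; // local alias for this sketch only

// Step 2: D^(-1/2) * A * D^(-1/2), where D holds the row sums of A.
function normalizeAffinity(A: NumericMatrix): NumericMatrix {
  const invSqrtDegree = A.map((row) => {
    const degree = row.reduce((sum, value) => sum + value, 0);
    // A sample with zero degree yields an all-zero row and column.
    return degree > 0 ? 1 / Math.sqrt(degree) : 0;
  });
  return A.map((row, i) =>
    row.map((value, j) => value * invSqrtDegree[i] * invSqrtDegree[j])
  );
}

// Step 4: scale each row to unit Euclidean length.
// All-zero rows are copied unchanged to avoid dividing by zero.
function normalizeRows(E: NumericMatrix): NumericMatrix {
  return E.map((row) => {
    const norm = Math.sqrt(row.reduce((sum, value) => sum + value * value, 0));
    return norm > 0 ? row.map((value) => value / norm) : row.slice();
  });
}

const affinity: NumericMatrix = [
  [1.0, 0.9, 0.1, 0.05],
  [0.9, 1.0, 0.15, 0.1],
  [0.1, 0.15, 1.0, 0.85],
  [0.05, 0.1, 0.85, 1.0]
];

console.log("Normalized affinity:", normalizeAffinity(affinity));
console.log("Unit-length rows:", normalizeRows([[3, 4], [0, 2]])); // [[0.6, 0.8], [0, 1]]
```

A useful sanity check on the result: for a symmetric A with positive degrees, the vector of square-rooted degrees is an eigenvector of the normalized matrix with eigenvalue 1.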

Advantages

  • Finds arbitrarily shaped clusters
  • Works with non-convex clusters
  • Can capture complex cluster structures
  • Effective with graph-based data

Considerations

  • Computationally expensive for large datasets
  • Sensitive to parameter choices (gamma, nNeighbors)
  • Results depend on eigenvalue decomposition quality
  • May struggle with highly imbalanced cluster sizes

Parameter Selection Guide

affinity=“rbf”:
  • Use when clusters have smooth boundaries
  • Tune gamma: smaller = looser, larger = tighter
  • Start with gamma = 1 / (nFeatures * variance)
affinity=“nearest_neighbors”:
  • Use for elongated or irregular cluster shapes
  • Tune nNeighbors: smaller = finer structure, larger = coarser
  • Typical range: 5-20 neighbors
nClusters:
  • Try multiple values and use evaluation metrics
  • Consider using eigenvector gap analysis
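
Two of the suggestions above can be sketched as small helpers. Both are hypothetical and not part of bun-scikit. gammaStart assumes that "variance" means the population variance over all entries of X, which the guide does not specify. eigengapK assumes you have already obtained the eigenvalues of the normalized affinity matrix by some other means, since the documented properties do not expose them.

```typescript
// Starting value for gamma: 1 / (nFeatures * variance).
// Assumption: variance is the population variance over all entries of X.
function gammaStart(X: number[][]): number {
  const values: number[] = [];
  for (const row of X) values.push(...row);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / values.length;
  if (variance === 0) {
    throw new Error("variance is zero; the rule of thumb does not apply");
  }
  return 1 / (X[0].length * variance);
}

// Eigengap heuristic: sort eigenvalues in descending order and return the
// k for which the drop from the k-th to the (k+1)-th eigenvalue is largest.
function eigengapK(eigenvalues: number[]): number {
  const sorted = [...eigenvalues].sort((a, b) => b - a);
  let bestK = 1;
  let bestGap = -Infinity;
  for (let k = 1; k < sorted.length; k++) {
    const gap = sorted[k - 1] - sorted[k];
    if (gap > bestGap) {
      bestGap = gap;
      bestK = k;
    }
  }
  return bestK;
}

const X = [[1, 1], [1.5, 1.8], [5, 5], [5.5, 5.2]];
console.log("Starting gamma:", gammaStart(X));

// Made-up eigenvalues for illustration: two near 1, then a sharp drop.
console.log("Suggested nClusters:", eigengapK([1.0, 0.97, 0.21, 0.08])); // 2
```

Treat both outputs as starting points for a search rather than final settings. The eigengap is only informative when the affinity matrix has reasonably well-separated groups.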
