
Overview

KMeans implements the K-Means clustering algorithm, which partitions data into k clusters by minimizing within-cluster variance. The algorithm iteratively assigns samples to the nearest centroid and updates centroids based on cluster membership.

Constructor

new KMeans(options?: KMeansOptions)

Parameters

options
KMeansOptions
default: {}
Configuration options for KMeans clustering

Methods

fit

Fit the KMeans model to the training data.
fit(X: Matrix): this
X
Matrix
required
Training data matrix where rows are samples and columns are features. Must be non-empty with consistent row sizes and finite values.
Returns: The fitted KMeans instance (for method chaining).
Throws: Error if nClusters exceeds sample count or data validation fails.
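The input requirements above can be pictured as a small pre-flight check. The following is a minimal sketch, not the library's implementation: assertValidMatrix and its error messages are hypothetical names chosen for illustration, and Matrix is assumed to be a plain number[][].

```typescript
// Hypothetical helper (not part of bun-scikit): mirrors the documented
// constraints on X and nClusters before fitting.
function assertValidMatrix(X: number[][], nClusters: number): void {
  if (X.length === 0 || X[0].length === 0) {
    throw new Error("X must be non-empty");
  }
  const width = X[0].length;
  for (const row of X) {
    if (row.length !== width) {
      throw new Error("All rows must have the same number of features");
    }
    if (!row.every((v) => Number.isFinite(v))) {
      throw new Error("X must contain only finite values");
    }
  }
  if (nClusters > X.length) {
    throw new Error("nClusters cannot exceed the number of samples");
  }
}

// Passes silently: two samples, two features, two clusters.
assertValidMatrix([[1.0, 2.0], [1.5, 1.8]], 2);
```

Running a check like this before calling fit makes it clearer which of the documented conditions a given data set violates.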

predict

Predict cluster labels for new samples.
predict(X: Matrix): Vector
X
Matrix
required
Data matrix to predict cluster labels for. Must match the feature count of the training data.
Returns: Vector of cluster labels (integers from 0 to nClusters-1).
Throws: Error if model not fitted or feature mismatch.
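Conceptually, prediction is a nearest-centroid lookup. The sketch below illustrates that idea on plain arrays; nearestCenterLabels and the hand-picked exampleCenters are hypothetical and stand in for a fitted model's clusterCenters_, so this is not the library's own code.

```typescript
// Hypothetical helper (not part of bun-scikit): label each row with the
// index of its nearest center.
function nearestCenterLabels(X: number[][], centers: number[][]): number[] {
  return X.map((row) => {
    if (row.length !== centers[0].length) {
      throw new Error("Feature count does not match the cluster centers");
    }
    let best = 0;
    let bestDist = Infinity;
    for (let k = 0; k < centers.length; k++) {
      // Squared distance preserves the ordering, so no square root is needed.
      let dist = 0;
      for (let j = 0; j < row.length; j++) dist += (row[j] - centers[k][j]) ** 2;
      if (dist < bestDist) {
        bestDist = dist;
        best = k;
      }
    }
    return best;
  });
}

// Centers chosen by hand for illustration only.
const exampleCenters = [
  [1.25, 1.9],
  [6.5, 8.0],
];
console.log(nearestCenterLabels([[0.0, 0.0], [9.0, 10.0]], exampleCenters)); // [ 0, 1 ]
```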

fitPredict

Fit the model and return cluster labels for training data.
fitPredict(X: Matrix): Vector
X
Matrix
required
Training data matrix to fit and predict on.
Returns: Vector of cluster labels for the training data.

transform

Transform data to cluster-distance space.
transform(X: Matrix): Matrix
X
Matrix
required
Data matrix to transform.
Returns: Matrix where each row contains Euclidean distances to all cluster centers.
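To make the shape of the result concrete, here is a minimal sketch of the distance computation on plain arrays. distancesToCenters is a hypothetical helper and the centers are chosen by hand; it shows what a sample-by-center Euclidean distance matrix looks like, not how the library computes it.

```typescript
// Hypothetical helper (not part of bun-scikit): one output row per sample,
// one column per cluster center.
function distancesToCenters(X: number[][], centers: number[][]): number[][] {
  return X.map((row) =>
    centers.map((center) => {
      let sum = 0;
      for (let j = 0; j < row.length; j++) sum += (row[j] - center[j]) ** 2;
      return Math.sqrt(sum);
    })
  );
}

const sampleDistances = distancesToCenters(
  [[0, 0], [3, 4]], // two samples
  [[0, 0], [3, 0]]  // two hand-picked centers
);
console.log(sampleDistances); // [ [ 0, 3 ], [ 5, 4 ] ]
```

The output has as many rows as X and as many columns as there are centers, which is why the transformed matrix can serve as a feature set for downstream models.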

score

Compute the negative inertia of X: the negated sum of squared distances from each sample to its closest cluster center.
score(X: Matrix): number
X
Matrix
required
Data matrix to score.
Returns: Negative inertia value (higher is better).
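As a worked illustration of the quantity being returned, the sketch below computes negative inertia directly from samples and centers. negativeInertia is a hypothetical helper operating on plain arrays with hand-picked centers; it is meant to show the arithmetic, not the library's implementation.

```typescript
// Hypothetical helper (not part of bun-scikit): sum each sample's squared
// distance to its closest center, then negate.
function negativeInertia(X: number[][], centers: number[][]): number {
  let inertia = 0;
  for (const row of X) {
    let minSq = Infinity;
    for (const center of centers) {
      let sq = 0;
      for (let j = 0; j < row.length; j++) sq += (row[j] - center[j]) ** 2;
      minSq = Math.min(minSq, sq);
    }
    inertia += minSq;
  }
  return -inertia;
}

// Squared distances to the closest center are 1, 1 and 0, so the score is -2.
console.log(negativeInertia([[0, 0], [0, 2], [10, 0]], [[0, 1], [10, 0]])); // -2
```

Because the value is negated, a tighter clustering yields a score closer to zero, which keeps the "higher is better" convention.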

Properties

clusterCenters_
Matrix | null
Coordinates of cluster centers after fitting.
labels_
Vector | null
Cluster labels for training samples.
inertia_
number | null
Sum of squared distances of samples to their closest cluster center.
nIter_
number | null
Number of iterations run in the best initialization.
nFeaturesIn_
number | null
Number of features seen during fitting.

Examples

Basic Clustering

import { KMeans } from 'bun-scikit';

// Create sample data
const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0],
  [1.0, 0.6],
  [9.0, 11.0]
];

// Create and fit KMeans model
const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

console.log('Cluster centers:', kmeans.clusterCenters_);
console.log('Labels:', kmeans.labels_);
console.log('Inertia:', kmeans.inertia_);

Prediction on New Data

import { KMeans } from 'bun-scikit';

const trainData = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(trainData);

// Predict clusters for new samples
const newData = [
  [0.0, 0.0],
  [9.0, 10.0]
];

const predictions = kmeans.predict(newData);
console.log('Predicted clusters:', predictions);

Distance Transformation

import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

// Transform to cluster distance space
const distances = kmeans.transform(X);
console.log('Distances to centroids:', distances);
// Output: [[d1_to_c1, d1_to_c2], [d2_to_c1, d2_to_c2], ...]

Multiple Initializations

import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]
];

// Run with 20 different initializations
const kmeans = new KMeans({
  nClusters: 3,
  nInit: 20,
  maxIter: 500,
  randomState: 42
});

kmeans.fit(X);
console.log('Best inertia after 20 runs:', kmeans.inertia_);
console.log('Iterations used:', kmeans.nIter_);

Algorithm Details

KMeans uses the standard Lloyd’s algorithm:
  1. Initialization: Randomly select k samples as the initial centroids
  2. Assignment: Assign each sample to its nearest centroid
  3. Update: Recompute each centroid as the mean of its assigned samples
  4. Convergence: Repeat steps 2-3 until centroid movement falls below the tolerance or the maximum number of iterations is reached
The algorithm runs multiple times with different initializations (controlled by nInit) and returns the solution with the lowest inertia.
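The steps above, together with the multiple-initialization loop, can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the library's source: kmeansSketch, lloydOnce, and the mulberry32 generator are hypothetical names, centroid movement is measured as the Euclidean norm of the total shift, and an empty cluster simply keeps its previous center, which is a simplification.

```typescript
type Matrix = number[][];

// Small seeded PRNG (mulberry32) so runs are reproducible. Assumption: the
// library's own random generator may differ.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function sqDist(a: number[], b: number[]): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) sum += (a[j] - b[j]) ** 2;
  return sum;
}

function nearest(row: number[], centers: Matrix): number {
  let best = 0;
  for (let c = 1; c < centers.length; c++) {
    if (sqDist(row, centers[c]) < sqDist(row, centers[best])) best = c;
  }
  return best;
}

// One run of Lloyd's algorithm from a single random initialization.
function lloydOnce(X: Matrix, k: number, maxIter: number, tol: number, rand: () => number) {
  // 1. Initialization: pick k samples with a partial Fisher-Yates shuffle.
  const idx = X.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  let centers: Matrix = idx.slice(0, k).map((i) => [...X[i]]);
  let nIter = 0;
  while (nIter < maxIter) {
    nIter++;
    // 2. Assignment: nearest centroid for every sample.
    const assigned = X.map((row) => nearest(row, centers));
    // 3. Update: mean of the assigned samples (empty clusters keep their center).
    const next = centers.map((center, c) => {
      const members = X.filter((_, i) => assigned[i] === c);
      if (members.length === 0) return center;
      return center.map((_, j) => members.reduce((s, r) => s + r[j], 0) / members.length);
    });
    // 4. Convergence: stop once the total centroid shift drops below tol.
    const shift = Math.sqrt(next.reduce((s, c, i) => s + sqDist(c, centers[i]), 0));
    centers = next;
    if (shift < tol) break;
  }
  const labels = X.map((row) => nearest(row, centers));
  const inertia = X.reduce((s, row, i) => s + sqDist(row, centers[labels[i]]), 0);
  return { centers, labels, inertia, nIter };
}

// Repeat from nInit random starts and keep the run with the lowest inertia.
function kmeansSketch(X: Matrix, k: number, nInit: number, maxIter: number, tol: number, seed: number) {
  if (k > X.length) throw new Error("k exceeds the number of samples");
  const rand = mulberry32(seed);
  let best = lloydOnce(X, k, maxIter, tol, rand);
  for (let run = 1; run < nInit; run++) {
    const candidate = lloydOnce(X, k, maxIter, tol, rand);
    if (candidate.inertia < best.inertia) best = candidate;
  }
  return best;
}

const demoData: Matrix = [
  [1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0],
];
const demoResult = kmeansSketch(demoData, 2, 5, 100, 1e-4, 42);
console.log(demoResult.labels, demoResult.inertia);
```

On this six-point data set the sketch groups the three points near the origin into one cluster and the three remaining points into the other, with an inertia of about 15.98. Which cluster receives label 0 depends on the random initialization.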

Notes

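  • KMeans works best when clusters are roughly spherical and similar in size; elongated or unevenly sized clusters may be split or merged
  • Results are sensitive to initial centroid placement (raise nInit to try multiple initializations)
  • Each iteration of Lloyd's algorithm is linear in the number of samples, so a single run scales well to large datasets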
  • Use randomState for reproducible results
  • Empty clusters are handled by reassigning a random sample
