KMeans

Overview

KMeans implements the K-Means clustering algorithm, which partitions data into k clusters by minimizing within-cluster variance. The algorithm iteratively assigns samples to the nearest centroid and updates centroids based on cluster membership.

Constructor

new KMeans(options?: KMeansOptions)

Parameters

options (KMeansOptions, default: {})

Configuration options for KMeans clustering.

nClusters (number, default: 8)

Number of clusters to form. Must be an integer >= 1.

nInit (number, default: 10)

Number of times the algorithm will run with different centroid seeds. The best result (lowest inertia) is kept. Must be an integer >= 1.

maxIter (number, default: 300)

Maximum number of iterations per initialization. Must be an integer >= 1.

tolerance (number, default: 1e-4)

Relative tolerance for centroid movement to declare convergence. Must be finite and >= 0.

randomState (number, optional)

Seed for the random number generator. Set it to get reproducible results; if omitted, results may vary between runs.
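The defaults and constraints listed above can be summarized in a few lines of code. The following is only a sketch, not bun-scikit's implementation: the resolveOptions helper and its error messages are hypothetical, while the option names, defaults, and constraints are taken from the parameter list.

```typescript
// Option names and defaults come from the parameter list above.
interface KMeansOptions {
  nClusters?: number;
  nInit?: number;
  maxIter?: number;
  tolerance?: number;
  randomState?: number;
}

// Hypothetical helper: apply the documented defaults, then enforce the
// documented constraints. The library's real validation may differ.
function resolveOptions(options: KMeansOptions = {}) {
  const resolved = {
    nClusters: options.nClusters ?? 8,
    nInit: options.nInit ?? 10,
    maxIter: options.maxIter ?? 300,
    tolerance: options.tolerance ?? 1e-4,
    randomState: options.randomState, // stays undefined when no seed is given
  };
  for (const key of ["nClusters", "nInit", "maxIter"] as const) {
    if (!Number.isInteger(resolved[key]) || resolved[key] < 1) {
      throw new Error(`${key} must be an integer >= 1`);
    }
  }
  if (!Number.isFinite(resolved.tolerance) || resolved.tolerance < 0) {
    throw new Error("tolerance must be finite and >= 0");
  }
  return resolved;
}

console.log(resolveOptions({ nClusters: 3 }).nInit); // 10
```

Passing { nClusters: 0 } or { tolerance: Infinity } to this sketch throws, which mirrors the constraints stated for each option.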

Methods

fit

Fit the KMeans model to the training data.

fit(X: Matrix): this

X (Matrix, required)

Training data matrix where rows are samples and columns are features. Must be non-empty with consistent row sizes and finite values.

Returns: The fitted KMeans instance (for method chaining).

Throws: Error if nClusters exceeds sample count or data validation fails.
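The input requirements for fit can be made concrete with a small check. This is a sketch under assumptions, not the library's validation code: validateMatrix and its error messages are hypothetical, and Matrix is assumed to be a plain number[][] as in the examples on this page.

```typescript
// Assumption: Matrix is a plain array of numeric rows.
type Matrix = number[][];

// Hypothetical helper mirroring the documented constraints on X:
// non-empty, consistent row sizes, finite values, nClusters <= sample count.
function validateMatrix(X: Matrix, nClusters: number): void {
  if (X.length === 0 || X[0].length === 0) {
    throw new Error("X must be non-empty");
  }
  const nFeatures = X[0].length;
  for (let i = 0; i < X.length; i++) {
    if (X[i].length !== nFeatures) {
      throw new Error(`Row ${i} has ${X[i].length} features, expected ${nFeatures}`);
    }
    for (const value of X[i]) {
      if (!Number.isFinite(value)) {
        throw new Error(`Row ${i} contains a non-finite value`);
      }
    }
  }
  if (nClusters > X.length) {
    throw new Error(`nClusters (${nClusters}) exceeds sample count (${X.length})`);
  }
}

// Passes silently: two samples, two features, two clusters.
validateMatrix([[1.0, 2.0], [1.5, 1.8]], 2);
```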

predict

Predict cluster labels for new samples.

predict(X: Matrix): Vector

X (Matrix, required)

Data matrix to predict cluster labels for. Must match the feature count of the training data.

Returns: Vector of cluster labels (integers from 0 to nClusters-1).

Throws: Error if model not fitted or feature mismatch.
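Conceptually, prediction assigns each row to the index of its nearest cluster center. The sketch below illustrates that idea with a hypothetical nearestCenterLabels function operating on plain arrays; it is not the library's predict method, and Matrix is assumed to be number[][].

```typescript
// Assumption: Matrix is a plain array of numeric rows.
type Matrix = number[][];

// Squared Euclidean distance between two points of equal length.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) {
    const diff = a[j] - b[j];
    sum += diff * diff;
  }
  return sum;
}

// Hypothetical stand-in for predict: index of the closest center for each row.
function nearestCenterLabels(X: Matrix, centers: Matrix): number[] {
  return X.map((row) => {
    if (row.length !== centers[0].length) {
      throw new Error("Feature count does not match the cluster centers");
    }
    let best = 0;
    let bestDistance = Infinity;
    for (let c = 0; c < centers.length; c++) {
      const distance = squaredDistance(row, centers[c]);
      if (distance < bestDistance) {
        bestDistance = distance;
        best = c;
      }
    }
    return best;
  });
}

// Illustrative centers only; a fitted model would supply its own.
const centers: Matrix = [[1.25, 1.9], [6.5, 8.0]];
const labels = nearestCenterLabels([[0.0, 0.0], [9.0, 10.0]], centers);
console.log(labels); // first row gets label 0, second row gets label 1
```

Squared distances are enough here because the square root does not change which center is nearest.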

fitPredict

Fit the model and return cluster labels for training data.

fitPredict(X: Matrix): Vector

X (Matrix, required)

Training data matrix to fit and predict on.

Returns: Vector of cluster labels for the training data.

transform

Transform data to cluster-distance space.

transform(X: Matrix): Matrix

X (Matrix, required)

Data matrix to transform.

Returns: Matrix with one row per sample and one column per cluster center. Each entry is the Euclidean distance from that sample to that center.
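The cluster-distance space can be illustrated with a few lines of array code. This is a sketch only: distancesToCenters is a hypothetical stand-in for transform, and Matrix is assumed to be number[][].

```typescript
// Assumption: Matrix is a plain array of numeric rows.
type Matrix = number[][];

// Hypothetical stand-in for transform: one output row per sample,
// one output column per cluster center, each entry a Euclidean distance.
function distancesToCenters(X: Matrix, centers: Matrix): Matrix {
  return X.map((row) =>
    centers.map((center) => {
      let sumOfSquares = 0;
      for (let j = 0; j < row.length; j++) {
        const diff = row[j] - center[j];
        sumOfSquares += diff * diff;
      }
      return Math.sqrt(sumOfSquares);
    })
  );
}

const D = distancesToCenters([[0, 0], [3, 4]], [[0, 0], [3, 0]]);
console.log(D); // row 0 is [0, 3], row 1 is [5, 4]
```

The column index of the smallest entry in each row is the label that nearest-centroid assignment would give that sample.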

score

Compute the negative inertia of X: the negated sum of squared distances from each sample to its nearest cluster center.

score(X: Matrix): number

X (Matrix, required)

Data matrix to score.

Returns: Negative inertia value (higher is better).
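To make the sign convention concrete, the sketch below computes the same quantity by hand. negativeInertia is a hypothetical function on plain arrays, not the library's score method, and Matrix is assumed to be number[][].

```typescript
// Assumption: Matrix is a plain array of numeric rows.
type Matrix = number[][];

// Hypothetical stand-in for score: sum the squared distance from each
// sample to its nearest center, then negate so that higher is better.
function negativeInertia(X: Matrix, centers: Matrix): number {
  let inertia = 0;
  for (const row of X) {
    let nearest = Infinity;
    for (const center of centers) {
      let squared = 0;
      for (let j = 0; j < row.length; j++) {
        const diff = row[j] - center[j];
        squared += diff * diff;
      }
      nearest = Math.min(nearest, squared);
    }
    inertia += nearest;
  }
  return -inertia;
}

// Squared distances to the nearest center are 1, 1, and 0.
console.log(negativeInertia([[0, 0], [2, 0], [10, 0]], [[1, 0], [10, 0]])); // -2
```

A score of 0 would mean every sample sits exactly on a cluster center; more negative values indicate a looser fit.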

Properties

clusterCenters_ (Matrix | null)

Coordinates of cluster centers after fitting.

labels_ (Vector | null)

Cluster labels for training samples.

inertia_ (number | null)

Sum of squared distances of samples to their closest cluster center.

nIter_ (number | null)

Number of iterations run in the best initialization.

nFeaturesIn_ (number | null)

Number of features seen during fitting.

Examples

Basic Clustering

```typescript
import { KMeans } from 'bun-scikit';

// Create sample data
const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0],
  [1.0, 0.6],
  [9.0, 11.0]
];

// Create and fit KMeans model
const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

console.log('Cluster centers:', kmeans.clusterCenters_);
console.log('Labels:', kmeans.labels_);
console.log('Inertia:', kmeans.inertia_);
```

Prediction on New Data

```typescript
import { KMeans } from 'bun-scikit';

const trainData = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(trainData);

// Predict clusters for new samples
const newData = [
  [0.0, 0.0],
  [9.0, 10.0]
];

const predictions = kmeans.predict(newData);
console.log('Predicted clusters:', predictions);
```

Distance Transformation

```typescript
import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

// Transform to cluster distance space
const distances = kmeans.transform(X);
console.log('Distances to centroids:', distances);
// Output: [[d1_to_c1, d1_to_c2], [d2_to_c1, d2_to_c2], ...]
```

Multiple Initializations

```typescript
import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]
];

// Run with 20 different initializations
const kmeans = new KMeans({
  nClusters: 3,
  nInit: 20,
  maxIter: 500,
  randomState: 42
});

kmeans.fit(X);
console.log('Best inertia after 20 runs:', kmeans.inertia_);
console.log('Iterations used:', kmeans.nIter_);
```

Algorithm Details

KMeans uses the standard Lloyd’s algorithm:

1. Initialization: Randomly select k samples as initial centroids.
2. Assignment: Assign each sample to its nearest centroid.
3. Update: Recalculate each centroid as the mean of its assigned samples.
4. Convergence: Repeat steps 2-3 until centroid movement falls below tolerance or maxIter iterations are reached.

The algorithm runs multiple times with different initializations (controlled by nInit) and returns the solution with the lowest inertia.
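The steps above, together with the restart loop, can be sketched in plain TypeScript. This is an illustration under stated assumptions rather than bun-scikit's source: kmeansSketch, lloydRun, assignLabels, and seededRandom are hypothetical names, the random generator is a simple linear congruential generator chosen for brevity, convergence is tested against the total squared centroid shift (the library's exact relative criterion is not specified here), and an empty cluster is reseeded from a random sample. It also assumes k does not exceed the number of samples.

```typescript
type Matrix = number[][];
interface KMeansResult { centers: Matrix; labels: number[]; inertia: number; nIter: number; }

// Assumption: a small linear congruential generator stands in for the seeded RNG.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let j = 0; j < a.length; j++) sum += (a[j] - b[j]) * (a[j] - b[j]);
  return sum;
}

// Step 2 (assignment): label each sample with its nearest centroid and
// accumulate the inertia for those labels.
function assignLabels(X: Matrix, centers: Matrix): { labels: number[]; inertia: number } {
  let inertia = 0;
  const labels = X.map((row) => {
    let best = 0;
    for (let c = 1; c < centers.length; c++) {
      if (squaredDistance(row, centers[c]) < squaredDistance(row, centers[best])) best = c;
    }
    inertia += squaredDistance(row, centers[best]);
    return best;
  });
  return { labels, inertia };
}

// One run of Lloyd's algorithm from a single random initialization.
function lloydRun(X: Matrix, k: number, maxIter: number, tol: number, rand: () => number): KMeansResult {
  // Step 1 (initialization): pick k distinct samples with a partial shuffle.
  const order = X.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (order.length - i));
    [order[i], order[j]] = [order[j], order[i]];
  }
  let centers = order.slice(0, k).map((i) => X[i].slice());
  let nIter = 0;
  while (nIter < maxIter) {
    nIter++;
    const { labels } = assignLabels(X, centers);
    // Step 3 (update): each centroid becomes the mean of its assigned samples.
    const sums = centers.map((center) => center.map(() => 0));
    const counts = centers.map(() => 0);
    X.forEach((row, i) => {
      counts[labels[i]]++;
      row.forEach((value, j) => { sums[labels[i]][j] += value; });
    });
    const next = sums.map((sum, c) =>
      counts[c] > 0
        ? sum.map((s) => s / counts[c])
        : X[Math.floor(rand() * X.length)].slice() // empty cluster: reseed from a random sample
    );
    // Step 4 (convergence): stop once the total squared centroid shift is within tol.
    const shift = centers.reduce((acc, center, c) => acc + squaredDistance(center, next[c]), 0);
    centers = next;
    if (shift <= tol) break;
  }
  return { centers, nIter, ...assignLabels(X, centers) };
}

// Repeat the run nInit times and keep the result with the lowest inertia.
function kmeansSketch(X: Matrix, k: number, nInit = 10, maxIter = 300, tol = 1e-4, seed = 42): KMeansResult {
  const rand = seededRandom(seed);
  let best = lloydRun(X, k, maxIter, tol, rand);
  for (let run = 1; run < nInit; run++) {
    const candidate = lloydRun(X, k, maxIter, tol, rand);
    if (candidate.inertia < best.inertia) best = candidate;
  }
  return best;
}

const X: Matrix = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]];
const result = kmeansSketch(X, 2);
console.log(result.labels, result.inertia, result.nIter);
```

Because every restart draws from the same seeded generator, calling kmeansSketch twice with the same seed yields the same result, which is the behavior randomState is documented to provide.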

Notes

KMeans works best when clusters are roughly spherical and similar in size.
Results are sensitive to initial centroid placement; raise nInit to try more initializations.
Scales well to large datasets: the cost of each iteration grows linearly with the number of samples.
Set randomState for reproducible results.
Empty clusters are handled by reassigning a random sample.
