Overview
KMeans implements the K-Means clustering algorithm, which partitions data into k clusters by minimizing within-cluster variance. The algorithm iteratively assigns samples to the nearest centroid and updates centroids based on cluster membership.
Constructor
new KMeans(options?: KMeansOptions)
Parameters
options
KMeansOptions
default: {}
Configuration options for KMeans clustering:
- nClusters — Number of clusters to form. Must be an integer >= 1.
- nInit — Number of times the algorithm will run with different centroid seeds; the best result (lowest inertia) is kept. Must be an integer >= 1.
- maxIter — Maximum number of iterations per initialization. Must be an integer >= 1.
- Tolerance — Relative tolerance for centroid movement used to declare convergence. Must be finite and >= 0.
- randomState — Seed for the random number generator, for reproducible results. If not provided, results may vary between runs.
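The constraints above can be sketched as a TypeScript interface with a validation helper. This is an illustration inferred from this page, not bun-scikit's actual declaration; in particular, the `tol` name for the tolerance option and the `validateOptions` helper are assumptions.

```typescript
// Illustrative shape of the options described above; the real KMeansOptions
// in bun-scikit may differ. The `tol` name is an assumption.
interface KMeansOptions {
  nClusters?: number;   // number of clusters to form, integer >= 1
  nInit?: number;       // independent runs; the lowest-inertia result is kept
  maxIter?: number;     // maximum iterations per initialization
  tol?: number;         // convergence tolerance, finite and >= 0 (name assumed)
  randomState?: number; // RNG seed for reproducible results
}

// Check the documented constraints (hypothetical helper).
function validateOptions(o: KMeansOptions): void {
  const isPosInt = (n: number) => Number.isInteger(n) && n >= 1;
  if (o.nClusters !== undefined && !isPosInt(o.nClusters))
    throw new Error('nClusters must be an integer >= 1');
  if (o.nInit !== undefined && !isPosInt(o.nInit))
    throw new Error('nInit must be an integer >= 1');
  if (o.maxIter !== undefined && !isPosInt(o.maxIter))
    throw new Error('maxIter must be an integer >= 1');
  if (o.tol !== undefined && !(Number.isFinite(o.tol) && o.tol >= 0))
    throw new Error('tol must be finite and >= 0');
}
```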
Methods
fit
Fit the KMeans model to the training data.
fit(X: Matrix): KMeans
Training data matrix where rows are samples and columns are features. Must be non-empty with consistent row sizes and finite values.
Returns: The fitted KMeans instance (for method chaining).
Throws: Error if nClusters exceeds sample count or data validation fails.
predict
Predict cluster labels for new samples.
predict(X: Matrix): Vector
Data matrix to predict cluster labels for. Must match the feature count of the training data.
Returns: Vector of cluster labels (integers from 0 to nClusters-1).
Throws: Error if model not fitted or feature mismatch.
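Conceptually, prediction assigns each row to the index of its nearest cluster center, which is how labels stay in the 0 to nClusters-1 range. A self-contained sketch of that rule (an illustration, not bun-scikit's internals):

```typescript
// Assign each sample the index of its nearest center (squared Euclidean
// distance suffices for the argmin). Illustration only.
function nearestCenterLabels(X: number[][], centers: number[][]): number[] {
  const dist2 = (a: number[], b: number[]) =>
    a.reduce((s, v, j) => s + (v - b[j]) ** 2, 0);
  return X.map(row => {
    let best = 0;
    for (let c = 1; c < centers.length; c++) {
      if (dist2(row, centers[c]) < dist2(row, centers[best])) best = c;
    }
    return best;
  });
}
```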
fitPredict
Fit the model and return cluster labels for training data.
fitPredict(X: Matrix): Vector
Training data matrix to fit and predict on.
Returns: Vector of cluster labels for the training data.
transform
Transform data to cluster-distance space.
transform(X: Matrix): Matrix
Data matrix to transform.
Returns: Matrix where each row contains Euclidean distances to all cluster centers.
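The computation behind this return shape can be sketched as follows — one row per sample, one column per cluster center. This is an illustration of what the documentation describes, not the library's code:

```typescript
// Euclidean distance from each sample to each cluster center, matching the
// documented shape of the transform() result. Illustration only.
function clusterDistances(X: number[][], centers: number[][]): number[][] {
  return X.map(row =>
    centers.map(center =>
      Math.sqrt(row.reduce((s, v, j) => s + (v - center[j]) ** 2, 0))
    )
  );
}
```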
score
Compute the negative inertia (the negated within-cluster sum of squares).
Returns: Negative inertia value (higher is better).
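The quantity being negated is the inertia: the sum of squared distances from each sample to its assigned center. A sketch of that computation (illustration only, not bun-scikit's implementation):

```typescript
// Inertia: sum of squared distances from each sample to its assigned center.
// score() is documented to return the negation of this. Illustration only.
function negativeInertia(
  X: number[][],
  centers: number[][],
  labels: number[]
): number {
  let inertia = 0;
  for (let i = 0; i < X.length; i++) {
    const c = centers[labels[i]];
    inertia += X[i].reduce((s, v, j) => s + (v - c[j]) ** 2, 0);
  }
  return -inertia;
}
```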
Properties
- clusterCenters_ — Coordinates of the cluster centers after fitting.
- labels_ — Cluster labels for the training samples.
- inertia_ — Sum of squared distances of samples to their closest cluster center.
- nIter_ — Number of iterations run in the best initialization.
- Number of features seen during fitting.
Examples
Basic Clustering
```typescript
import { KMeans } from 'bun-scikit';

// Create sample data
const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0],
  [1.0, 0.6],
  [9.0, 11.0]
];

// Create and fit KMeans model
const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

console.log('Cluster centers:', kmeans.clusterCenters_);
console.log('Labels:', kmeans.labels_);
console.log('Inertia:', kmeans.inertia_);
```
Prediction on New Data
```typescript
import { KMeans } from 'bun-scikit';

const trainData = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(trainData);

// Predict clusters for new samples
const newData = [
  [0.0, 0.0],
  [9.0, 10.0]
];
const predictions = kmeans.predict(newData);
console.log('Predicted clusters:', predictions);
```
Transform to Cluster-Distance Space

```typescript
import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [5.0, 8.0],
  [8.0, 8.0]
];

const kmeans = new KMeans({ nClusters: 2, randomState: 42 });
kmeans.fit(X);

// Transform to cluster-distance space
const distances = kmeans.transform(X);
console.log('Distances to centroids:', distances);
// Output: [[d1_to_c1, d1_to_c2], [d2_to_c1, d2_to_c2], ...]
```
Multiple Initializations
```typescript
import { KMeans } from 'bun-scikit';

const X = [
  [1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]
];

// Run with 20 different initializations
const kmeans = new KMeans({
  nClusters: 3,
  nInit: 20,
  maxIter: 500,
  randomState: 42
});
kmeans.fit(X);

console.log('Best inertia after 20 runs:', kmeans.inertia_);
console.log('Iterations used:', kmeans.nIter_);
```
Algorithm Details
KMeans uses the standard Lloyd’s algorithm:
1. Initialization: Randomly select k samples as the initial centroids.
2. Assignment: Assign each sample to its nearest centroid.
3. Update: Recalculate each centroid as the mean of its assigned samples.
4. Convergence: Repeat steps 2-3 until centroid movement falls below the tolerance or the maximum number of iterations is reached.
The algorithm runs multiple times with different initializations (controlled by nInit) and returns the solution with the lowest inertia.
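A single run of the four steps above can be sketched as follows. This is a minimal illustration with a deterministic initialization (first k samples); bun-scikit's actual implementation additionally handles seeded random initialization, nInit restarts, and empty-cluster reassignment.

```typescript
// Minimal single-run Lloyd's algorithm sketch. Deterministic init; no
// restarts or RNG. Not bun-scikit's implementation.
function lloyd(X: number[][], k: number, maxIter = 300, tol = 1e-4) {
  let centers = X.slice(0, k).map(row => [...row]); // init: first k samples
  const labels = new Array(X.length).fill(0);
  const dist2 = (a: number[], b: number[]) =>
    a.reduce((s, v, j) => s + (v - b[j]) ** 2, 0);

  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment: nearest centroid for each sample
    for (let i = 0; i < X.length; i++) {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist2(X[i], centers[c]) < dist2(X[i], centers[best])) best = c;
      }
      labels[i] = best;
    }
    // Update: each centroid becomes the mean of its assigned samples
    const sums = centers.map(c => c.map(() => 0));
    const counts = new Array(k).fill(0);
    for (let i = 0; i < X.length; i++) {
      counts[labels[i]]++;
      X[i].forEach((v, j) => (sums[labels[i]][j] += v));
    }
    const next = sums.map((sum, c) =>
      counts[c] === 0
        ? [...centers[c]] // keep the old centroid (real impl reassigns a sample)
        : sum.map(v => v / counts[c])
    );
    // Convergence: stop once total squared centroid movement is below tol
    const shift = centers.reduce((s, c, i) => s + dist2(c, next[i]), 0);
    centers = next;
    if (shift < tol) break;
  }
  return { centers, labels };
}
```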
Notes
- KMeans assumes clusters are spherical and roughly equal in size.
- Results are sensitive to initial centroid placement; use multiple initializations (nInit) to mitigate this.
- Scales well to large datasets.
- Use randomState for reproducible results.
- Empty clusters are handled by reassigning a random sample as the new centroid.