DBSCAN

Overview

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups closely packed points into clusters and marks points in low-density regions as noise (outliers).

Constructor

new DBSCAN(options?: DBSCANOptions)

Parameters

options

DBSCANOptions

default: {}

Configuration options for DBSCAN clustering


eps

number

default: 0.5

Maximum distance between two samples for them to be considered neighbors. Must be finite and > 0.

minSamples

number

default: 5

Minimum number of samples in a neighborhood for a point to be considered a core point. Must be an integer >= 1.
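
The constraints on eps and minSamples can be checked up front. The sketch below only illustrates the documented rules and defaults: `DBSCANOptionsSketch` and `checkDBSCANOptions` are hypothetical names, not part of bun-scikit, and the library's own validation and error messages may differ.

```typescript
// Hypothetical shape mirroring the documented options; not the library's own type.
interface DBSCANOptionsSketch {
  eps?: number;
  minSamples?: number;
}

// Applies the documented defaults, then enforces the documented constraints.
function checkDBSCANOptions(
  options: DBSCANOptionsSketch = {}
): Required<DBSCANOptionsSketch> {
  const eps = options.eps ?? 0.5;
  const minSamples = options.minSamples ?? 5;
  if (!Number.isFinite(eps) || eps <= 0) {
    throw new Error(`eps must be finite and > 0, got ${eps}`);
  }
  if (!Number.isInteger(minSamples) || minSamples < 1) {
    throw new Error(`minSamples must be an integer >= 1, got ${minSamples}`);
  }
  return { eps, minSamples };
}

// eps is overridden; minSamples falls back to the default of 5.
const checked = checkDBSCANOptions({ eps: 0.3 });
console.log(checked.eps, checked.minSamples); // 0.3 5
```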

Methods

fit

Fit the DBSCAN model to the training data.

fit(X: Matrix): this

X

Matrix

required

Training data matrix where rows are samples and columns are features. Must be non-empty with consistent row sizes and finite values.

Returns: The fitted DBSCAN instance (for method chaining).

Throws: Error if data validation fails.
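
The input requirements for fit (non-empty, consistent row sizes, finite values) can be pictured as a pre-check like the one below. This is a sketch under assumptions: it treats a Matrix as a plain `number[][]`, as in the examples on this page, and `checkMatrix` is a hypothetical helper, not the library's validator.

```typescript
// Assumption: a Matrix is a plain array of numeric rows.
type MatrixSketch = number[][];

// Returns the number of features if X passes the documented checks; throws otherwise.
function checkMatrix(X: MatrixSketch): number {
  if (X.length === 0) {
    throw new Error("X must be non-empty");
  }
  const nFeatures = X[0].length;
  X.forEach((row, i) => {
    if (row.length !== nFeatures) {
      throw new Error(`row ${i} has ${row.length} values, expected ${nFeatures}`);
    }
    if (!row.every((value) => Number.isFinite(value))) {
      throw new Error(`row ${i} contains a non-finite value`);
    }
  });
  return nFeatures;
}

console.log(checkMatrix([[1.0, 2.0], [1.5, 1.8]])); // 2
```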

fitPredict

Fit the model and return cluster labels for training data.

fitPredict(X: Matrix): Vector

X

Matrix

required

Training data matrix to fit and predict on.

Returns: Vector of cluster labels. Core and border points are assigned cluster IDs (0, 1, 2, …), while noise points are labeled -1.
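
A label vector is often easier to work with once samples are grouped by cluster. The helper below is a small illustration, not a library function: `groupByLabel` is a hypothetical name, and it assumes the returned Vector can be indexed like an array of numbers.

```typescript
// Groups sample indices by cluster label; noise (-1) gets its own entry.
function groupByLabel(labels: ArrayLike<number>): Map<number, number[]> {
  const groups = new Map<number, number[]>();
  for (let i = 0; i < labels.length; i++) {
    const label = labels[i];
    const members = groups.get(label);
    if (members) {
      members.push(i);
    } else {
      groups.set(label, [i]);
    }
  }
  return groups;
}

// Labels taken from the "Basic Clustering with Noise Detection" example on this page.
const grouped = groupByLabel([0, 0, 0, 1, 1, -1, 1]);
console.log(grouped.get(0));  // [0, 1, 2]
console.log(grouped.get(1));  // [3, 4, 6]
console.log(grouped.get(-1)); // [5]
```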

Properties

labels_

Vector | null

Cluster labels for each sample. Label -1 indicates noise points.

coreSampleIndices_

number[] | null

Indices of core samples (points with at least minSamples neighbors within eps distance).

components_

Matrix | null

Copy of core samples found during fitting.

nFeaturesIn_

number | null

Number of features seen during fitting.

nClusters_

Getter property that returns the number of clusters found (excluding noise).

get nClusters_(): number

Returns: Number of distinct clusters (not counting noise points).
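
Conceptually, this count is the number of distinct labels other than -1. The sketch below shows that reading on a plain label array; `countClusters` is a hypothetical helper, and the getter's actual implementation (including its behavior before fitting) is not specified here.

```typescript
// Counts distinct cluster labels, ignoring the noise label -1.
function countClusters(labels: ArrayLike<number>): number {
  const distinct = new Set<number>();
  for (let i = 0; i < labels.length; i++) {
    if (labels[i] !== -1) {
      distinct.add(labels[i]);
    }
  }
  return distinct.size;
}

console.log(countClusters([0, 0, 0, 1, 1, -1, 1])); // 2
console.log(countClusters([-1, -1]));               // 0 (everything is noise)
```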

Examples

Basic Clustering with Noise Detection

import { DBSCAN } from 'bun-scikit';

// Create sample data with noise
const X = [
  [1.0, 2.0],
  [1.5, 1.8],
  [1.2, 2.1],
  [8.0, 8.0],
  [8.1, 8.2],
  [25.0, 25.0],  // Outlier
  [7.9, 8.1]
];

// Create and fit DBSCAN model
const dbscan = new DBSCAN({ eps: 0.5, minSamples: 2 });
dbscan.fit(X);

console.log('Labels:', dbscan.labels_);
// Output: [0, 0, 0, 1, 1, -1, 1]
// Cluster 0: indices 0, 1, 2
// Cluster 1: indices 3, 4, 6
// -1: index 5, the noise point (outlier)

console.log('Number of clusters:', dbscan.nClusters_);
console.log('Core samples:', dbscan.coreSampleIndices_);

Anomaly Detection

import { DBSCAN } from 'bun-scikit';

const data = [
  [1.0, 2.0], [1.2, 2.1], [1.1, 1.9],  // Normal cluster
  [5.0, 5.0], [5.1, 5.2], [4.9, 5.1],  // Normal cluster
  [10.0, 10.0],                         // Anomaly
  [-5.0, -5.0]                          // Anomaly
];

const dbscan = new DBSCAN({ eps: 0.5, minSamples: 3 });
dbscan.fit(data);

// Find anomalies (noise points)
const anomalies = data.filter((_, idx) => dbscan.labels_![idx] === -1);
console.log('Detected anomalies:', anomalies);
// Output: [[10.0, 10.0], [-5.0, -5.0]]

Adjusting Density Parameters

import { DBSCAN } from 'bun-scikit';

const X = [
  [1.0, 2.0], [2.0, 2.0], [2.0, 3.0],
  [8.0, 7.0], [8.0, 8.0], [25.0, 80.0]
];

// Strict clustering (smaller eps, more minSamples)
const strict = new DBSCAN({ eps: 1.5, minSamples: 3 });
strict.fit(X);
console.log('Strict labels:', strict.labels_);
console.log('Strict clusters:', strict.nClusters_);

// Loose clustering (larger eps, fewer minSamples)
const loose = new DBSCAN({ eps: 3.0, minSamples: 2 });
loose.fit(X);
console.log('Loose labels:', loose.labels_);
console.log('Loose clusters:', loose.nClusters_);

Geographic Clustering

import { DBSCAN } from 'bun-scikit';

// Latitude, Longitude coordinates
const locations = [
  [40.7128, -74.0060],  // New York
  [40.7589, -73.9851],  // New York (nearby)
  [34.0522, -118.2437], // Los Angeles
  [34.0489, -118.2518], // Los Angeles (nearby)
  [51.5074, -0.1278],   // London
];

// Cluster nearby locations (eps in degrees)
const dbscan = new DBSCAN({ eps: 0.5, minSamples: 2 });
dbscan.fit(locations);

console.log('Location clusters:', dbscan.labels_);
// Locations in the same city share a cluster label;
// London has no neighbor within eps, so it is labeled -1 (noise)

Algorithm Details

DBSCAN classifies points into three categories:

Core points: Points with at least minSamples neighbors within eps distance
Border points: Non-core points within eps distance of a core point
Noise points: Points that are neither core nor border (labeled as -1)

The algorithm:

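For each unvisited point, find all neighbors within eps distance
If the point has at least minSamples neighbors, it is a core point: start a new cluster
Expand the cluster by visiting its neighbors; each neighbor that is itself a core point adds its own neighbors to the cluster
Mark points that are neither core points nor within eps of a core point as noise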
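
The steps above can be sketched as a brute-force implementation. This is a minimal illustration, not the library's code: it assumes Euclidean distance, counts a point as its own neighbor (as scikit-learn does; whether bun-scikit does is an assumption), and uses an explicit stack instead of recursion. The names `dbscanSketch` and `regionQuery` are hypothetical.

```typescript
const NOISE = -1;
const UNVISITED = -2;

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let d = 0; d < a.length; d++) sum += (a[d] - b[d]) ** 2;
  return Math.sqrt(sum);
}

// Brute-force neighborhood query; the result includes point i itself (assumption).
function regionQuery(X: number[][], i: number, eps: number): number[] {
  const neighbors: number[] = [];
  for (let j = 0; j < X.length; j++) {
    if (euclidean(X[i], X[j]) <= eps) neighbors.push(j);
  }
  return neighbors;
}

function dbscanSketch(X: number[][], eps: number, minSamples: number): number[] {
  const labels = new Array<number>(X.length).fill(UNVISITED);
  let cluster = 0;
  for (let i = 0; i < X.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const neighbors = regionQuery(X, i, eps);
    if (neighbors.length < minSamples) {
      labels[i] = NOISE; // provisional: may become a border point later
      continue;
    }
    labels[i] = cluster; // i is a core point: start a new cluster
    const stack = [...neighbors];
    while (stack.length > 0) {
      const j = stack.pop()!;
      if (labels[j] === NOISE) labels[j] = cluster; // noise reached from a core point is a border point
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const jNeighbors = regionQuery(X, j, eps);
      if (jNeighbors.length >= minSamples) stack.push(...jNeighbors); // j is core: keep expanding
    }
    cluster++;
  }
  return labels;
}

// Data from the "Basic Clustering with Noise Detection" example on this page.
const points = [
  [1.0, 2.0], [1.5, 1.8], [1.2, 2.1],
  [8.0, 8.0], [8.1, 8.2], [25.0, 25.0], [7.9, 8.1],
];
console.log(dbscanSketch(points, 0.5, 2)); // [0, 0, 0, 1, 1, -1, 1]
```

Each neighborhood query scans every point, which is where the O(n²) cost of the naive approach comes from.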

Advantages

Finds arbitrarily shaped clusters
Robust to outliers (marks them as noise)
No need to specify number of clusters
Handles clusters of differing sizes

Considerations

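Sensitive to the eps and minSamples parameters
Not suitable for clusters with widely varying densities, because a single eps applies to the whole dataset
Performance and cluster quality degrade with high-dimensional data, where distances become less informative
Time complexity: O(n²) with a naive implementation that compares every pair of points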

Parameter Selection

eps: Start with the average distance to the k-th nearest neighbor (where k = minSamples).

minSamples: A common rule of thumb is 2 × the number of dimensions. Adjust from there:

Larger values → fewer clusters, more noise
Smaller values → more clusters, less noise
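
The eps starting point can be computed directly from the data. The sketch below is one reading of that heuristic, under stated assumptions: Euclidean distance, a point is not counted as its own neighbor, and the search is brute force. `kthNeighborDistances` and `suggestEps` are hypothetical helpers, not part of bun-scikit.

```typescript
// Distance from each point to its k-th nearest other point (brute force).
function kthNeighborDistances(X: number[][], k: number): number[] {
  if (!Number.isInteger(k) || k < 1 || k > X.length - 1) {
    throw new Error("k must be an integer between 1 and X.length - 1");
  }
  return X.map((p, i) => {
    const dists = X
      .filter((_, j) => j !== i)                             // skip the point itself
      .map((q) => Math.hypot(...p.map((v, d) => v - q[d])))  // Euclidean distance
      .sort((a, b) => a - b);
    return dists[k - 1];
  });
}

// Average k-th neighbor distance, usable as a first guess for eps.
function suggestEps(X: number[][], k: number): number {
  const dists = kthNeighborDistances(X, k);
  return dists.reduce((sum, d) => sum + d, 0) / dists.length;
}

const sample = [[0, 0], [3, 0], [0, 4]];
console.log(kthNeighborDistances(sample, 1)); // [3, 3, 4]
console.log(suggestEps(sample, 1));           // 10 / 3, about 3.33
```

Sorting the k-th neighbor distances and looking for a sharp bend in the curve is a common alternative to taking the average.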
