Overview
The GaussianNB class implements the Gaussian Naive Bayes algorithm for classification. It assumes that features follow a Gaussian (normal) distribution within each class and applies Bayes’ theorem under the “naive” assumption of conditional independence between features.
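Concretely, the classifier scores each class by combining the class prior with per-feature Gaussian likelihoods and predicts the highest-scoring class. A minimal sketch of that scoring rule (illustrative only, not this library's internals; the function names are made up):

```typescript
// Illustrative sketch only: Gaussian Naive Bayes scores each class c as
// log P(y = c) + sum_j log N(x_j | mean_cj, var_cj) and predicts the
// class with the highest score.

// Log density of a univariate Gaussian.
function gaussianLogPdf(x: number, mean: number, variance: number): number {
  return -0.5 * Math.log(2 * Math.PI * variance)
       - ((x - mean) ** 2) / (2 * variance);
}

// Joint log-likelihood of one sample under one class.
function jointLogLikelihood(
  x: number[],
  logPrior: number,
  means: number[],     // per-feature means for this class
  variances: number[]  // per-feature variances for this class
): number {
  let total = logPrior;
  for (let j = 0; j < x.length; j++) {
    total += gaussianLogPdf(x[j], means[j], variances[j]);
  }
  return total;
}
```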
Constructor
import { GaussianNB } from '@scikitjs/sklearn';
const classifier = new GaussianNB({
varSmoothing: 1e-9
});
Parameters
varSmoothing: number
Portion of the largest variance of all features that is added to the variances for calculation stability. Must be non-negative.
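In concrete terms, the smoothing amount is ε = varSmoothing × (largest feature variance), added to every per-class variance before likelihoods are computed. A sketch of that rule (`smoothedVariances` is an illustrative helper, not part of the API):

```typescript
// Sketch of the smoothing rule: epsilon is a fraction of the largest
// overall feature variance, added to every per-class variance so that
// a near-zero variance cannot blow up the Gaussian density.
function smoothedVariances(
  classVariances: number[][], // [class][feature] variances
  featureVariances: number[], // variance of each feature over all samples
  varSmoothing: number
): number[][] {
  const epsilon = varSmoothing * Math.max(...featureVariances);
  return classVariances.map(row => row.map(v => v + epsilon));
}
```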
Methods
fit()
Fit the Gaussian Naive Bayes classifier from the training dataset.
fit(X: Matrix, y: Vector, sampleWeight?: Vector): this
X: Training data matrix where each row is a sample and each column is a feature.
y: Target class labels for the training data.
sampleWeight: Optional sample weights (currently not implemented; reserved for future use).
Returns: this - The fitted classifier instance.
Throws:
- Error if fewer than 2 classes are present
- Error if any class is missing from the training data
- Error if input validation fails
predict()
Perform classification on an array of test samples.
predict(X: Matrix): Vector
X: Test samples to classify, one row per sample.
Returns: Vector - Predicted class labels.
predictProba()
Return probability estimates for the test samples.
predictProba(X: Matrix): Matrix
X: Test samples, one row per sample.
Returns: Matrix - Probability of each class for each sample. Each row represents a sample, and each column represents a class probability.
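Probabilities of this shape are conventionally obtained by normalizing the per-class joint log-likelihoods with the log-sum-exp trick. An illustrative sketch (not this library's internals):

```typescript
// Convert one sample's per-class joint log-likelihoods into probabilities
// that sum to 1. Subtracting the max first (the log-sum-exp trick) avoids
// overflow/underflow when the log-likelihoods are large in magnitude.
function logLikelihoodsToProba(jll: number[]): number[] {
  const max = Math.max(...jll);
  const exps = jll.map(v => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}
```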
score()
Return the mean accuracy on the given test data and labels.
score(X: Matrix, y: Vector): number
X: Test samples.
y: True labels for the test samples.
Returns: number - Mean accuracy score.
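The score is equivalent to comparing predict(X) against y and averaging the matches. A small sketch of that computation on plain arrays (`meanAccuracy` is illustrative, not part of the API):

```typescript
// Mean accuracy: the fraction of predicted labels equal to the true labels.
function meanAccuracy(predicted: number[], truth: number[]): number {
  let correct = 0;
  for (let i = 0; i < predicted.length; i++) {
    if (predicted[i] === truth[i]) correct++;
  }
  return correct / predicted.length;
}
```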
Attributes
classes_: Unique class labels identified during training.
classPrior_: Prior probability of each class, estimated from the training data.
theta_: Mean of each feature per class.
var_: Variance of each feature per class.
Examples
Basic Classification
import { GaussianNB } from '@scikitjs/sklearn';
// Training data
const X = [
[-1, -1],
[-2, -1],
[-3, -2],
[1, 1],
[2, 1],
[3, 2]
];
const y = [0, 0, 0, 1, 1, 1];
// Create and train classifier
const gnb = new GaussianNB();
gnb.fit(X, y);
// Predict new samples
const predictions = gnb.predict([[-0.8, -1], [2.5, 1.5]]);
console.log(predictions); // [0, 1]
Multi-class Classification
import { GaussianNB } from '@scikitjs/sklearn';
// Iris-like dataset with 3 classes
const X = [
[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], // Class 0
[7.0, 3.2], [6.4, 3.2], [6.9, 3.1], // Class 1
[6.3, 3.3], [5.8, 2.7], [6.1, 3.0] // Class 2
];
const y = [0, 0, 0, 1, 1, 1, 2, 2, 2];
const gnb = new GaussianNB();
gnb.fit(X, y);
// Predict with probability estimates
const testSamples = [[5.0, 3.4], [6.5, 3.2], [6.0, 3.0]];
const predictions = gnb.predict(testSamples);
const probabilities = gnb.predictProba(testSamples);
console.log('Predictions:', predictions);
console.log('Probabilities:', probabilities);
Probability Estimates
import { GaussianNB } from '@scikitjs/sklearn';
const X = [
[0, 0], [1, 1], [2, 2],
[10, 10], [11, 11], [12, 12]
];
const y = [0, 0, 0, 1, 1, 1];
const gnb = new GaussianNB();
gnb.fit(X, y);
// Get probability estimates
const proba = gnb.predictProba([[1, 1], [10, 10], [6, 6]]);
console.log(proba);
// Approximate output:
// [
//   [1.00, 0.00], // Very likely class 0
//   [0.00, 1.00], // Very likely class 1
//   [0.50, 0.50]  // Equidistant from both classes: uncertain
// ]
Using Variance Smoothing
import { GaussianNB } from '@scikitjs/sklearn';
// Dataset with very low variance in some features
const X = [
[1.0, 0.001], [1.0, 0.002], [1.0, 0.001],
[2.0, 0.001], [2.0, 0.002], [2.0, 0.001]
];
const y = [0, 0, 0, 1, 1, 1];
// Higher variance smoothing for numerical stability
const gnb = new GaussianNB({ varSmoothing: 1e-6 });
gnb.fit(X, y);
const predictions = gnb.predict([[1.0, 0.0015], [2.0, 0.0015]]);
console.log(predictions); // [0, 1]
Model Evaluation
import { GaussianNB } from '@scikitjs/sklearn';
// Training data
const XTrain = [
[1, 2], [2, 3], [3, 4],
[6, 7], [7, 8], [8, 9]
];
const yTrain = [0, 0, 0, 1, 1, 1];
// Test data
const XTest = [[2, 2], [7, 7]];
const yTest = [0, 1];
const gnb = new GaussianNB();
gnb.fit(XTrain, yTrain);
// Calculate accuracy
const accuracy = gnb.score(XTest, yTest);
console.log(`Accuracy: ${accuracy}`); // 1.0 (100%)
// Inspect learned parameters
console.log('Class priors:', gnb.classPrior_);
console.log('Feature means:', gnb.theta_);
console.log('Feature variances:', gnb.var_);
Text Classification Example
import { GaussianNB } from '@scikitjs/sklearn';
// Simple bag-of-words features for spam detection
// Features: [count('free'), count('money'), count('meeting'), avgWordLength]
const X = [
[3, 2, 0, 4.2], // spam
[2, 3, 0, 4.5], // spam
[0, 0, 2, 6.1], // not spam
[0, 1, 3, 6.8], // not spam
[4, 1, 0, 3.9], // spam
[0, 0, 1, 7.2] // not spam
];
const y = [1, 1, 0, 0, 1, 0]; // 1 = spam, 0 = not spam
const gnb = new GaussianNB();
gnb.fit(X, y);
// Classify new message
const newMessage = [[2, 1, 0, 4.0]];
const prediction = gnb.predict(newMessage);
const probability = gnb.predictProba(newMessage);
console.log('Is spam:', prediction[0] === 1);
console.log('Spam probability:', probability[0][1]);
Notes
- Assumes features are conditionally independent given the class (“naive” assumption)
- Assumes features follow a Gaussian (normal) distribution
- Fast training and prediction
- Works well with high-dimensional data
- Requires relatively small amount of training data to estimate parameters
- The varSmoothing parameter helps prevent numerical instability when a feature has very low variance
- Particularly effective for text classification and real-valued features