
Overview

Model selection involves choosing the best model and hyperparameters for your data. bun-scikit provides comprehensive tools for cross-validation, hyperparameter tuning, and performance evaluation.

Cross-Validation

Evaluate model performance reliably

Grid Search

Exhaustive hyperparameter search

Random Search

Efficient parameter optimization

Learning Curves

Diagnose bias and variance

Cross-Validation

Cross-validation splits data into multiple folds to evaluate model performance more reliably than a single train-test split.
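Conceptually, the fold bookkeeping is just index arithmetic. The sketch below (a hypothetical `kFoldIndices` helper, not part of bun-scikit) shows how k folds partition the sample indices so each fold serves once as the test set:

```typescript
// Illustrative k-fold partitioning (not bun-scikit source): each of the
// k folds serves once as the test set while the rest form the train set.
function kFoldIndices(nSamples: number, k: number): { train: number[]; test: number[] }[] {
  const indices = Array.from({ length: nSamples }, (_, i) => i);
  const foldSize = Math.floor(nSamples / k);
  const splits: { train: number[]; test: number[] }[] = [];
  for (let f = 0; f < k; f++) {
    const start = f * foldSize;
    // The last fold absorbs any remainder so every sample is tested exactly once
    const end = f === k - 1 ? nSamples : start + foldSize;
    splits.push({
      test: indices.slice(start, end),
      train: [...indices.slice(0, start), ...indices.slice(end)],
    });
  }
  return splits;
}

// 8 samples, 4 folds: first fold tests [0, 1] and trains on the other six
const folds = kFoldIndices(8, 4);
console.log(folds[0].test); // [ 0, 1 ]
```

Averaging the per-fold scores, as in the example below, then gives a more stable estimate than any single split.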

Basic Cross-Validation

import { crossValScore } from "bun-scikit";
import { LogisticRegression } from "bun-scikit";

const X = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [2, 2], [2, 3], [3, 2], [3, 3],
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

// Evaluate model using 5-fold cross-validation
const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5, scoring: "accuracy" }
);

console.log("Scores:", scores);
const meanScore = scores.reduce((a, b) => a + b) / scores.length;
console.log(`Mean accuracy: ${meanScore.toFixed(4)}`);

Configuration Options

cv (number | CrossValSplitter, default: 5)
Number of folds or a custom splitter object.

scoring (BuiltInScoring | ScoringFn, default: "default")
Scoring metric: "accuracy", "f1", "precision", "recall", "r2", "mean_squared_error", or a custom function.

groups (Vector, default: undefined)
Group labels for grouped cross-validation.

sampleWeight (Vector, default: undefined)
Sample weights for training.

Cross-Validation Splitters

K-Fold

import { KFold, crossValScore } from "bun-scikit";
import { Ridge } from "bun-scikit";

const kfold = new KFold({
  nSplits: 5,
  shuffle: true,
  randomState: 42,
});

const scores = crossValScore(
  () => new Ridge({ alpha: 1.0 }),
  X,
  y,
  { cv: kfold, scoring: "r2" }
);

Stratified K-Fold

Preserves class distribution in each fold:
import { StratifiedKFold } from "bun-scikit";

const stratified = new StratifiedKFold({
  nSplits: 5,
  shuffle: true,
  randomState: 42,
});

const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: stratified, scoring: "f1" }
);
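Stratification amounts to splitting each class's indices separately so every fold keeps roughly the original class ratio; a minimal sketch (the `stratifiedFolds` helper is illustrative, not the library's implementation):

```typescript
// Illustrative stratified folding: deal each class's indices out across
// the folds round-robin so every fold preserves the class distribution.
function stratifiedFolds(labels: number[], k: number): number[][] {
  const folds: number[][] = Array.from({ length: k }, () => []);
  const byClass = new Map<number, number[]>();
  labels.forEach((label, i) => {
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label)!.push(i);
  });
  // Distribute each class's samples one fold at a time, like dealing cards
  for (const indices of byClass.values()) {
    indices.forEach((idx, j) => folds[j % k].push(idx));
  }
  return folds;
}

const labels = [0, 0, 0, 0, 1, 1, 1, 1];
const folds = stratifiedFolds(labels, 4);
// Each of the 4 folds holds one index of class 0 and one of class 1
console.log(folds.map((f) => f.map((i) => labels[i])));
```

With a 50/50 class balance, every fold here ends up exactly 50/50 as well, which keeps per-fold metrics like f1 meaningful.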

Group K-Fold

Ensures samples from the same group stay together:
import { GroupKFold } from "bun-scikit";

const groups = [0, 0, 1, 1, 2, 2, 3, 3]; // Group labels

const groupKfold = new GroupKFold({ nSplits: 4 });

const scores = crossValScore(
  () => new Ridge(),
  X,
  y,
  { cv: groupKfold, groups, scoring: "r2" }
);

Time Series Split

import { TimeSeriesSplit } from "bun-scikit";

const tscv = new TimeSeriesSplit({ nSplits: 5 });

const scores = crossValScore(
  () => new Ridge(),
  X,
  y,
  { cv: tscv, scoring: "r2" }
);
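TimeSeriesSplit uses an expanding window: each split trains only on samples that come before its test chunk, so the model never sees the future. The indices look roughly like this (an illustrative sketch, not the library source):

```typescript
// Illustrative expanding-window splits: train on everything before the
// test chunk, never after it, so temporal order is respected.
function timeSeriesSplits(nSamples: number, nSplits: number): { train: number[]; test: number[] }[] {
  const testSize = Math.floor(nSamples / (nSplits + 1));
  const splits: { train: number[]; test: number[] }[] = [];
  for (let s = 1; s <= nSplits; s++) {
    const trainEnd = s * testSize; // everything before the test chunk
    const train = Array.from({ length: trainEnd }, (_, i) => i);
    const test = Array.from({ length: testSize }, (_, i) => trainEnd + i);
    splits.push({ train, test });
  }
  return splits;
}

// 8 samples, 3 splits: test chunks of size 2, train window grows each time
for (const { train, test } of timeSeriesSplits(8, 3)) {
  console.log(`train=[${train}] test=[${test}]`);
}
// train=[0,1] test=[2,3]
// train=[0,1,2,3] test=[4,5]
// train=[0,1,2,3,4,5] test=[6,7]
```

Note that early splits train on very little data, so scores from the first folds are typically noisier than the later ones.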

Cross-Validate with Multiple Metrics

import { crossValidate } from "bun-scikit";

const results = crossValidate(
  () => new LogisticRegression(),
  X,
  y,
  {
    cv: 5,
    scoring: ["accuracy", "precision", "recall", "f1"],
    returnTrainScore: true,
  }
);

console.log("Test scores:", results.testScores);
console.log("Train scores:", results.trainScores);
console.log("Fit times:", results.fitTime);
console.log("Score times:", results.scoreTime);

Cross-Val Predict

Get predictions for each sample when it was in the test set:
import { crossValPredict } from "bun-scikit";

const predictions = crossValPredict(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5 }
);

// Now you can compute metrics on the predictions
import { accuracyScore, confusionMatrix } from "bun-scikit";

const accuracy = accuracyScore(y, predictions);
const cm = confusionMatrix(y, predictions);

console.log("Accuracy:", accuracy);
console.log("Confusion matrix:", cm);

Grid Search

Grid search exhaustively searches over a grid of hyperparameters to find the best combination.
import { GridSearchCV } from "bun-scikit";
import { RandomForestClassifier } from "bun-scikit";

const X = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [2, 2], [2, 3], [3, 2], [3, 3],
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

const search = new GridSearchCV(
  (params) => new RandomForestClassifier({
    nEstimators: params.nEstimators as number,
    maxDepth: params.maxDepth as number,
    maxFeatures: params.maxFeatures as "sqrt" | "log2",
  }),
  {
    nEstimators: [50, 100, 200],
    maxDepth: [5, 10, 20],
    maxFeatures: ["sqrt", "log2"],
  },
  {
    cv: 5,
    scoring: "accuracy",
    refit: true,
  }
);

search.fit(X, y);

console.log("Best parameters:", search.bestParams_);
console.log("Best score:", search.bestScore_);
console.log("Best estimator:", search.bestEstimator_);

// Use the best model
const predictions = search.predict([[1.5, 1.5]]);
console.log("Prediction:", predictions);
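The grid above expands to the Cartesian product of all listed values: 3 nEstimators × 3 maxDepth × 2 maxFeatures = 18 candidates, each evaluated once per fold. The expansion itself can be sketched in a few lines (a hypothetical `paramGrid` helper, not the library's internals):

```typescript
// Illustrative grid expansion: the Cartesian product of every listed
// value. Each key multiplies the number of candidates by its value count.
function paramGrid(grid: Record<string, unknown[]>): Record<string, unknown>[] {
  let combos: Record<string, unknown>[] = [{}];
  for (const [name, values] of Object.entries(grid)) {
    combos = combos.flatMap((combo) =>
      values.map((v) => ({ ...combo, [name]: v }))
    );
  }
  return combos;
}

const combos = paramGrid({
  nEstimators: [50, 100, 200],
  maxDepth: [5, 10, 20],
  maxFeatures: ["sqrt", "log2"],
});
console.log(combos.length); // 18
// First candidate: nEstimators 50, maxDepth 5, maxFeatures "sqrt"
```

With cv: 5, those 18 candidates mean 90 model fits, which is why grid size matters for runtime.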

Configuration

cv (number | CrossValSplitter, default: 5)
Cross-validation strategy.

scoring (BuiltInScoring | ScoringFn, default: "default")
Metric to optimize.

refit (boolean, default: true)
Whether to refit the best estimator on the entire dataset.

errorScore ('raise' | number, default: 'raise')
Value to assign if an error occurs during fitting. Use a number (e.g., -1) to continue despite errors.

Analyzing Results

search.fit(X, y);

// Get detailed results for all parameter combinations
for (const result of search.cvResults_) {
  console.log("Params:", result.params);
  console.log("Mean score:", result.meanTestScore);
  console.log("Std score:", result.stdTestScore);
  console.log("Rank:", result.rank);
  console.log("---");
}

// Sort by score
const sorted = search.cvResults_
  .slice()
  .sort((a, b) => b.meanTestScore - a.meanTestScore);

console.log("Top 3 configurations:");
for (const result of sorted.slice(0, 3)) {
  console.log(result.params, result.meanTestScore);
}

Nested Cross-Validation

// Outer CV for unbiased performance estimation
const outerScores = crossValScore(
  () => {
    // Inner grid search
    return new GridSearchCV(
      (params) => new Ridge({ alpha: params.alpha as number }),
      { alpha: [0.1, 1.0, 10.0] },
      { cv: 3, refit: true }
    );
  },
  X,
  y,
  { cv: 5, scoring: "r2" }
);

const avgScore = outerScores.reduce((a, b) => a + b) / outerScores.length;
console.log(`Unbiased performance estimate: ${avgScore.toFixed(4)}`);
Random Search

Randomized search samples parameter combinations randomly, which is more efficient for large parameter spaces.

Basic Usage

import { RandomizedSearchCV } from "bun-scikit";
import { GradientBoostingClassifier } from "bun-scikit";

const search = new RandomizedSearchCV(
  (params) => new GradientBoostingClassifier({
    nEstimators: params.nEstimators as number,
    learningRate: params.learningRate as number,
    maxDepth: params.maxDepth as number,
  }),
  {
    nEstimators: [50, 100, 150, 200, 250, 300],
    learningRate: [0.01, 0.05, 0.1, 0.2, 0.3],
    maxDepth: [3, 5, 7, 9, 11],
  },
  {
    nIter: 20,  // Try 20 random combinations
    cv: 5,
    scoring: "accuracy",
    randomState: 42,
  }
);

search.fit(X, y);

console.log("Best parameters:", search.bestParams_);
console.log("Best score:", search.bestScore_);

Configuration

nIter (number, default: 10)
Number of parameter combinations to sample.

randomState (number, default: undefined)
Random seed for reproducibility.

Use RandomizedSearchCV when:
  • Parameter space is very large (> 100 combinations)
  • Not all parameters are equally important
  • You have limited computational resources
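For the grid in the example above, exhaustive search would cost 6 × 5 × 5 = 150 candidates per fold, while nIter: 20 evaluates only 20 of them. The sampling can be sketched with a small seeded generator (illustrative only; the library's actual RNG may differ):

```typescript
// Illustrative random sampling: draw nIter parameter combinations
// uniformly from the grid, seeded so runs are reproducible.
function sampleParams(
  grid: Record<string, unknown[]>,
  nIter: number,
  seed: number
): Record<string, unknown>[] {
  let s = seed >>> 0;
  const rand = () => {
    // Linear congruential generator: crude, but deterministic per seed
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
  return Array.from({ length: nIter }, () => {
    const combo: Record<string, unknown> = {};
    for (const [name, values] of Object.entries(grid)) {
      combo[name] = values[Math.floor(rand() * values.length)];
    }
    return combo;
  });
}

const candidates = sampleParams(
  {
    nEstimators: [50, 100, 150, 200, 250, 300],
    learningRate: [0.01, 0.05, 0.1, 0.2, 0.3],
    maxDepth: [3, 5, 7, 9, 11],
  },
  20,
  42
);
console.log(candidates.length); // 20
```

Fixing the seed is what makes randomState: 42 in the example reproducible from run to run.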

Learning Curves

Learning curves help diagnose whether a model suffers from high bias or high variance.

Basic Usage

import { learningCurve } from "bun-scikit";
import { Ridge } from "bun-scikit";

const result = learningCurve(
  () => new Ridge({ alpha: 1.0 }),
  X,
  y,
  {
    trainSizes: [0.1, 0.3, 0.5, 0.7, 0.9],
    cv: 5,
    scoring: "r2",
  }
);

console.log("Train sizes:", result.trainSizes);
console.log("Train scores:", result.trainScores);
console.log("Test scores:", result.testScores);

// Compute means for plotting
for (let i = 0; i < result.trainSizes.length; i++) {
  const trainMean = result.trainScores[i].reduce((a, b) => a + b) / result.trainScores[i].length;
  const testMean = result.testScores[i].reduce((a, b) => a + b) / result.testScores[i].length;
  
  console.log(`Size ${result.trainSizes[i]}: train=${trainMean.toFixed(4)}, test=${testMean.toFixed(4)}`);
}

Interpreting Learning Curves

High bias (underfitting):
  • Both train and test scores are low
  • Scores converge quickly
  • Small gap between train and test scores
Solution: Use a more complex model or add features

High variance (overfitting):
  • Large gap between train and test scores
  • Train score is high, test score is low
  • Gap doesn't close with more data
Solution: Add regularization, reduce model complexity, or get more data

Good fit:
  • Both scores are high
  • Small gap between train and test
  • Scores plateau with more data

Validation Curves

Validation curves show how a single hyperparameter affects performance.

Basic Usage

import { validationCurve } from "bun-scikit";
import { Ridge } from "bun-scikit";

const alphas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0];

const result = validationCurve(
  (alpha) => new Ridge({ alpha: alpha as number }),
  "alpha",
  alphas,
  X,
  y,
  {
    cv: 5,
    scoring: "r2",
  }
);

console.log("Parameter values:", result.paramValues);
console.log("Train scores:", result.trainScores);
console.log("Test scores:", result.testScores);

// Find optimal alpha
let bestIdx = 0;
let bestScore = -Infinity;

for (let i = 0; i < alphas.length; i++) {
  const meanTestScore = result.testScores[i].reduce((a, b) => a + b) / result.testScores[i].length;
  if (meanTestScore > bestScore) {
    bestScore = meanTestScore;
    bestIdx = i;
  }
}

console.log(`Best alpha: ${alphas[bestIdx]} (score: ${bestScore.toFixed(4)})`);

Train-Test Split

Simple holdout validation:
import { trainTestSplit } from "bun-scikit";

const [X_train, X_test, y_train, y_test] = trainTestSplit(
  X,
  y,
  {
    testSize: 0.2,
    randomState: 42,
    shuffle: true,
    stratify: y, // Preserve class distribution
  }
);

const model = new LogisticRegression();
model.fit(X_train, y_train);

const score = model.score(X_test, y_test);
console.log(`Test accuracy: ${score.toFixed(4)}`);
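For reference, the holdout split itself is just a seeded shuffle of indices followed by slicing; a self-contained sketch (not bun-scikit's implementation, and without the stratify option):

```typescript
// Illustrative holdout split: seeded Fisher-Yates shuffle of the indices,
// then slice off the first testSize fraction as the test set.
function holdoutSplit<T>(
  X: T[],
  y: number[],
  testSize: number,
  seed: number
): [T[], T[], number[], number[]] {
  const idx = Array.from({ length: X.length }, (_, i) => i);
  let s = seed >>> 0;
  const rand = () => {
    // Linear congruential generator: enough for a reproducible shuffle
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nTest = Math.round(X.length * testSize);
  const testIdx = idx.slice(0, nTest);
  const trainIdx = idx.slice(nTest);
  return [
    trainIdx.map((i) => X[i]),
    testIdx.map((i) => X[i]),
    trainIdx.map((i) => y[i]),
    testIdx.map((i) => y[i]),
  ];
}

const [XTrain, XTest, yTrain, yTest] = holdoutSplit(
  [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]],
  [0, 0, 0, 0, 1, 1, 1, 1],
  0.25,
  42
);
console.log(XTrain.length, XTest.length); // 6 2
```

The key property is that train and test indices partition the data: every sample lands in exactly one of the two sets.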

Common Patterns

import { Pipeline } from "bun-scikit";
import { StandardScaler } from "bun-scikit";
import { PCA } from "bun-scikit";
import { LogisticRegression } from "bun-scikit";
import { GridSearchCV } from "bun-scikit";

const search = new GridSearchCV(
  (params) => new Pipeline([
    ["scaler", new StandardScaler()],
    ["pca", new PCA({ nComponents: params.nComponents as number })],
    ["classifier", new LogisticRegression({ l2: params.l2 as number })],
  ]),
  {
    nComponents: [5, 10, 20],
    l2: [0.01, 0.1, 1.0],
  },
  { cv: 5, scoring: "accuracy" }
);

search.fit(X, y);

Custom Scoring Function

import type { ScoringFn } from "bun-scikit";

const customScorer: ScoringFn = (estimator, X, y) => {
  const predictions = estimator.predict(X);
  
  // Custom metric: weighted accuracy
  let correct = 0;
  let total = 0;
  
  for (let i = 0; i < y.length; i++) {
    const weight = y[i] === 1 ? 2 : 1; // Weight positive class more
    if (predictions[i] === y[i]) correct += weight;
    total += weight;
  }
  
  return correct / total;
};

const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5, scoring: customScorer }
);

Early Stopping with Validation Set

const [X_train, X_val, y_train, y_val] = trainTestSplit(
  X,
  y,
  { testSize: 0.2, randomState: 42 }
);

let bestScore = -Infinity;
let bestModel = null;
let patience = 5;
let noImprovement = 0;

for (let nEstimators = 50; nEstimators <= 500; nEstimators += 50) {
  const model = new RandomForestClassifier({ nEstimators });
  model.fit(X_train, y_train);
  
  const valScore = model.score(X_val, y_val);
  
  if (valScore > bestScore) {
    bestScore = valScore;
    bestModel = model;
    noImprovement = 0;
  } else {
    noImprovement++;
    if (noImprovement >= patience) {
      console.log(`Early stopping at ${nEstimators} estimators`);
      break;
    }
  }
}

console.log(`Best validation score: ${bestScore.toFixed(4)}`);

Performance Tips

Speed up hyperparameter search:
  • Use RandomizedSearchCV instead of GridSearchCV
  • Reduce the number of CV folds (e.g., 3 instead of 5)
  • Use coarse-to-fine search (broad search first, then narrow)
  • Parallelize if possible (future feature)

Avoid data leakage:
  • Always split data before any preprocessing
  • Use Pipeline to ensure preprocessing is fit only on training data
  • Never use test data for hyperparameter tuning
  • Use nested CV for unbiased performance estimates

Choose the right metric:
  • Classification: accuracy, f1, precision, recall, roc_auc
  • Regression: r2, mean_squared_error, mean_absolute_error
  • Imbalanced data: f1, precision-recall, or custom weighted metrics

Best Practices

1. Split your data
   Create train, validation, and test sets. Never touch the test set until final evaluation.

2. Choose an evaluation metric
   Select a metric that aligns with your business objective.

3. Establish a baseline
   Train a simple model to establish a performance baseline.

4. Tune hyperparameters
   Use cross-validation with grid or randomized search.

5. Plot learning curves
   Diagnose bias/variance to guide next steps.

6. Run the final evaluation
   Evaluate the best model on the held-out test set, once.

Next Steps

Linear Models

Apply tuning to linear models

Tree Ensembles

Tune random forests and boosting
