
Overview

Model selection involves choosing the best model and hyperparameters for your data. bun-scikit provides comprehensive tools for cross-validation, hyperparameter tuning, and performance evaluation.

Cross-Validation

Evaluate model performance reliably

Grid Search

Exhaustive hyperparameter search

Random Search

Efficient parameter optimization

Learning Curves

Diagnose bias and variance

Cross-Validation

Cross-validation splits data into multiple folds to evaluate model performance more reliably than a single train-test split.
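Conceptually, the fold bookkeeping is just index arithmetic. The sketch below (a hypothetical `kFoldIndices` helper, not part of bun-scikit) shows how k folds partition the sample indices so each fold serves once as the test set:

```typescript
// Illustrative k-fold partitioning (not bun-scikit source): each of the
// k folds serves once as the test set while the rest form the train set.
function kFoldIndices(nSamples: number, k: number): { train: number[]; test: number[] }[] {
  const indices = Array.from({ length: nSamples }, (_, i) => i);
  const foldSize = Math.floor(nSamples / k);
  const splits: { train: number[]; test: number[] }[] = [];
  for (let f = 0; f < k; f++) {
    const start = f * foldSize;
    // The last fold absorbs any remainder so every sample is tested exactly once
    const end = f === k - 1 ? nSamples : start + foldSize;
    splits.push({
      test: indices.slice(start, end),
      train: [...indices.slice(0, start), ...indices.slice(end)],
    });
  }
  return splits;
}

// 8 samples, 4 folds: first fold tests [0, 1] and trains on the other six
const folds = kFoldIndices(8, 4);
console.log(folds[0].test); // [ 0, 1 ]
```

Averaging the per-fold scores, as in the example below, then gives a more stable estimate than any single split.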

Basic Cross-Validation

import { crossValScore } from "bun-scikit";
import { LogisticRegression } from "bun-scikit";

const X = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [2, 2], [2, 3], [3, 2], [3, 3],
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

// Evaluate model using 5-fold cross-validation
const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5, scoring: "accuracy" }
);

console.log("Scores:", scores);
const meanScore = scores.reduce((a, b) => a + b) / scores.length;
console.log(`Mean accuracy: ${meanScore.toFixed(4)}`);

Configuration Options

cv (number | CrossValSplitter, default: 5)
Number of folds or a custom splitter object.

scoring (BuiltInScoring | ScoringFn, default: "default")
Scoring metric: "accuracy", "f1", "precision", "recall", "r2", "mean_squared_error", or a custom function.

groups (Vector, default: undefined)
Group labels for grouped cross-validation.

sampleWeight (Vector, default: undefined)
Sample weights for training.

Cross-Validation Splitters

K-Fold

import { KFold, crossValScore } from "bun-scikit";
import { Ridge } from "bun-scikit";

const kfold = new KFold({
  nSplits: 5,
  shuffle: true,
  randomState: 42,
});

const scores = crossValScore(
  () => new Ridge({ alpha: 1.0 }),
  X,
  y,
  { cv: kfold, scoring: "r2" }
);

Stratified K-Fold

Preserves class distribution in each fold:
import { StratifiedKFold } from "bun-scikit";

const stratified = new StratifiedKFold({
  nSplits: 5,
  shuffle: true,
  randomState: 42,
});

const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: stratified, scoring: "f1" }
);
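Stratification amounts to splitting each class's indices separately so every fold keeps roughly the original class ratio; a minimal sketch (the `stratifiedFolds` helper is illustrative, not the library's implementation):

```typescript
// Illustrative stratified folding: deal each class's indices out across
// the folds round-robin so every fold preserves the class distribution.
function stratifiedFolds(labels: number[], k: number): number[][] {
  const folds: number[][] = Array.from({ length: k }, () => []);
  const byClass = new Map<number, number[]>();
  labels.forEach((label, i) => {
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label)!.push(i);
  });
  // Distribute each class's samples one fold at a time, like dealing cards
  for (const indices of byClass.values()) {
    indices.forEach((idx, j) => folds[j % k].push(idx));
  }
  return folds;
}

const labels = [0, 0, 0, 0, 1, 1, 1, 1];
const folds = stratifiedFolds(labels, 4);
// Each of the 4 folds holds one index of class 0 and one of class 1
console.log(folds.map((f) => f.map((i) => labels[i])));
```

With a 50/50 class balance, every fold here ends up exactly 50/50 as well, which keeps per-fold metrics like f1 meaningful.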

Group K-Fold

Ensures samples from the same group stay together:
import { GroupKFold } from "bun-scikit";

const groups = [0, 0, 1, 1, 2, 2, 3, 3]; // Group labels

const groupKfold = new GroupKFold({ nSplits: 4 });

const scores = crossValScore(
  () => new Ridge(),
  X,
  y,
  { cv: groupKfold, groups, scoring: "r2" }
);

Time Series Split

import { TimeSeriesSplit } from "bun-scikit";

const tscv = new TimeSeriesSplit({ nSplits: 5 });

const scores = crossValScore(
  () => new Ridge(),
  X,
  y,
  { cv: tscv, scoring: "r2" }
);
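TimeSeriesSplit uses an expanding window: each split trains only on samples that come before its test chunk, so the model never sees the future. The indices look roughly like this (an illustrative sketch, not the library source):

```typescript
// Illustrative expanding-window splits: train on everything before the
// test chunk, never after it, so temporal order is respected.
function timeSeriesSplits(nSamples: number, nSplits: number): { train: number[]; test: number[] }[] {
  const testSize = Math.floor(nSamples / (nSplits + 1));
  const splits: { train: number[]; test: number[] }[] = [];
  for (let s = 1; s <= nSplits; s++) {
    const trainEnd = s * testSize; // everything before the test chunk
    const train = Array.from({ length: trainEnd }, (_, i) => i);
    const test = Array.from({ length: testSize }, (_, i) => trainEnd + i);
    splits.push({ train, test });
  }
  return splits;
}

// 8 samples, 3 splits: test chunks of size 2, train window grows each time
for (const { train, test } of timeSeriesSplits(8, 3)) {
  console.log(`train=[${train}] test=[${test}]`);
}
// train=[0,1] test=[2,3]
// train=[0,1,2,3] test=[4,5]
// train=[0,1,2,3,4,5] test=[6,7]
```

Note that early splits train on very little data, so scores from the first folds are typically noisier than the later ones.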

Cross-Validate with Multiple Metrics

import { crossValidate } from "bun-scikit";

const results = crossValidate(
  () => new LogisticRegression(),
  X,
  y,
  {
    cv: 5,
    scoring: ["accuracy", "precision", "recall", "f1"],
    returnTrainScore: true,
  }
);

console.log("Test scores:", results.testScores);
console.log("Train scores:", results.trainScores);
console.log("Fit times:", results.fitTime);
console.log("Score times:", results.scoreTime);

Cross-Val Predict

Get predictions for each sample when it was in the test set:
import { crossValPredict } from "bun-scikit";

const predictions = crossValPredict(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5 }
);

// Now you can compute metrics on the predictions
import { accuracyScore, confusionMatrix } from "bun-scikit";

const accuracy = accuracyScore(y, predictions);
const cm = confusionMatrix(y, predictions);

console.log("Accuracy:", accuracy);
console.log("Confusion matrix:", cm);

Grid Search

Grid search exhaustively searches over a grid of hyperparameters to find the best combination.
import { GridSearchCV } from "bun-scikit";
import { RandomForestClassifier } from "bun-scikit";

const X = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [2, 2], [2, 3], [3, 2], [3, 3],
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

const search = new GridSearchCV(
  (params) => new RandomForestClassifier({
    nEstimators: params.nEstimators as number,
    maxDepth: params.maxDepth as number,
    maxFeatures: params.maxFeatures as "sqrt" | "log2",
  }),
  {
    nEstimators: [50, 100, 200],
    maxDepth: [5, 10, 20],
    maxFeatures: ["sqrt", "log2"],
  },
  {
    cv: 5,
    scoring: "accuracy",
    refit: true,
  }
);

search.fit(X, y);

console.log("Best parameters:", search.bestParams_);
console.log("Best score:", search.bestScore_);
console.log("Best estimator:", search.bestEstimator_);

// Use the best model
const predictions = search.predict([[1.5, 1.5]]);
console.log("Prediction:", predictions);
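The grid above expands to the Cartesian product of all listed values: 3 nEstimators × 3 maxDepth × 2 maxFeatures = 18 candidates, each evaluated once per fold. The expansion itself can be sketched in a few lines (a hypothetical `paramGrid` helper, not the library's internals):

```typescript
// Illustrative grid expansion: the Cartesian product of every listed
// value. Each key multiplies the number of candidates by its value count.
function paramGrid(grid: Record<string, unknown[]>): Record<string, unknown>[] {
  let combos: Record<string, unknown>[] = [{}];
  for (const [name, values] of Object.entries(grid)) {
    combos = combos.flatMap((combo) =>
      values.map((v) => ({ ...combo, [name]: v }))
    );
  }
  return combos;
}

const combos = paramGrid({
  nEstimators: [50, 100, 200],
  maxDepth: [5, 10, 20],
  maxFeatures: ["sqrt", "log2"],
});
console.log(combos.length); // 18
// First candidate: nEstimators 50, maxDepth 5, maxFeatures "sqrt"
```

With cv: 5, those 18 candidates mean 90 model fits, which is why grid size matters for runtime.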

Configuration

cv (number | CrossValSplitter, default: 5)
Cross-validation strategy.

scoring (BuiltInScoring | ScoringFn, default: "default")
Metric to optimize.

refit (boolean, default: true)
Whether to refit the best estimator on the entire dataset.

errorScore ('raise' | number, default: 'raise')
Value to assign if an error occurs during fitting. Use a number (e.g., -1) to continue despite errors.

Analyzing Results

search.fit(X, y);

// Get detailed results for all parameter combinations
for (const result of search.cvResults_) {
  console.log("Params:", result.params);
  console.log("Mean score:", result.meanTestScore);
  console.log("Std score:", result.stdTestScore);
  console.log("Rank:", result.rank);
  console.log("---");
}

// Sort by score
const sorted = search.cvResults_
  .slice()
  .sort((a, b) => b.meanTestScore - a.meanTestScore);

console.log("Top 3 configurations:");
for (const result of sorted.slice(0, 3)) {
  console.log(result.params, result.meanTestScore);
}

Nested Cross-Validation

// Outer CV for unbiased performance estimation
const outerScores = crossValScore(
  () => {
    // Inner grid search
    return new GridSearchCV(
      (params) => new Ridge({ alpha: params.alpha as number }),
      { alpha: [0.1, 1.0, 10.0] },
      { cv: 3, refit: true }
    );
  },
  X,
  y,
  { cv: 5, scoring: "r2" }
);

const avgScore = outerScores.reduce((a, b) => a + b) / outerScores.length;
console.log(`Unbiased performance estimate: ${avgScore.toFixed(4)}`);
Random Search

Randomized search samples parameter combinations randomly, which is more efficient for large parameter spaces.

Basic Usage

import { RandomizedSearchCV } from "bun-scikit";
import { GradientBoostingClassifier } from "bun-scikit";

const search = new RandomizedSearchCV(
  (params) => new GradientBoostingClassifier({
    nEstimators: params.nEstimators as number,
    learningRate: params.learningRate as number,
    maxDepth: params.maxDepth as number,
  }),
  {
    nEstimators: [50, 100, 150, 200, 250, 300],
    learningRate: [0.01, 0.05, 0.1, 0.2, 0.3],
    maxDepth: [3, 5, 7, 9, 11],
  },
  {
    nIter: 20,  // Try 20 random combinations
    cv: 5,
    scoring: "accuracy",
    randomState: 42,
  }
);

search.fit(X, y);

console.log("Best parameters:", search.bestParams_);
console.log("Best score:", search.bestScore_);

Configuration

nIter (number, default: 10)
Number of parameter combinations to sample.

randomState (number, default: undefined)
Random seed for reproducibility.

Use RandomizedSearchCV when:
  • Parameter space is very large (> 100 combinations)
  • Not all parameters are equally important
  • You have limited computational resources
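For the grid in the example above, exhaustive search would cost 6 × 5 × 5 = 150 candidates per fold, while nIter: 20 evaluates only 20 of them. The sampling can be sketched with a small seeded generator (illustrative only; the library's actual RNG may differ):

```typescript
// Illustrative random sampling: draw nIter parameter combinations
// uniformly from the grid, seeded so runs are reproducible.
function sampleParams(
  grid: Record<string, unknown[]>,
  nIter: number,
  seed: number
): Record<string, unknown>[] {
  let s = seed >>> 0;
  const rand = () => {
    // Linear congruential generator: crude, but deterministic per seed
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
  return Array.from({ length: nIter }, () => {
    const combo: Record<string, unknown> = {};
    for (const [name, values] of Object.entries(grid)) {
      combo[name] = values[Math.floor(rand() * values.length)];
    }
    return combo;
  });
}

const candidates = sampleParams(
  {
    nEstimators: [50, 100, 150, 200, 250, 300],
    learningRate: [0.01, 0.05, 0.1, 0.2, 0.3],
    maxDepth: [3, 5, 7, 9, 11],
  },
  20,
  42
);
console.log(candidates.length); // 20
```

Fixing the seed is what makes randomState: 42 in the example reproducible from run to run.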

Learning Curves

Learning curves help diagnose whether a model suffers from high bias or high variance.

Basic Usage

import { learningCurve } from "bun-scikit";
import { Ridge } from "bun-scikit";

const result = learningCurve(
  () => new Ridge({ alpha: 1.0 }),
  X,
  y,
  {
    trainSizes: [0.1, 0.3, 0.5, 0.7, 0.9],
    cv: 5,
    scoring: "r2",
  }
);

console.log("Train sizes:", result.trainSizes);
console.log("Train scores:", result.trainScores);
console.log("Test scores:", result.testScores);

// Compute means for plotting
for (let i = 0; i < result.trainSizes.length; i++) {
  const trainMean = result.trainScores[i].reduce((a, b) => a + b) / result.trainScores[i].length;
  const testMean = result.testScores[i].reduce((a, b) => a + b) / result.testScores[i].length;
  
  console.log(`Size ${result.trainSizes[i]}: train=${trainMean.toFixed(4)}, test=${testMean.toFixed(4)}`);
}

Interpreting Learning Curves

High bias (underfitting):
  • Both train and test scores are low
  • Scores converge quickly
  • Small gap between train and test scores
Solution: Use a more complex model or add features

High variance (overfitting):
  • Large gap between train and test scores
  • Train score is high, test score is low
  • Gap doesn't close with more data
Solution: Add regularization, reduce model complexity, or get more data

Good fit:
  • Both scores are high
  • Small gap between train and test
  • Scores plateau with more data

Validation Curves

Validation curves show how a single hyperparameter affects performance.

Basic Usage

import { validationCurve } from "bun-scikit";
import { Ridge } from "bun-scikit";

const alphas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0];

const result = validationCurve(
  (alpha) => new Ridge({ alpha: alpha as number }),
  "alpha",
  alphas,
  X,
  y,
  {
    cv: 5,
    scoring: "r2",
  }
);

console.log("Parameter values:", result.paramValues);
console.log("Train scores:", result.trainScores);
console.log("Test scores:", result.testScores);

// Find optimal alpha
let bestIdx = 0;
let bestScore = -Infinity;

for (let i = 0; i < alphas.length; i++) {
  const meanTestScore = result.testScores[i].reduce((a, b) => a + b) / result.testScores[i].length;
  if (meanTestScore > bestScore) {
    bestScore = meanTestScore;
    bestIdx = i;
  }
}

console.log(`Best alpha: ${alphas[bestIdx]} (score: ${bestScore.toFixed(4)})`);

Train-Test Split

Simple holdout validation:
import { trainTestSplit } from "bun-scikit";

const [X_train, X_test, y_train, y_test] = trainTestSplit(
  X,
  y,
  {
    testSize: 0.2,
    randomState: 42,
    shuffle: true,
    stratify: y, // Preserve class distribution
  }
);

const model = new LogisticRegression();
model.fit(X_train, y_train);

const score = model.score(X_test, y_test);
console.log(`Test accuracy: ${score.toFixed(4)}`);
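For reference, the holdout split itself is just a seeded shuffle of indices followed by slicing; a self-contained sketch (not bun-scikit's implementation, and without the stratify option):

```typescript
// Illustrative holdout split: seeded Fisher-Yates shuffle of the indices,
// then slice off the first testSize fraction as the test set.
function holdoutSplit<T>(
  X: T[],
  y: number[],
  testSize: number,
  seed: number
): [T[], T[], number[], number[]] {
  const idx = Array.from({ length: X.length }, (_, i) => i);
  let s = seed >>> 0;
  const rand = () => {
    // Linear congruential generator: enough for a reproducible shuffle
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nTest = Math.round(X.length * testSize);
  const testIdx = idx.slice(0, nTest);
  const trainIdx = idx.slice(nTest);
  return [
    trainIdx.map((i) => X[i]),
    testIdx.map((i) => X[i]),
    trainIdx.map((i) => y[i]),
    testIdx.map((i) => y[i]),
  ];
}

const [XTrain, XTest, yTrain, yTest] = holdoutSplit(
  [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]],
  [0, 0, 0, 0, 1, 1, 1, 1],
  0.25,
  42
);
console.log(XTrain.length, XTest.length); // 6 2
```

The key property is that train and test indices partition the data: every sample lands in exactly one of the two sets.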

Common Patterns

import { Pipeline } from "bun-scikit";
import { StandardScaler } from "bun-scikit";
import { PCA } from "bun-scikit";
import { LogisticRegression } from "bun-scikit";
import { GridSearchCV } from "bun-scikit";

const search = new GridSearchCV(
  (params) => new Pipeline([
    ["scaler", new StandardScaler()],
    ["pca", new PCA({ nComponents: params.nComponents as number })],
    ["classifier", new LogisticRegression({ l2: params.l2 as number })],
  ]),
  {
    nComponents: [5, 10, 20],
    l2: [0.01, 0.1, 1.0],
  },
  { cv: 5, scoring: "accuracy" }
);

search.fit(X, y);

Custom Scoring Function

import type { ScoringFn } from "bun-scikit";

const customScorer: ScoringFn = (estimator, X, y) => {
  const predictions = estimator.predict(X);
  
  // Custom metric: weighted accuracy
  let correct = 0;
  let total = 0;
  
  for (let i = 0; i < y.length; i++) {
    const weight = y[i] === 1 ? 2 : 1; // Weight positive class more
    if (predictions[i] === y[i]) correct += weight;
    total += weight;
  }
  
  return correct / total;
};

const scores = crossValScore(
  () => new LogisticRegression(),
  X,
  y,
  { cv: 5, scoring: customScorer }
);

Early Stopping with Validation Set

const [X_train, X_val, y_train, y_val] = trainTestSplit(
  X,
  y,
  { testSize: 0.2, randomState: 42 }
);

let bestScore = -Infinity;
let bestModel = null;
let patience = 5;
let noImprovement = 0;

for (let nEstimators = 50; nEstimators <= 500; nEstimators += 50) {
  const model = new RandomForestClassifier({ nEstimators });
  model.fit(X_train, y_train);
  
  const valScore = model.score(X_val, y_val);
  
  if (valScore > bestScore) {
    bestScore = valScore;
    bestModel = model;
    noImprovement = 0;
  } else {
    noImprovement++;
    if (noImprovement >= patience) {
      console.log(`Early stopping at ${nEstimators} estimators`);
      break;
    }
  }
}

console.log(`Best validation score: ${bestScore.toFixed(4)}`);

Performance Tips

Speed up hyperparameter search:
  • Use RandomizedSearchCV instead of GridSearchCV
  • Reduce the number of CV folds (e.g., 3 instead of 5)
  • Use coarse-to-fine search (broad search first, then narrow)
  • Parallelize if possible (future feature)

Avoid data leakage:
  • Always split data before any preprocessing
  • Use Pipeline to ensure preprocessing is fit only on training data
  • Never use test data for hyperparameter tuning
  • Use nested CV for unbiased performance estimates

Choose the right metric:
  • Classification: accuracy, f1, precision, recall, roc_auc
  • Regression: r2, mean_squared_error, mean_absolute_error
  • Imbalanced data: f1, precision-recall, or custom weighted metrics

Best Practices

1. Split your data
   Create train, validation, and test sets. Never touch the test set until final evaluation.

2. Choose an evaluation metric
   Select a metric that aligns with your business objective.

3. Establish a baseline
   Train a simple model to establish a performance baseline.

4. Tune hyperparameters
   Use cross-validation with grid or randomized search.

5. Plot learning curves
   Diagnose bias/variance to guide next steps.

6. Run the final evaluation
   Evaluate the best model on the held-out test set, once.

Next Steps

Linear Models

Apply tuning to linear models

Tree Ensembles

Tune random forests and boosting
