Function Signature

function trainTestSplit<TX, TY>(
  X: TX[],
  y: TY[],
  options?: TrainTestSplitOptions
): TrainTestSplitResult<TX, TY>

Parameters

X (TX[], required): Feature array to split
y (TY[], required): Target array to split
options (TrainTestSplitOptions, optional): Configuration options for the split

Returns

TrainTestSplitResult<TX, TY>
object containing the train and test subsets: XTrain, XTest, yTrain, and yTest

Description

Splits arrays or matrices into random train and test subsets. This is a fundamental utility for model validation: it reserves a portion of the data for testing while the model trains on the remainder. The function uses a seeded random number generator (Mulberry32), so passing the same randomState produces the same split.
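To illustrate why a seeded generator makes the split reproducible, here is a minimal sketch of the widely circulated Mulberry32 routine driving a Fisher-Yates shuffle of sample indices. The helper names mulberry32 and shuffledIndices are hypothetical, and the library's internal shuffle may differ from this one; only the use of Mulberry32 is stated in the description above.

```typescript
// Mulberry32: a small seeded PRNG with 32 bits of state.
// Returns a function that yields floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper (not part of the documented API): Fisher-Yates
// shuffle of the indices 0..n-1, driven by the seeded generator.
function shuffledIndices(n: number, seed: number): number[] {
  const rand = mulberry32(seed);
  const indices = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices;
}

// The same seed always produces the same permutation.
const first = shuffledIndices(8, 123);
const second = shuffledIndices(8, 123);
console.log(first.join(',') === second.join(',')); // true
```

Shuffling indices rather than the data itself means one permutation can reorder X and y together, which keeps each feature row paired with its target.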

Example

import { trainTestSplit } from 'bun-scikit';

// Prepare data
const X = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8],
  [9, 10],
  [11, 12],
  [13, 14],
  [15, 16]
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

// Split with default 75/25 ratio
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);

console.log('Training samples:', XTrain.length); // 6
console.log('Test samples:', XTest.length);       // 2

// Custom split with 80/20 ratio
const split = trainTestSplit(X, y, {
  testSize: 0.2,
  shuffle: true,
  randomState: 123
});

// Split with absolute test size
const fixedSplit = trainTestSplit(X, y, {
  testSize: 3,  // Exactly 3 test samples
  shuffle: false
});

Notes

  • Both X and y must have the same length
  • At least 2 samples are required for splitting
  • When testSize is a float (0 < testSize < 1), it represents the proportion of the dataset reserved for testing
  • When testSize is an integer, it represents the absolute number of test samples
  • Setting shuffle=false with no testSize uses the last 25% of samples for testing
  • Passing the same randomState produces the same split on every run
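The rules in these notes can be sketched as a standalone helper. This is an illustration only, not the library's code: resolveTestCount and splitTail are hypothetical names, and rounding a fractional count up with Math.ceil is an assumption, since the source does not say how the library rounds.

```typescript
// Hypothetical helper: turn testSize into an absolute number of test samples.
// The 0.25 default mirrors the documented 75/25 split.
function resolveTestCount(nSamples: number, testSize: number = 0.25): number {
  if (nSamples < 2) {
    throw new Error('At least 2 samples are required for splitting');
  }
  let count: number;
  if (Number.isInteger(testSize)) {
    count = testSize; // integer: absolute number of test samples
  } else if (testSize > 0 && testSize < 1) {
    count = Math.ceil(nSamples * testSize); // float: proportion (rounding is an assumption)
  } else {
    throw new Error('testSize must be an integer or a float in (0, 1)');
  }
  if (count < 1 || count >= nSamples) {
    throw new Error('testSize must leave at least one train and one test sample');
  }
  return count;
}

// Hypothetical helper: without shuffling, the last testCount items form the test set.
function splitTail<T>(items: T[], testCount: number): { train: T[]; test: T[] } {
  const cut = items.length - testCount;
  return { train: items.slice(0, cut), test: items.slice(cut) };
}

console.log(resolveTestCount(8));    // 2
console.log(resolveTestCount(8, 3)); // 3
console.log(splitTail([0, 1, 2, 3, 4, 5, 6, 7], 2).test); // [ 6, 7 ]
```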
