trainTestSplit

Function Signature

function trainTestSplit<TX, TY>(
  X: TX[],
  y: TY[],
  options?: TrainTestSplitOptions
): TrainTestSplitResult<TX, TY>

Parameters

X

TX[]

required

Feature array to split

y

TY[]

required

Target array to split

options

TrainTestSplitOptions

Configuration options for the split

testSize

number

default: 0.25

Size of the test split. A float strictly between 0 and 1 is read as the proportion of the dataset to include in the test split; an integer is read as the absolute number of test samples.

shuffle

boolean

default: true

Whether to shuffle the data before splitting

randomState

number

default: 42

Random seed for reproducible shuffling

Returns

TrainTestSplitResult

object

XTrain

TX[]

Training feature subset

XTest

TX[]

Test feature subset

yTrain

TY[]

Training target subset

yTest

TY[]

Test target subset
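
The two types named in the signature can be pictured roughly as follows. This is a sketch inferred only from the parameter and return tables on this page; the library's actual declarations may differ, and DEFAULT_SPLIT_OPTIONS is a hypothetical constant introduced here just to collect the documented defaults in one place.

```typescript
// Shapes inferred from the tables above, not copied from the library source.
interface TrainTestSplitOptions {
  testSize?: number;    // float in (0, 1) or an integer sample count
  shuffle?: boolean;    // shuffle before splitting
  randomState?: number; // seed for the shuffle
}

interface TrainTestSplitResult<TX, TY> {
  XTrain: TX[];
  XTest: TX[];
  yTrain: TY[];
  yTest: TY[];
}

// Hypothetical constant: the defaults as documented on this page.
const DEFAULT_SPLIT_OPTIONS: Required<TrainTestSplitOptions> = {
  testSize: 0.25,
  shuffle: true,
  randomState: 42,
};
```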

Description

Splits paired feature and target arrays into train and test subsets. This is a fundamental utility for model validation: it reserves a portion of your data for testing while you train on the remainder. When shuffling is enabled, the function uses a seeded random number generator (Mulberry32), so the same randomState produces the same split.
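
To make the seeding behaviour concrete, here is a minimal sketch of Mulberry32 driving a shuffle of sample indices. Only the generator's name comes from this page. The Fisher-Yates shuffle and the helper name seededPermutation are assumptions made for illustration, so the permutations produced here need not match what trainTestSplit produces for the same seed.

```typescript
// Mulberry32: a small seeded PRNG with 32 bits of state, returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: a Fisher-Yates shuffle of the indices 0..n-1,
// using the seeded generator instead of Math.random.
function seededPermutation(n: number, seed: number): number[] {
  const indices = Array.from({ length: n }, (_, i) => i);
  const random = mulberry32(seed);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices;
}
```

Calling seededPermutation(8, 42) twice returns the same ordering both times, which is the property that makes a seeded split reproducible.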

Example

import { trainTestSplit } from 'bun-scikit';

// Prepare data
const X = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8],
  [9, 10],
  [11, 12],
  [13, 14],
  [15, 16]
];
const y = [0, 0, 0, 0, 1, 1, 1, 1];

// Split with default 75/25 ratio
const { XTrain, XTest, yTrain, yTest } = trainTestSplit(X, y);

console.log('Training samples:', XTrain.length); // 6
console.log('Test samples:', XTest.length);       // 2

// Custom split with 80/20 ratio
const split = trainTestSplit(X, y, {
  testSize: 0.2,
  shuffle: true,
  randomState: 123
});

// Split with absolute test size
const fixedSplit = trainTestSplit(X, y, {
  testSize: 3,  // Exactly 3 test samples
  shuffle: false
});

Notes

X and y must have the same length.
At least 2 samples are required for splitting.
When testSize is a float (0 < testSize < 1), it is the proportion of the dataset placed in the test set.
When testSize is an integer, it is the absolute number of test samples.
With shuffle set to false and testSize omitted, the last 25% of samples are used for testing.
Passing the same randomState with the same inputs produces the same split on every run.
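
The rules above can be sketched as a small pair of helpers. Both resolveTestCount and splitInOrder are hypothetical names, not part of bun-scikit. Rounding a fractional test size up with Math.ceil is an assumption borrowed from scikit-learn's convention, and the exact error conditions and messages are illustrative only.

```typescript
// Hypothetical helper: turn testSize into an absolute number of test samples.
function resolveTestCount(nSamples: number, testSize: number = 0.25): number {
  if (nSamples < 2) {
    throw new Error("At least 2 samples are required for splitting");
  }
  let nTest: number;
  if (testSize > 0 && testSize < 1) {
    // Assumption: fractional sizes round up; the library may round differently.
    nTest = Math.ceil(nSamples * testSize);
  } else if (Number.isInteger(testSize) && testSize >= 1) {
    nTest = testSize;
  } else {
    throw new Error("testSize must be a float in (0, 1) or a positive integer");
  }
  if (nTest >= nSamples) {
    throw new Error("testSize must leave at least one training sample");
  }
  return nTest;
}

// Hypothetical helper: the shuffle=false case, where the tail becomes the test set.
function splitInOrder<TX, TY>(X: TX[], y: TY[], testSize?: number) {
  if (X.length !== y.length) {
    throw new Error("X and y must have the same length");
  }
  const cut = X.length - resolveTestCount(X.length, testSize);
  return {
    XTrain: X.slice(0, cut),
    XTest: X.slice(cut),
    yTrain: y.slice(0, cut),
    yTest: y.slice(cut),
  };
}
```

Under these assumptions, 8 samples with the default testSize yield 6 training and 2 test samples, matching the counts in the example above.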
