Overview
SelectKBest keeps the k highest-scoring features, where each feature is scored independently by a univariate scoring function. Commonly used scoring functions include chi2, f_classif, and mutual_info_classif.
Constructor
new SelectKBest(scoreFunc, options?)
Parameters
scoreFunc
function
Scoring function that takes (X, y) and returns a score and a p-value for each feature.
Common choices: chi2, f_classif, f_regression, mutualInfoClassif.
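To make the scoreFunc contract concrete, here is a minimal from-scratch sketch of a chi2-style scorer. It is not bun-scikit's chi2: `chi2Scores` is a hypothetical name, `Matrix` is assumed to be `number[][]`, and only scores are returned (p-values would additionally need the chi-squared CDF). It compares each feature's per-class totals against the totals that class frequencies alone would predict, which is also why negative values are rejected.

```typescript
type Matrix = number[][];
type Vector = number[];

// Hypothetical chi2-style scorer: returns one score per feature (no p-values).
function chi2Scores(X: Matrix, y: Vector): number[] {
  const nFeatures = X[0].length;
  const classes = Array.from(new Set(y));
  // observed[c][j]: total of feature j over the samples in class c
  const observed = classes.map(() => new Array<number>(nFeatures).fill(0));
  const featureTotals = new Array<number>(nFeatures).fill(0);
  X.forEach((row, i) => {
    const c = classes.indexOf(y[i]);
    row.forEach((v, j) => {
      if (v < 0) throw new Error("chi2 requires non-negative feature values");
      observed[c][j] += v;
      featureTotals[j] += v;
    });
  });
  // Fraction of samples that belong to each class
  const classFreq = classes.map((cls) => y.filter((label) => label === cls).length / y.length);
  return featureTotals.map((total, j) => {
    let score = 0;
    classes.forEach((_, c) => {
      // Expected total if feature j were spread across classes in proportion to class size
      const expected = classFreq[c] * total;
      // All-zero columns have expected = 0 and simply score 0 in this sketch
      if (expected > 0) score += (observed[c][j] - expected) ** 2 / expected;
    });
    return score;
  });
}

const scores = chi2Scores([[1, 0, 1], [2, 0, 1], [0, 3, 1], [0, 1, 1]], [0, 0, 1, 1]);
console.log(scores); // [3, 4, 0]
```

The third column is identical in both classes, so it scores 0 and would be the first to be dropped.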
options.k
number | 'all'
default: 10
Number of top features to select. Use 'all' to keep all features.
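The effect of k can be sketched as a small helper that ranks features by score and keeps the top k, or every feature for 'all'. `selectTopK` is hypothetical, and three of its behaviors are assumptions rather than documented library behavior: ties go to the lower column index, a k larger than the feature count is clamped, and scores are assumed to be finite.

```typescript
// Hypothetical helper: indices of the k highest-scoring features, in original column order.
function selectTopK(scores: number[], k: number | "all"): number[] {
  const indices = scores.map((_, j) => j);
  if (k === "all") return indices; // keep every feature
  if (!Number.isInteger(k) || k < 0) {
    throw new RangeError('k must be a non-negative integer or "all"');
  }
  const n = Math.min(k, scores.length); // assumption: clamp k to the number of features
  return indices
    .sort((a, b) => scores[b] - scores[a] || a - b) // highest score first; ties go to the lower index
    .slice(0, n)
    .sort((a, b) => a - b); // restore original column order
}

console.log(selectTopK([0.5, 3, 1, 3], 2)); // [1, 3]
console.log(selectTopK([0.5, 3, 1, 3], "all")); // [0, 1, 2, 3]
```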
Methods
fit()
fit(X: Matrix, y: Vector): this
Runs the scoring function on (X, y), stores the resulting scores, and selects the k highest-scoring features. Returns the fitted selector.
transform(X: Matrix): Matrix
Returns X reduced to the feature columns selected during fit().
fitTransform(X: Matrix, y: Vector): Matrix
Fits on (X, y) and returns the reduced X in one step; equivalent to fit(X, y) followed by transform(X).
Properties
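scores_
Score of each feature, as computed by the scoring function.
P-values of the feature scores (only if the scoring function provides them).
selectedFeatureIndices_
Indices of the selected features.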
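The methods and properties above fit together roughly as in the following sketch, a simplified stand-in rather than bun-scikit's source. `MiniSelectKBest`, the `{ scores, pValues }` return shape, the `pValues_` field, and the toy `varianceScore` scorer are all hypothetical; only `scores_` and `selectedFeatureIndices_` mirror names used in this page's examples. The design point it illustrates is that fit() fixes the column indices, so transform() applies exactly the same columns to any later matrix, such as test data.

```typescript
type Matrix = number[][];
type Vector = number[];
// Assumed scorer shape: per-feature scores plus optional p-values.
type ScoreFunc = (X: Matrix, y: Vector) => { scores: number[]; pValues?: number[] };

// Hypothetical re-implementation for illustration; not bun-scikit's source.
class MiniSelectKBest {
  scores_: number[] = [];
  pValues_: number[] | undefined = undefined; // hypothetical property name
  selectedFeatureIndices_: number[] = [];
  private fitted = false;
  private readonly scoreFunc: ScoreFunc;
  private readonly k: number | "all";

  constructor(scoreFunc: ScoreFunc, k: number | "all" = 10) {
    this.scoreFunc = scoreFunc;
    this.k = k;
  }

  // Score every feature, then remember the column indices of the k best.
  fit(X: Matrix, y: Vector): this {
    const { scores, pValues } = this.scoreFunc(X, y);
    this.scores_ = scores;
    this.pValues_ = pValues;
    // Assumption: k larger than the feature count is clamped
    const k = this.k === "all" ? scores.length : Math.min(this.k, scores.length);
    this.selectedFeatureIndices_ = scores
      .map((_, j) => j)
      .sort((a, b) => scores[b] - scores[a] || a - b) // best first, ties by index
      .slice(0, k)
      .sort((a, b) => a - b); // restore original column order
    this.fitted = true;
    return this;
  }

  // Keep only the remembered columns of any matrix with the same layout.
  transform(X: Matrix): Matrix {
    if (!this.fitted) throw new Error("call fit() before transform()");
    return X.map((row) => this.selectedFeatureIndices_.map((j) => row[j]));
  }

  fitTransform(X: Matrix, y: Vector): Matrix {
    return this.fit(X, y).transform(X);
  }
}

// Toy scorer for the demo: per-feature variance (ignores y).
const varianceScore: ScoreFunc = (X) => {
  const n = X.length;
  const scores = X[0].map((_, j) => {
    const mean = X.reduce((s, row) => s + row[j], 0) / n;
    return X.reduce((s, row) => s + (row[j] - mean) ** 2, 0) / n;
  });
  return { scores };
};

const selector = new MiniSelectKBest(varianceScore, 2);
console.log(selector.fitTransform([[1, 5, 0], [2, 5, 10], [3, 5, 20]], [0, 1, 0]));
// [[1, 0], [2, 10], [3, 20]]
```

The constant middle column has zero variance, so it is the one dropped.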
Examples
Classification with chi2
import { SelectKBest, chi2, LogisticRegression } from "bun-scikit";
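// chi2 needs non-negative features (e.g. counts or TF-IDF values); XTrain and yTrain are your training data
const selector = new SelectKBest(chi2, { k: 5 });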
const XNew = selector.fitTransform(XTrain, yTrain);
console.log("Selected features:", selector.selectedFeatureIndices_);
console.log("Feature scores:", selector.scores_);
const model = new LogisticRegression();
model.fit(XNew, yTrain);
Regression with f_regression
import { SelectKBest, f_regression, LinearRegression } from "bun-scikit";
const selector = new SelectKBest(f_regression, { k: 3 });
// Fit the selector on training data only; transform() reuses the same columns for the test set
const XTrainSelected = selector.fitTransform(XTrain, yTrain);
const XTestSelected = selector.transform(XTest);
const model = new LinearRegression();
model.fit(XTrainSelected, yTrain);
const predictions = model.predict(XTestSelected);
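For intuition about what f_regression ranks, the sketch below computes the standard univariate regression F-statistic, F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation between one feature and the target. It assumes bun-scikit's f_regression follows this usual definition; `fRegressionScores` is a hypothetical name and omits p-values.

```typescript
// Hypothetical sketch of the univariate regression F-statistic, one value per feature.
function fRegressionScores(X: number[][], y: number[]): number[] {
  const n = y.length;
  const yMean = y.reduce((s, v) => s + v, 0) / n;
  const yDev = y.map((v) => v - yMean);
  const ySS = yDev.reduce((s, d) => s + d * d, 0);
  return X[0].map((_, j) => {
    const xMean = X.reduce((s, row) => s + row[j], 0) / n;
    let sxy = 0;
    let sxx = 0;
    X.forEach((row, i) => {
      const d = row[j] - xMean;
      sxy += d * yDev[i];
      sxx += d * d;
    });
    // Squared Pearson correlation between feature j and y
    const r2 = (sxy * sxy) / (sxx * ySS);
    // F with (1, n - 2) degrees of freedom; Infinity if |r| = 1, NaN for a constant column
    return (r2 / (1 - r2)) * (n - 2);
  });
}

const fScores = fRegressionScores([[1, 1], [2, 3], [3, 2], [5, 4]], [2, 4, 6, 8]);
console.log(fScores); // roughly [56.33, 3.56]
```

Because F grows monotonically with r^2, ranking by F is the same as ranking by absolute linear correlation with the target.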
Use in pipeline
import { Pipeline, SelectKBest, f_classif, StandardScaler, SVC } from "bun-scikit";
const pipeline = new Pipeline([
["feature_selection", new SelectKBest(f_classif, { k: 20 })],
["scaler", new StandardScaler()],
["svm", new SVC({ kernel: "rbf" })]
]);
pipeline.fit(XTrain, yTrain);
const score = pipeline.score(XTest, yTest);
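The pipeline above selects features with f_classif, which is named after the one-way ANOVA F-test. The sketch below computes that statistic per feature (between-class mean square divided by within-class mean square), assuming the library uses the standard definition; `fClassifScores` is hypothetical and returns no p-values. It also shows why placing the StandardScaler after the selector does not change which features are chosen: rescaling and shifting a column leaves its F-statistic unchanged.

```typescript
// Hypothetical one-way ANOVA F-statistic per feature.
function fClassifScores(X: number[][], y: number[]): number[] {
  const n = y.length;
  const classes = Array.from(new Set(y));
  const g = classes.length; // number of classes
  return X[0].map((_, j) => {
    const col = X.map((row) => row[j]);
    const grandMean = col.reduce((s, v) => s + v, 0) / n;
    let ssBetween = 0; // spread of class means around the grand mean
    let ssWithin = 0; // spread of samples around their own class mean
    for (const c of classes) {
      const vals = col.filter((_, i) => y[i] === c);
      const mean = vals.reduce((s, v) => s + v, 0) / vals.length;
      ssBetween += vals.length * (mean - grandMean) ** 2;
      ssWithin += vals.reduce((s, v) => s + (v - mean) ** 2, 0);
    }
    // F = (between-class mean square) / (within-class mean square);
    // a column that is constant within every class divides by zero here
    return ssBetween / (g - 1) / (ssWithin / (n - g));
  });
}

const X = [[1, 5], [2, 6], [3, 5], [4, 6]];
const y = [0, 0, 1, 1];
console.log(fClassifScores(X, y)); // [8, 0]

// Rescaling and shifting a column, as a scaler does, leaves its F-statistic unchanged.
const rescaled = X.map(([a, b]) => [10 * a - 3, b]);
console.log(fClassifScores(rescaled, y)); // [8, 0]
```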
Notes
- Univariate feature selection examines each feature independently
- Does not account for feature interactions
- Fast and scalable to high-dimensional datasets, since each feature is scored once, independently of the others
- chi2 requires non-negative features (e.g., count data, TF-IDF)
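- f_classif (ANOVA F-test) and f_regression (univariate linear-regression F-test) suit continuous features, but they only detect differences in class means and linear correlation, respectively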
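The interaction caveat can be seen on an XOR target, y = a XOR b. Each feature alone has the same mean in both classes, so its between-class sum of squares (the numerator of the ANOVA F-statistic) is zero, even though the pair determines y exactly. `betweenClassSS` is a hypothetical helper written only for this illustration.

```typescript
// XOR target: y = a XOR b, so neither feature is informative on its own.
const X = [[0, 0], [0, 1], [1, 0], [1, 1]];
const y = X.map(([a, b]) => a ^ b); // [0, 1, 1, 0]

// Hypothetical helper: between-class sum of squares of one column,
// i.e. the numerator of the one-way ANOVA F-statistic.
function betweenClassSS(col: number[], labels: number[]): number {
  const grand = col.reduce((s, v) => s + v, 0) / col.length;
  let ss = 0;
  for (const c of Array.from(new Set(labels))) {
    const vals = col.filter((_, i) => labels[i] === c);
    const mean = vals.reduce((s, v) => s + v, 0) / vals.length;
    ss += vals.length * (mean - grand) ** 2;
  }
  return ss;
}

// Scored one at a time, both features look useless...
const perFeature = [0, 1].map((j) => betweenClassSS(X.map((row) => row[j]), y));
console.log(perFeature); // [0, 0]

// ...but their interaction separates the classes perfectly.
const xorFeature = X.map(([a, b]) => a ^ b);
console.log(betweenClassSS(xorFeature, y)); // 1
```

A univariate selector would rank a and b no higher than pure noise here; capturing the dependence requires an engineered interaction feature or a multivariate selection method.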