
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/dronabopche/100-ML-AI-Project/llms.txt

Use this file to discover all available pages before exploring further.

Each section below documents one algorithm’s implementation: the class signature, the core NumPy logic extracted directly from the notebook, the dataset it’s evaluated on, and the hyperparameters you can tune. Reading the code alongside the math in the notebooks is the fastest way to build intuition for how each model actually works.
Algorithm: CART (Classification and Regression Trees) using Gini impurity. Recursively finds the feature-threshold pair that produces the lowest weighted impurity across both child nodes.
Dataset: Iris (150 samples, 4 features, 3 classes) via sklearn.datasets.load_iris.
Key hyperparameters: max_depth (controls overfitting), min_samples_split (minimum node size before attempting a split).
The tree is composed of Node objects. Internal nodes store a feature index and threshold; leaf nodes store a value (the majority class). _gini computes impurity for a label array. _best_split exhaustively searches every unique threshold for every feature. _grow recurses until max_depth or a pure node is reached.
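A minimal sketch of the impurity computation and exhaustive split search described above, written as standalone functions rather than Node methods; the names _gini and _best_split mirror the paragraph, but the notebook's actual code may differ in detail.

```python
import numpy as np

def _gini(y):
    # Gini impurity: 1 - sum(p_c^2) over the class proportions in y
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def _best_split(X, y):
    # Exhaustively try every unique threshold of every feature and keep
    # the split with the lowest weighted child impurity.
    n_samples, n_features = X.shape
    best_impurity, best_feature, best_threshold = np.inf, None, None
    for f in range(n_features):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue
            weighted = (left.sum() * _gini(y[left]) +
                        right.sum() * _gini(y[right])) / n_samples
            if weighted < best_impurity:
                best_impurity, best_feature, best_threshold = weighted, f, t
    return best_feature, best_threshold
```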
Algorithm: Lloyd’s algorithm. Randomly initialises k centroids, then alternates between assigning each point to its nearest centroid and recomputing centroids as cluster means. Stops when centroid movement falls below tol.
Dataset: Synthetic blobs via sklearn.datasets.make_blobs (400 samples, 4 true clusters).
Key hyperparameters: k (number of clusters), max_iters (iteration cap), tol (convergence threshold).
_assign computes Euclidean distances from every point to every centroid and returns the argmin. _inertia sums squared distances within each cluster (used for the elbow method). Centroids that lose all members retain their previous position to avoid NaN.
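A condensed sketch of Lloyd's loop under the conventions above (initial centroids drawn from the data, empty clusters keeping their previous position). The kmeans function name and the seeding are illustrative, not the notebook's exact API.

```python
import numpy as np

def _assign(X, centroids):
    # Euclidean distance from every point to every centroid, then argmin
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        labels = _assign(X, centroids)
        new_centroids = centroids.copy()   # clusters that lose all members keep their old position
        for j in range(k):
            if np.any(labels == j):
                new_centroids[j] = X[labels == j].mean(axis=0)
        moved = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if moved < tol:                    # convergence: centroid movement below tol
            break
    return centroids, labels
```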
Algorithm: Lazy learner: stores the entire training set and defers computation to prediction time. For each test point, computes Euclidean distance to every training point, selects the k smallest, and returns the majority class via np.bincount.
Dataset: Small custom 2D dataset demonstrating binary classification.
Key hyperparameters: k (number of neighbors; smaller values = higher variance, larger values = higher bias).
fit only stores X_train and y_train — no training occurs. _predict_one handles a single query point; predict vectorises over the test set. Distance is Euclidean (sqrt(sum((x1-x2)^2))).
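A compact sketch of that lazy-learner pattern; the method names (fit, _predict_one, predict) follow the paragraph above, though the notebook's class may differ slightly. Labels are assumed to be non-negative integers so np.bincount applies.

```python
import numpy as np

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No training: just memorise the data
        self.X_train, self.y_train = X, y

    def _predict_one(self, x):
        # Euclidean distance to every training point, keep the k nearest
        dists = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
        nearest = np.argsort(dists)[: self.k]
        # Majority vote among the k nearest labels
        return np.bincount(self.y_train[nearest]).argmax()

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])
```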
Algorithm: Two variants are implemented: gradient descent (iterative) and Ordinary Least Squares via the normal equation (closed-form). The gradient descent version initialises weights to zero and updates them each iteration using the mean squared error gradient.
Dataset: Toy 1D dataset (X = [1,2,3,4,5], y = [2,4,6,8,10]).
Key hyperparameters (GD): learning_rate, n_iter. OLS has no hyperparameters.
LinearRegression uses gradient descent: dw = (1/m) X^T (ŷ - y) and db = (1/m) sum(ŷ - y). LinearRegressionOLS solves θ = (X^T X)^{-1} X^T y directly using np.linalg.inv.
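Both variants, sketched as free functions (fit_gd and fit_ols are illustrative names, not the notebook's classes): the gradient-descent update applies the dw/db formulas quoted above, and the OLS version solves the normal equation after prepending a bias column.

```python
import numpy as np

def fit_gd(X, y, learning_rate=0.01, n_iter=1000):
    # Gradient descent on mean squared error; weights and bias start at zero
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iter):
        y_hat = X @ w + b
        dw = (1 / m) * X.T @ (y_hat - y)
        db = (1 / m) * np.sum(y_hat - y)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

def fit_ols(X, y):
    # Normal equation: theta = (X^T X)^-1 X^T y, with a bias column of ones
    Xb = np.c_[np.ones(len(X)), X]
    return np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
```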
Algorithm: Binary classifier trained with gradient descent on binary cross-entropy loss. A sigmoid activation squashes the linear output to a probability; predictions are thresholded at 0.5 by default.
Dataset: Toy 2D dataset (X = [[1,2],[2,3],[3,4],[4,5]], y = [0,0,1,1]).
Key hyperparameters: learning_rate, n_iter, threshold (prediction cut-off, adjustable at inference time).
_sigmoid(z) = 1 / (1 + exp(-z)). Gradients follow the same form as linear regression because the cross-entropy gradient simplifies to (ŷ - y). get_probabilities exposes raw sigmoid outputs; predict applies the threshold.
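A sketch of the same training loop with the sigmoid activation; fit_logistic and predict are illustrative names, and the threshold is exposed at inference time as described.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, n_iter=1000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iter):
        y_hat = _sigmoid(X @ w + b)
        # Cross-entropy gradient simplifies to (y_hat - y), same form as linear regression
        dw = (1 / m) * X.T @ (y_hat - y)
        db = (1 / m) * np.sum(y_hat - y)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

def predict(X, w, b, threshold=0.5):
    # Raw probabilities come from the sigmoid; the cut-off is adjustable
    return (_sigmoid(X @ w + b) >= threshold).astype(int)
```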
Algorithm: Gaussian Naive Bayes. During fit, the prior probability, per-feature mean, and per-feature variance are computed for each class. At prediction time, the log posterior is computed as log P(y) + sum log P(x_i | y) using the Gaussian PDF, and the class with the highest score wins.
Dataset: Iris via sklearn.datasets.load_iris (80/20 train-test split).
Key hyperparameters: none; the model is fully determined by the training data. A small constant (1e-9) is added to variances to prevent division by zero.
Working in log space (_log_likelihood) avoids numeric underflow from multiplying many small probabilities. The Gaussian log-likelihood for feature i under class c is -0.5 * (log(2πv) + (x-m)²/v). Summing over features exploits the conditional independence assumption.
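A sketch of the fitted statistics and the log-space scoring; the helper names (fit, _log_likelihood, predict_one) mirror the description but are not necessarily the notebook's exact signatures.

```python
import numpy as np

def fit(X, y):
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])  # 1e-9 avoids division by zero
    return classes, priors, means, vars_

def _log_likelihood(x, means, vars_):
    # Gaussian log-pdf per class and feature: -0.5 * (log(2*pi*v) + (x - m)^2 / v)
    return -0.5 * (np.log(2 * np.pi * vars_) + (x - means) ** 2 / vars_)

def predict_one(x, classes, priors, means, vars_):
    # Log posterior = log prior + sum of per-feature log-likelihoods
    # (the sum relies on the conditional independence assumption)
    scores = np.log(priors) + _log_likelihood(x, means, vars_).sum(axis=1)
    return classes[scores.argmax()]
```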
Algorithm: Two-layer feedforward network: Input → Hidden (ReLU) → Output (Softmax). Trained with mini-batch gradient descent and cross-entropy loss. Weights are initialised with Xavier initialisation.
Dataset: Digits dataset via sklearn.datasets.load_digits (1797 samples, 64 features, 10 classes).
Key hyperparameters: hidden_dim, lr (learning rate), epochs, batch_size.
forward computes Z1→A1 (ReLU) → Z2→A2 (Softmax). backward derives gradients analytically: dZ2 = A2 - y_true (softmax + cross-entropy simplification), then chain rule back through the hidden layer via relu_grad. fit shuffles indices each epoch for stochastic mini-batching.
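The forward and backward passes sketched as free functions for a single batch. The dZ2 = A2 - y_true simplification and the relu_grad chain-rule step follow the description above; the max-shift inside softmax is a standard numerical-stability trick and may or may not appear in the notebook.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    return (z > 0).astype(float)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1
    A1 = relu(Z1)            # hidden activation
    Z2 = A1 @ W2 + b2
    A2 = softmax(Z2)         # class probabilities
    return Z1, A1, Z2, A2

def backward(X, y_onehot, Z1, A1, A2, W2):
    m = len(X)
    dZ2 = A2 - y_onehot                       # softmax + cross-entropy simplification
    dW2 = A1.T @ dZ2 / m
    db2 = dZ2.mean(axis=0)
    dZ1 = (dZ2 @ W2.T) * relu_grad(Z1)        # chain rule back through the hidden layer
    dW1 = X.T @ dZ1 / m
    db1 = dZ1.mean(axis=0)
    return dW1, db1, dW2, db2
```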
Algorithm: Principal Component Analysis via eigen-decomposition of the sample covariance matrix. Data is mean-centered, the covariance matrix is formed, and np.linalg.eigh decomposes it. The top n_components eigenvectors form the projection matrix.
Dataset: Iris (2D projection) and Digits (reconstruction at various component counts), both via scikit-learn.
Key hyperparameters: n_components (number of principal components to retain).
fit stores the mean and the top-k eigenvectors as self.components (shape [n_components, n_features]). transform centers new data and projects it: (X - mean) @ components.T. explained_variance_ratio_ exposes the fraction of variance captured by each component.
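A sketch of fit/transform via np.linalg.eigh, keeping the attribute names from the paragraph (components, explained_variance_ratio_); since eigh returns eigenvalues in ascending order, they are re-sorted descending before the top-k eigenvectors are kept.

```python
import numpy as np

class PCA:
    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_                              # mean-center the data
        cov = np.cov(Xc, rowvar=False)                   # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]                # sort descending by variance
        top = order[: self.n_components]
        self.components = eigvecs[:, top].T              # shape (n_components, n_features)
        self.explained_variance_ratio_ = eigvals[top] / eigvals.sum()
        return self

    def transform(self, X):
        # Center new data and project onto the retained components
        return (X - self.mean_) @ self.components.T
```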
Algorithm: Ensemble of n_trees decision trees. Each tree is trained on a bootstrap sample (sampling with replacement) and restricted to a random subset of max_features features at each split. Final predictions are determined by majority vote across all trees.
Dataset: Breast Cancer Wisconsin via sklearn.datasets.load_breast_cancer (569 samples, 30 features, binary labels).
Key hyperparameters: n_trees, max_depth, max_features ('sqrt' uses sqrt(d) features per split; 'log2' uses log2(d)).
The notebook re-implements DecisionTree with an added max_features parameter for feature subsampling inside _best_split. RandomForest.fit loops over trees, draws a bootstrap index with np.random.choice(..., replace=True), trains a tree, and appends it to self.trees. predict stacks predictions from all trees and takes the mode column-wise.
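A sketch of the bootstrap-and-vote loop. It assumes a DecisionTree class with fit/predict and a max_features parameter, as described above; here sklearn's DecisionTreeClassifier stands in purely so the snippet runs on its own, and class labels are assumed to be non-negative integers for np.bincount.

```python
import numpy as np
# Stand-in for the notebook's own DecisionTree (which adds max_features to _best_split);
# sklearn's tree accepts the same max_depth / max_features arguments.
from sklearn.tree import DecisionTreeClassifier as DecisionTree

class RandomForest:
    def __init__(self, n_trees=10, max_depth=10, max_features="sqrt"):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            idx = np.random.choice(n, size=n, replace=True)   # bootstrap sample
            tree = DecisionTree(max_depth=self.max_depth,
                                max_features=self.max_features)
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)

    def predict(self, X):
        # Stack per-tree predictions (n_trees, n_samples) and take the column-wise mode
        preds = np.stack([t.predict(X) for t in self.trees])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```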
Algorithm: Soft-margin linear SVM trained with sub-gradient descent on the regularised hinge loss: L = λ‖w‖² + (1/n) Σ max(0, 1 − y_i(w^T x_i + b)). Labels are re-coded to ±1 internally.
Dataset: Synthetic binary classification via sklearn.datasets.make_classification (500 samples, 2 features), standardised with StandardScaler.
Key hyperparameters: lr (learning rate), lambda_param (regularisation strength), n_iters.
At each iteration, margins = y * (Xw + b) identifies support vectors (margins < 1). Sub-gradients are computed only for those violating samples: dw = 2λw - mean(X[mask] * y[mask]), db = -mean(y[mask]). The loss history is tracked in self.losses for convergence plots.
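A sketch of the sub-gradient loop using the update formulas quoted above (dw and db averaged over the violating samples), with a guard for iterations where no sample violates the margin; fit_svm is an illustrative name rather than the notebook's class.

```python
import numpy as np

def fit_svm(X, y, lr=0.001, lambda_param=0.01, n_iters=1000):
    y_ = np.where(y <= 0, -1.0, 1.0)        # re-code {0,1} labels to {-1,+1}
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    losses = []
    for _ in range(n_iters):
        margins = y_ * (X @ w + b)
        mask = margins < 1                   # margin violators contribute to the hinge term
        if mask.any():
            dw = 2 * lambda_param * w - (X[mask] * y_[mask, None]).mean(axis=0)
            db = -y_[mask].mean()
        else:
            dw = 2 * lambda_param * w        # only the regulariser pushes on w
            db = 0.0
        w -= lr * dw
        b -= lr * db
        # Track the hinge-loss objective for convergence plots
        loss = lambda_param * np.dot(w, w) + np.mean(np.maximum(0, 1 - margins))
        losses.append(loss)
    return w, b, losses
```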
