Each section below documents one algorithm’s implementation: the class signature, the core NumPy logic extracted directly from the notebook, the dataset it’s evaluated on, and the hyperparameters you can tune. Reading the code alongside the math in the notebooks is the fastest way to build intuition for how each model actually works.
Decision Tree
Algorithm: CART (Classification and Regression Trees) using Gini impurity. Recursively finds the feature-threshold pair that produces the lowest weighted impurity across both child nodes.
Dataset: Iris (150 samples, 4 features, 3 classes) via sklearn.datasets.load_iris.
Key hyperparameters: max_depth (controls overfitting), min_samples_split (minimum node size before attempting a split).
The tree is composed of Node objects. Internal nodes store a feature index and threshold; leaf nodes store a value (the majority class). _gini computes impurity for a label array. _best_split exhaustively searches every unique threshold for every feature. _grow recurses until max_depth or a pure node is reached.
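A minimal sketch of how these pieces might fit together. Only Node, _gini, _best_split, _grow, max_depth and min_samples_split are named in the notebook; the exact Node fields, the fit/predict wrappers and the default values here are illustrative assumptions.

```python
import numpy as np

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        # Internal nodes use feature/threshold/left/right; leaves only set value.
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.value = value

class DecisionTree:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split

    def _gini(self, y):
        # Gini impurity: 1 - sum of squared class proportions.
        _, counts = np.unique(y, return_counts=True)
        p = counts / len(y)
        return 1.0 - np.sum(p ** 2)

    def _best_split(self, X, y):
        best_feature, best_threshold, best_impurity = None, None, np.inf
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                left, right = y[X[:, f] <= t], y[X[:, f] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                # Weighted impurity across both children.
                impurity = (len(left) * self._gini(left) + len(right) * self._gini(right)) / len(y)
                if impurity < best_impurity:
                    best_feature, best_threshold, best_impurity = f, t, impurity
        return best_feature, best_threshold

    def _grow(self, X, y, depth=0):
        # Stop on a pure node, the depth cap, or a too-small node -> majority-class leaf.
        if len(np.unique(y)) == 1 or depth >= self.max_depth or len(y) < self.min_samples_split:
            return Node(value=np.bincount(y).argmax())
        feature, threshold = self._best_split(X, y)
        if feature is None:
            return Node(value=np.bincount(y).argmax())
        mask = X[:, feature] <= threshold
        return Node(feature, threshold,
                    self._grow(X[mask], y[mask], depth + 1),
                    self._grow(X[~mask], y[~mask], depth + 1))

    def fit(self, X, y):
        self.root = self._grow(X, y)
        return self

    def _predict_one(self, x, node):
        if node.value is not None:
            return node.value
        branch = node.left if x[node.feature] <= node.threshold else node.right
        return self._predict_one(x, branch)

    def predict(self, X):
        return np.array([self._predict_one(x, self.root) for x in X])
```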
K-Means Clustering
Algorithm: Lloyd’s algorithm. Randomly initialises k centroids, then alternates between assigning each point to its nearest centroid and recomputing centroids as cluster means. Stops when centroid movement falls below tol.
Dataset: Synthetic blobs via sklearn.datasets.make_blobs (400 samples, 4 true clusters).
Key hyperparameters: k (number of clusters), max_iters (iteration cap), tol (convergence threshold).
_assign computes Euclidean distances from every point to every centroid and returns the argmin. _inertia sums squared distances within each cluster (used for the elbow method). Centroids that lose all members retain their previous position to avoid NaN.
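A compact sketch of that loop. Only k, max_iters, tol, _assign, _inertia and the empty-cluster rule come from the notebook; the random initialisation strategy and the centroids_/labels_/inertia_ attribute names are assumptions.

```python
import numpy as np

class KMeans:
    def __init__(self, k=4, max_iters=300, tol=1e-4):
        self.k, self.max_iters, self.tol = k, max_iters, tol

    def _assign(self, X, centroids):
        # Euclidean distance from every point to every centroid, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def _inertia(self, X, labels, centroids):
        # Sum of squared distances of each point to its assigned centroid (elbow method).
        return np.sum((X - centroids[labels]) ** 2)

    def fit(self, X):
        rng = np.random.default_rng(42)
        centroids = X[rng.choice(len(X), self.k, replace=False)]
        for _ in range(self.max_iters):
            labels = self._assign(X, centroids)
            new_centroids = centroids.copy()          # empty clusters keep their old position
            for j in range(self.k):
                members = X[labels == j]
                if len(members) > 0:
                    new_centroids[j] = members.mean(axis=0)
            moved = np.linalg.norm(new_centroids - centroids)
            centroids = new_centroids
            if moved < self.tol:
                break
        self.centroids_ = centroids
        self.labels_ = self._assign(X, centroids)
        self.inertia_ = self._inertia(X, self.labels_, centroids)
        return self
```

Running fit for increasing k and plotting inertia_ against k gives the elbow curve mentioned above.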
K-Nearest Neighbors
Algorithm: Lazy learner — stores the entire training set and defers computation to prediction time. For each test point, computes the Euclidean distance to every training point, selects the k smallest, and returns the majority class via np.bincount.
Dataset: Small custom 2D dataset demonstrating binary classification.
Key hyperparameters: k (number of neighbors; smaller values = higher variance, larger values = higher bias).
fit only stores X_train and y_train — no training occurs. _predict_one handles a single query point; predict vectorises over the test set. Distance is Euclidean (sqrt(sum((x1-x2)^2))).
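A sketch of the whole class. fit, _predict_one, predict, k and the np.bincount vote follow the description above; the class name, defaults and integer-label assumption are illustrative.

```python
import numpy as np

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # Lazy learner: just memorise the training data.
        self.X_train = np.asarray(X_train, dtype=float)
        self.y_train = np.asarray(y_train, dtype=int)
        return self

    def _predict_one(self, x):
        # Euclidean distance from the query point to every training point.
        dists = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
        nearest = np.argsort(dists)[:self.k]
        # Majority vote among the k nearest labels.
        return np.bincount(self.y_train[nearest]).argmax()

    def predict(self, X):
        return np.array([self._predict_one(x) for x in np.asarray(X, dtype=float)])
```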
Linear Regression
Algorithm: Two variants are implemented — gradient descent (iterative) and Ordinary Least Squares via the normal equation (closed-form). The gradient descent version initialises weights to zero and updates them each iteration using the mean squared error gradient.
Dataset: Toy 1D dataset (X = [1,2,3,4,5], y = [2,4,6,8,10]).
Key hyperparameters (GD): learning_rate, n_iter. OLS has no hyperparameters.
LinearRegression uses gradient descent: dw = (1/m) X^T (ŷ - y) and db = (1/m) sum(ŷ - y). LinearRegressionOLS solves θ = (X^T X)^{-1} X^T y directly using np.linalg.inv.
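A sketch of both variants under those update rules. The class names, gradients and normal equation follow the description above; predict and the bias-column handling in the OLS variant are assumptions.

```python
import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iter=1000):
        self.learning_rate, self.n_iter = learning_rate, n_iter

    def fit(self, X, y):
        m, n = X.shape
        self.w, self.b = np.zeros(n), 0.0     # weights initialised to zero
        for _ in range(self.n_iter):
            y_hat = X @ self.w + self.b
            dw = (1 / m) * X.T @ (y_hat - y)  # MSE gradient w.r.t. weights
            db = (1 / m) * np.sum(y_hat - y)  # MSE gradient w.r.t. bias
            self.w -= self.learning_rate * dw
            self.b -= self.learning_rate * db
        return self

    def predict(self, X):
        return X @ self.w + self.b

class LinearRegressionOLS:
    def fit(self, X, y):
        Xb = np.c_[np.ones(len(X)), X]                    # prepend a bias column of ones
        self.theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y  # normal equation
        return self

    def predict(self, X):
        return np.c_[np.ones(len(X)), X] @ self.theta
```

On the toy dataset the OLS variant recovers the exact line y = 2x, while the gradient-descent variant approaches it as n_iter grows.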
Logistic Regression
Algorithm: Binary classifier trained with gradient descent on binary cross-entropy loss. A sigmoid activation squashes the linear output to a probability; predictions are thresholded at 0.5 by default.
Dataset: Toy 2D dataset (X = [[1,2],[2,3],[3,4],[4,5]], y = [0,0,1,1]).
Key hyperparameters: learning_rate, n_iter, threshold (prediction cut-off, adjustable at inference time).
_sigmoid(z) = 1 / (1 + exp(-z)). Gradients take the same form as in linear regression because the cross-entropy gradient simplifies to (ŷ - y). get_probabilities exposes raw sigmoid outputs; predict applies the threshold.
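A sketch of the full class under those rules. _sigmoid, the (ŷ - y) gradient form, get_probabilities and the adjustable threshold are from the description above; the default values and the placement of threshold as a predict argument are assumptions.

```python
import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.1, n_iter=1000):
        self.learning_rate, self.n_iter = learning_rate, n_iter

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        m, n = X.shape
        self.w, self.b = np.zeros(n), 0.0
        for _ in range(self.n_iter):
            y_hat = self._sigmoid(X @ self.w + self.b)
            dw = (1 / m) * X.T @ (y_hat - y)   # cross-entropy gradient reduces to (ŷ - y)
            db = (1 / m) * np.sum(y_hat - y)
            self.w -= self.learning_rate * dw
            self.b -= self.learning_rate * db
        return self

    def get_probabilities(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.get_probabilities(X) >= threshold).astype(int)
```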
Naive Bayes
Algorithm: Gaussian Naive Bayes. During fit, the prior probability, per-feature mean, and per-feature variance are computed for each class. At prediction time, the log posterior is computed as log P(y) + sum log P(x_i | y) using the Gaussian PDF, and the class with the highest score wins.
Dataset: Iris via sklearn.datasets.load_iris (80/20 train-test split).
Key hyperparameters: None — the model is fully determined by the training data. A small constant (1e-9) is added to variances to prevent division by zero.
Working in log space (_log_likelihood) avoids numeric underflow from multiplying many small probabilities. The Gaussian log-likelihood for feature i under class c is -0.5 * (log(2πv) + (x-m)²/v). Summing over features exploits the conditional independence assumption.
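A sketch following that recipe. fit, _log_likelihood, the 1e-9 variance floor and the log-posterior decision rule are from the description above; the class name and the dictionary layout for the per-class statistics are assumptions.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors, self.means, self.vars = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)
            self.means[c] = Xc.mean(axis=0)
            self.vars[c] = Xc.var(axis=0) + 1e-9   # guard against zero variance
        return self

    def _log_likelihood(self, x, c):
        m, v = self.means[c], self.vars[c]
        # Per-feature Gaussian log-densities, summed under conditional independence.
        return np.sum(-0.5 * (np.log(2 * np.pi * v) + (x - m) ** 2 / v))

    def predict(self, X):
        preds = []
        for x in X:
            # Log posterior: log P(y) + sum_i log P(x_i | y); the highest score wins.
            scores = [np.log(self.priors[c]) + self._log_likelihood(x, c) for c in self.classes]
            preds.append(self.classes[np.argmax(scores)])
        return np.array(preds)
```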
Neural Network
Algorithm: Two-layer feedforward network: Input → Hidden (ReLU) → Output (Softmax). Trained with mini-batch gradient descent and cross-entropy loss. Weights are initialised with Xavier initialisation.
Dataset: Digits dataset via sklearn.datasets.load_digits (1797 samples, 64 features, 10 classes).
Key hyperparameters: hidden_dim, lr (learning rate), epochs, batch_size.
forward computes Z1→A1 (ReLU) → Z2→A2 (Softmax). backward derives gradients analytically: dZ2 = A2 - y_true (the softmax + cross-entropy simplification), then applies the chain rule back through the hidden layer via relu_grad. fit shuffles indices each epoch for stochastic mini-batching.
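A sketch of the forward/backward pair and the mini-batch loop, assuming one-hot targets. forward, backward, relu_grad, fit, the dZ2 = A2 - y_true shortcut and the epoch shuffle are from the description above; the exact Xavier scaling constants, parameter names and where epochs/batch_size are passed are assumptions.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_dim, hidden_dim, output_dim, lr=0.1):
        # Xavier-style initialisation: scale by 1/sqrt(fan_in).
        self.W1 = np.random.randn(input_dim, hidden_dim) * np.sqrt(1 / input_dim)
        self.b1 = np.zeros(hidden_dim)
        self.W2 = np.random.randn(hidden_dim, output_dim) * np.sqrt(1 / hidden_dim)
        self.b2 = np.zeros(output_dim)
        self.lr = lr

    def relu_grad(self, Z):
        return (Z > 0).astype(float)

    def forward(self, X):
        self.Z1 = X @ self.W1 + self.b1
        self.A1 = np.maximum(0, self.Z1)                  # ReLU
        self.Z2 = self.A1 @ self.W2 + self.b2
        expZ = np.exp(self.Z2 - self.Z2.max(axis=1, keepdims=True))
        self.A2 = expZ / expZ.sum(axis=1, keepdims=True)  # Softmax
        return self.A2

    def backward(self, X, y_true):
        m = len(X)
        dZ2 = self.A2 - y_true                            # softmax + cross-entropy shortcut
        dW2 = self.A1.T @ dZ2 / m
        db2 = dZ2.mean(axis=0)
        dZ1 = (dZ2 @ self.W2.T) * self.relu_grad(self.Z1) # chain rule through the hidden layer
        dW1 = X.T @ dZ1 / m
        db1 = dZ1.mean(axis=0)
        for param, grad in [(self.W1, dW1), (self.b1, db1), (self.W2, dW2), (self.b2, db2)]:
            param -= self.lr * grad                       # in-place gradient step

    def fit(self, X, y_onehot, epochs=50, batch_size=32):
        for _ in range(epochs):
            idx = np.random.permutation(len(X))           # shuffle indices each epoch
            for start in range(0, len(X), batch_size):
                batch = idx[start:start + batch_size]
                self.forward(X[batch])
                self.backward(X[batch], y_onehot[batch])
        return self

    def predict(self, X):
        return self.forward(X).argmax(axis=1)
```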
PCA
Algorithm: Principal Component Analysis via eigen-decomposition of the sample covariance matrix. Data is mean-centered, the covariance matrix is formed, and np.linalg.eigh decomposes it. The top n_components eigenvectors form the projection matrix.
Dataset: Iris (2D projection) and Digits (reconstruction at various component counts), both via scikit-learn.
Key hyperparameters: n_components (number of principal components to retain).
fit stores the mean and the top-k eigenvectors as self.components (shape [n_components, n_features]). transform centers new data and projects it: (X - mean) @ components.T. explained_variance_ratio_ exposes the fraction of variance captured by each component.
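A sketch of fit/transform, plus an inverse_transform added here purely to illustrate the reconstruction experiment. fit, transform, self.components, explained_variance_ratio_ and np.linalg.eigh are from the description above; the use of np.cov and the inverse_transform helper are assumptions.

```python
import numpy as np

class PCA:
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        cov = np.cov(Xc, rowvar=False)                   # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1]                # reorder to descending variance
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        self.components = eigvecs[:, :self.n_components].T   # shape (n_components, n_features)
        self.explained_variance_ratio_ = eigvals[:self.n_components] / eigvals.sum()
        return self

    def transform(self, X):
        return (X - self.mean) @ self.components.T

    def inverse_transform(self, X_proj):
        # Map the low-dimensional projection back for reconstruction comparisons.
        return X_proj @ self.components + self.mean
```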
Random Forest
Algorithm: Ensemble of n_trees decision trees. Each tree is trained on a bootstrap sample (sampling with replacement) and restricted to a random subset of max_features features at each split. Final predictions are determined by majority vote across all trees.
Dataset: Breast Cancer Wisconsin via sklearn.datasets.load_breast_cancer (569 samples, 30 features, binary labels).
Key hyperparameters: n_trees, max_depth, max_features ('sqrt' uses sqrt(d) features per split; 'log2' uses log2(d)).
The notebook re-implements DecisionTree with an added max_features parameter for feature subsampling inside _best_split. RandomForest.fit loops over trees, draws a bootstrap index with np.random.choice(..., replace=True), trains a tree, and appends it to self.trees. predict stacks predictions from all trees and takes the mode column-wise.
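A sketch of the ensemble wrapper, assuming a DecisionTree class that accepts max_depth and max_features (as described above) and integer class labels for the bincount vote. The bootstrap via np.random.choice, self.trees and the column-wise mode are from the description; defaults and the vote implementation are assumptions.

```python
import numpy as np

class RandomForest:
    def __init__(self, n_trees=10, max_depth=10, max_features='sqrt'):
        self.n_trees, self.max_depth, self.max_features = n_trees, max_depth, max_features
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = np.random.choice(n, n, replace=True)     # bootstrap sample (with replacement)
            tree = DecisionTree(max_depth=self.max_depth, max_features=self.max_features)
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Stack per-tree predictions into shape (n_trees, n_samples), then vote column-wise.
        all_preds = np.stack([tree.predict(X) for tree in self.trees])
        return np.array([np.bincount(col.astype(int)).argmax() for col in all_preds.T])
```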
SVM
Algorithm: Soft-margin linear SVM (primal form) trained with sub-gradient descent on the regularised hinge loss: L = λ‖w‖² + (1/n) Σ max(0, 1 − y_i(w^T x_i + b)). Labels are re-coded to ±1 internally.
Dataset: Synthetic binary classification via sklearn.datasets.make_classification (500 samples, 2 features), standardised with StandardScaler.
Key hyperparameters: lr (learning rate), lambda_param (regularisation strength), n_iters.
At each iteration, margins = y * (Xw + b) identifies support vectors (margins < 1). Sub-gradients are computed only for those violating samples: dw = 2λw - mean(X[mask] * y[mask]), db = -mean(y[mask]). The loss history is tracked in self.losses for convergence plots.
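A sketch of that training loop. lr, lambda_param, n_iters, self.losses, the margin mask and the sub-gradient expressions are from the description above; the loss bookkeeping details, the empty-mask guard and predict are assumptions.

```python
import numpy as np

class SVM:
    def __init__(self, lr=0.001, lambda_param=0.01, n_iters=1000):
        self.lr, self.lambda_param, self.n_iters = lr, lambda_param, n_iters
        self.losses = []

    def fit(self, X, y):
        y_ = np.where(y <= 0, -1, 1)                   # re-code labels to ±1
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.n_iters):
            margins = y_ * (X @ self.w + self.b)
            mask = margins < 1                         # margin violators
            # Track L = λ‖w‖² + mean hinge loss for convergence plots.
            hinge = np.maximum(0, 1 - margins).mean()
            self.losses.append(self.lambda_param * np.sum(self.w ** 2) + hinge)
            dw = 2 * self.lambda_param * self.w        # regularisation sub-gradient
            db = 0.0
            if mask.any():                             # hinge sub-gradient from violators only
                dw = dw - (X[mask] * y_[mask, None]).mean(axis=0)
                db = -y_[mask].mean()
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict(self, X):
        return np.sign(X @ self.w + self.b)
```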