

Convolutional Neural Networks (CNNs) have become the standard architecture for image recognition tasks. This section covers five projects that progress from a simple two-class classifier up to a full multi-class agricultural identification system built with an Artificial Neural Network (ANN). Each project demonstrates a distinct dataset, framework choice (TensorFlow/Keras or PyTorch), and classification scenario, giving a practical tour of the computer-vision landscape in the repository.
Deep learning training is computationally intensive. A GPU is strongly recommended for all five projects. Install the appropriate CUDA toolkit alongside your framework of choice:
  • TensorFlow – pip install tensorflow (GPU support via tensorflow[and-cuda] on Linux)
  • PyTorch – follow the official PyTorch installation guide and select the CUDA version matching your driver
Training on CPU is possible but will be significantly slower, especially for CIFAR-10 and food classification.

Project comparison

| Project | Architecture | Classes | Dataset | Accuracy |
| --- | --- | --- | --- | --- |
| Binary Image Classification (30) | Custom CNN | 2 | Binary image dataset (Kaggle) | |
| Food Image Classification (31) | Custom CNN / Transfer Learning | Multi-class (food categories) | Food image dataset (Kaggle) | |
| CIFAR-10 Classification (32) | Custom CNN (PyTorch) | 10 | CIFAR-10 (60 000 images) | |
| MNIST Digit Classification (33) | CNN / Dense ANN | 10 | MNIST (70 000 grayscale images) | |
| Date Fruit Classification (12) | ANN (PyTorch) | 7 | UCI Date Fruit dataset (898 rows, 34 features) | |

Loading and running a trained model

The pattern below applies to any Keras/TensorFlow model saved in the repository’s Models/ directory. For PyTorch projects, see the PyTorch variant beneath it.
# --- Keras / TensorFlow ---
import numpy as np
from tensorflow import keras

# Load the saved model
model = keras.models.load_model("Models/image_classifier.h5")

# Prepare a single image (resize to match training resolution, e.g. 32x32 for CIFAR-10)
from tensorflow.keras.preprocessing import image as keras_image

img = keras_image.load_img("sample.jpg", target_size=(32, 32))
img_array = keras_image.img_to_array(img) / 255.0          # normalise to [0, 1]
img_array = np.expand_dims(img_array, axis=0)               # add batch dimension

# Predict
predictions = model.predict(img_array)
predicted_class = np.argmax(predictions, axis=1)[0]
print(f"Predicted class index: {predicted_class}")

# --- PyTorch (CIFAR-10 / Date Fruit projects) ---
import torch
import torchvision.transforms as transforms
from PIL import Image

# Recreate the model architecture and load saved weights
# (replace MyModel with the class defined in the notebook)
model = MyModel()
model.load_state_dict(torch.load("Models/model.pth", map_location="cpu", weights_only=True))
model.eval()

transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

img = Image.open("sample.jpg").convert("RGB")
tensor = transform(img).unsqueeze(0)   # shape: [1, 3, 32, 32]

with torch.no_grad():
    output = model(tensor)
    predicted_class = output.argmax(dim=1).item()

print(f"Predicted class index: {predicted_class}")
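For the CIFAR-10 project, the predicted index can be mapped back to a label using the dataset's standard class order (a small convenience sketch, not code from the notebook):

```python
# CIFAR-10 classes in their standard (alphabetical) index order.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def class_name(index: int) -> str:
    """Translate a predicted CIFAR-10 class index into its label."""
    return CIFAR10_CLASSES[index]
```

For example, `class_name(3)` returns `"cat"`.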

Binary Image Classification (30)

What the project does

Trains a CNN to distinguish between exactly two image classes (e.g. cats vs. dogs, or any binary domain). This is the entry-level vision project in the repository and provides a clean, well-commented baseline for understanding CNN construction, binary cross-entropy loss, and sigmoid activation at the output layer.

Algorithm used

A custom CNN with stacked Conv2D → MaxPooling2D → Dropout blocks, a Flatten layer, and a single-neuron sigmoid output. The architecture is deliberately lightweight to keep training time short on CPU.

Dataset / domain

A binary image dataset sourced from Kaggle. Images are organised into two class folders under dataset/train/ and dataset/test/ following the Keras ImageDataGenerator directory convention.

Key techniques

  • Data augmentation – random horizontal flips, zoom, and rotation via ImageDataGenerator to reduce overfitting on small datasets.
  • Binary cross-entropy loss with sigmoid output activation.
  • Callbacks – EarlyStopping and ModelCheckpoint to save the best epoch.
  • Evaluation – accuracy, precision, recall, and a confusion matrix on the test set.
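To make the evaluation step concrete, here is a minimal NumPy sketch of the listed metrics for the binary case; the function name and dictionary layout are illustrative, not taken from the notebook:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and a 2x2 confusion matrix
    from binary (0/1) labels and thresholded predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # rows = true class, columns = predicted class
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }
```

In practice the notebook can reach the same numbers via scikit-learn's `classification_report` and `confusion_matrix`; the sketch above just shows what those metrics mean.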

How to run

pip install tensorflow matplotlib scikit-learn

# Place images in:
#   dataset/train/class_a/
#   dataset/train/class_b/
#   dataset/test/class_a/
#   dataset/test/class_b/

jupyter notebook  # open the notebook inside 30_Binary_Image_Classification/

Food Image Classification (31)

What the project does

Classifies food images into multiple cuisine or dish categories. The project explores both a custom CNN trained from scratch and optional transfer learning from a pre-trained backbone (e.g. MobileNetV2 or VGG16), demonstrating how feature reuse from ImageNet accelerates convergence on small domain-specific datasets.

Algorithm used

Custom CNN (multi-class with softmax output) and optionally a transfer-learning variant using a frozen pre-trained base model with a custom classification head.

Dataset / domain

A food image dataset sourced from Kaggle with multiple dish classes. Images are stored in class-labelled subdirectories under dataset/.

Key techniques

  • Transfer learning – freeze the convolutional base of a pre-trained model; fine-tune the top layers on food images.
  • Categorical cross-entropy loss with softmax output.
  • Class imbalance handling – class_weight argument in model.fit() to up-weight underrepresented food categories.
  • Top-5 accuracy – additional metric alongside top-1 accuracy for multi-class evaluation.
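The transfer-learning recipe above can be sketched in a few lines of Keras. The backbone (MobileNetV2), input size, and head sizes below are assumptions for illustration, not the exact configuration used in the notebook:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_transfer_model(num_classes, weights="imagenet"):
    """Frozen pre-trained convolutional base + small classification head."""
    base = keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    base.trainable = False  # freeze the convolutional base
    model = keras.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=[
            "accuracy",
            keras.metrics.TopKCategoricalAccuracy(k=5, name="top5"),
        ],
    )
    return model
```

After initial convergence, the top few layers of the base can be unfrozen (`base.trainable = True` with a low learning rate) for fine-tuning.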

How to run

pip install tensorflow matplotlib scikit-learn pillow

jupyter notebook  # open the notebook inside 31_Food_Image_Classification/

CIFAR-10 Classification (32)

What the project does

Implements a CNN on the canonical CIFAR-10 benchmark — 60 000 colour images (32 × 32 px) spread evenly across 10 object classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. This project is the only one in the vision section that uses PyTorch rather than TensorFlow/Keras.

Algorithm used

A custom CNN built with torch.nn modules: Conv2d → BatchNorm2d → ReLU → MaxPool2d blocks followed by fully connected layers that emit 10 class logits; CrossEntropyLoss applies softmax internally during training. Optimisation uses torch.optim (SGD or Adam).
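A minimal sketch of such an architecture, with assumed channel counts and layer sizes (the notebook's exact configuration may differ):

```python
import torch
import torch.nn as nn

class SmallCifarCNN(nn.Module):
    """Conv → BatchNorm → ReLU → MaxPool blocks followed by a classifier."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),  # logits; CrossEntropyLoss adds softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

A forward pass on a batch of 32 × 32 RGB images yields one logit per class, e.g. shape `[batch, 10]`.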

Dataset / domain

CIFAR-10 downloaded automatically via torchvision.datasets.CIFAR10. The dataset is ~170 MB and is cached under dataset/ after the first download.

Key techniques

  • Normalisation – pixel values scaled to [-1, 1] using transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)).
  • DataLoader – batched loading with shuffle for training, deterministic order for evaluation.
  • Learning-rate scheduling – StepLR or CosineAnnealingLR to decay the learning rate during training.
  • GPU support – move tensors to device = torch.device("cuda" if torch.cuda.is_available() else "cpu").
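Putting the DataLoader, scheduler, and device handling together, a training loop along these lines is typical; the batch size, optimiser settings, and StepLR schedule below are illustrative assumptions, not the notebook's exact values:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train(model, dataset, epochs=2, lr=0.01):
    """Minimal training loop: batched loading, SGD, and LR decay."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    for epoch in range(epochs):
        model.train()
        total = 0.0
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * xb.size(0)
        scheduler.step()  # decay the learning rate once per epoch
        print(f"epoch {epoch}: loss {total / len(dataset):.4f}, "
              f"lr {scheduler.get_last_lr()[0]:.4f}")
    return model
```

The same loop works unchanged on CPU or GPU because every tensor is moved to `device` before use.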

How to run

pip install torch torchvision matplotlib

jupyter notebook image_class_cifar10.ipynb
The notebook downloads CIFAR-10 automatically on the first run.

MNIST Digit Classification (33)

What the project does

Recognises handwritten digits (0–9) from the MNIST dataset — the “Hello World” benchmark of deep learning. Despite its simplicity, this project provides a rigorous demonstration of CNN design, dropout regularisation, and evaluation via a confusion matrix across 10 digit classes.

Algorithm used

A CNN (or optionally a dense ANN as a baseline comparison) with Conv2D → MaxPooling2D → Dropout blocks, trained on grayscale 28 × 28 images. The output layer uses softmax over 10 digit classes.

Dataset / domain

MNIST — 70 000 grayscale images (60 000 train / 10 000 test), loaded directly via keras.datasets.mnist or torchvision.datasets.MNIST. No external download is required.

Key techniques

  • Grayscale normalisation – pixel values divided by 255 to map to [0, 1].
  • Dropout regularisation – reduces overfitting on the relatively small MNIST images.
  • Batch normalisation – accelerates convergence and improves generalisation.
  • Confusion matrix – per-class breakdown of correct and misclassified digits.
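The per-class breakdown can be computed without any framework; here is a minimal NumPy sketch of a multi-class confusion matrix (illustrative, not the notebook's code, which may use sklearn.metrics.confusion_matrix instead):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Return a num_classes x num_classes matrix where entry [i, j]
    counts samples of true class i predicted as class j.
    The diagonal holds the correctly classified counts."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Off-diagonal hot spots reveal systematic confusions, e.g. 4s misread as 9s.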

How to run

pip install tensorflow matplotlib scikit-learn

jupyter notebook  # open the notebook inside 33_MNIST_Digit_Classification/

Date Fruit Classification (12)

What the project does

Classifies seven varieties of date fruit (BERHI, DOKOL, SAFAVI, ROTANA, DEGLET, SOGAY, IRAQI) from 34 morphological and colour features extracted from fruit images. Unlike the other vision projects, the classification here operates on pre-extracted tabular features (area, perimeter, colour statistics, wavelet coefficients) rather than raw pixel data, making it a bridge between classical ML and deep learning.

Algorithm used

An Artificial Neural Network (ANN) built in PyTorch: two hidden layers of 64 neurons each with ReLU activation, trained with CrossEntropyLoss and the Adam optimiser. Input dimensionality is 34 features; the output layer emits 7 class logits, which softmax converts into a probability distribution over the seven varieties.
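The described architecture translates almost directly into torch.nn; the layer sizes follow the text, while the class and attribute names are illustrative:

```python
import torch
from torch import nn

class DateFruitANN(nn.Module):
    """34 input features -> two 64-unit ReLU hidden layers -> 7 class logits."""

    def __init__(self, in_features=34, num_classes=7, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),  # logits for CrossEntropyLoss
        )

    def forward(self, x):
        return self.net(x)
```

At inference time, `torch.softmax(model(x), dim=1)` yields the 7-class probability distribution described above.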

Dataset / domain

dataset/datefruit_dataset.csv — 898 rows × 35 columns (34 numeric features + Class label). The seven classes are distributed as: DOKOL (204), SAFAVI (199), ROTANA (166), DEGLET (98), SOGAY (94), IRAQI (72), BERHI (65).

Key techniques

  • Label encoding – sklearn.preprocessing.LabelEncoder maps string class names to integer indices.
  • Train/test split – 80 / 20 stratified split via train_test_split.
  • Feature scaling – StandardScaler fitted on the training set and applied to both splits.
  • TensorDataset + DataLoader – wraps NumPy arrays as PyTorch tensors for mini-batch training (batch size 32).
  • Training loop – manual epoch loop with loss logging; validation accuracy evaluated after each epoch.
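The label-encoding, stratified-split, and scaling steps above can be sketched as a single helper; the function name and random seed are illustrative assumptions:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

def prepare(X, labels, test_size=0.2, seed=42):
    """Encode string labels, make a stratified 80/20 split, and
    standardise features using statistics from the training set only."""
    y = LabelEncoder().fit_transform(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)  # fit on train only, no leakage
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```

Fitting the scaler on the training split alone is what prevents test-set information from leaking into the model.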

How to run

pip install torch pandas scikit-learn matplotlib seaborn

jupyter notebook Date_Fruit_Class.ipynb
The dataset CSV must be present at dataset/datefruit_dataset.csv relative to the notebook.
For the fastest iteration cycle, start with MNIST (33) or Date Fruit (12) — both train in minutes on CPU. Move to CIFAR-10 (32) once you have GPU access, as it benefits most from hardware acceleration.
