UNet and Bonnet CNN Architectures for Weed Segmentation

AGRIBOT implements two encoder-decoder segmentation architectures in model.py using Keras: a compact UNet that serves as a baseline, and the Bonnet architecture from the PRBonn lab, which was selected as the final model for its efficiency and suitability for real-time embedded deployment. Both models produce per-pixel class probabilities across three categories: weed, crop, and soil.

The Bonnet architecture is based on the paper by the PRBonn lab (arXiv:1709.06764). The Keras implementation in model.py is adapted for the CWFID and BoniRob datasets.

UNet Architecture (`small_Unet`)

The UNet baseline is a four-level encoder-decoder with symmetric skip connections. It operates on standard 3-channel RGB input at 128 × 128 resolution, making it straightforward to train on the CWFID dataset.

def small_Unet(labels, h, w, out_activation):
    """
    Args:
      labels (int): Number of output classes (e.g. 3 for weed/crop/soil)
      h (int): Input image height in pixels
      w (int): Input image width in pixels
      out_activation (str): 'sigmoid', 'softmax', or None

    Returns:
      keras.Model: UNet segmentation model
    """

Encoder

Each encoder block applies three consecutive Conv2D layers with relu activation and same padding, followed by MaxPool2D(2×2) and Dropout(0.5). The number of filters doubles with each block:

Block	Spatial size (128×128 input)	Filters
Block 1	128 × 128	16
Block 2	64 × 64	32
Block 3	32 × 32	64
Block 4	16 × 16	128
Bottleneck	8 × 8	256
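The table above follows from two rules: each pooling step halves the spatial size and each block doubles the filter count. The helper below is a minimal sketch that reproduces it for any input size; `unet_level_shapes` and its parameters are hypothetical names used for illustration and are not part of model.py.

```python
def unet_level_shapes(h, w, base_filters=16, levels=4):
    """Return (name, height, width, filters) per encoder level plus the bottleneck."""
    rows = []
    filters = base_filters
    for i in range(1, levels + 1):
        rows.append((f"Block {i}", h, w, filters))
        # MaxPool2D(2x2) halves each spatial dimension; the next block doubles the filters.
        h, w, filters = h // 2, w // 2, filters * 2
    rows.append(("Bottleneck", h, w, filters))
    return rows


for name, hh, ww, f in unet_level_shapes(128, 128):
    print(f"{name:<10} {hh:>3} x {ww:<3} {f:>3} filters")
```

With four pooling steps, input dimensions that are multiples of 16 keep every upsampled decoder map the same size as its encoder skip map.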

Bottleneck

Two stacked Conv2D(256, 3×3, relu) layers form the bottleneck at the lowest spatial resolution (8 × 8 for a 128 × 128 input).

Decoder

Each decoder block mirrors the encoder: UpSampling2D(2×2) → Conv2D to halve filters → concatenate with the corresponding encoder skip feature map → two Conv2D layers → Dropout(0.5). Filter sizes decrease symmetrically: 128 → 64 → 32 → 16.
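To make the channel bookkeeping concrete, the sketch below traces the decoder from the 8 × 8 bottleneck back to full resolution. It assumes each skip feature map has the same filter count as the decoder tensor it is concatenated with, which is what the symmetric layout implies; `unet_decoder_trace` is a hypothetical helper, not a function in model.py.

```python
def unet_decoder_trace(bottleneck_hw=(8, 8), bottleneck_filters=256, levels=4):
    """Trace spatial size and channel counts through the decoder stages."""
    h, w = bottleneck_hw
    filters = bottleneck_filters
    steps = []
    for _ in range(levels):
        h, w = h * 2, w * 2              # UpSampling2D(2x2) doubles height and width
        filters //= 2                    # the following Conv2D halves the filter count
        concat_channels = filters * 2    # assumed: skip map carries `filters` channels too
        steps.append({"size": (h, w), "after_concat": concat_channels, "out": filters})
    return steps


for step in unet_decoder_trace():
    print(step)
```

The `out` values follow the 128 → 64 → 32 → 16 schedule, and the last stage is back at 128 × 128.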

Output head

A final Conv2D(labels, 1×1) collapses to the number of classes, followed by the configured activation:

'sigmoid' — independent per-class binary probabilities
'softmax' — mutually exclusive class probabilities (recommended for multi-class segmentation)
None — raw logits (used by load_unet() which appends its own Reshape and Softmax)
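The difference between the two activations can be seen on a single pixel's logits. This is a plain NumPy illustration rather than the Keras layers themselves, and the logit values are made up for the example.

```python
import numpy as np


def sigmoid(z):
    """Independent per-class probability for each logit."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    """Mutually exclusive class probabilities that sum to 1 along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


logits = np.array([2.0, 1.0, -1.0])  # illustrative logits for one pixel, three classes
print("sigmoid:", sigmoid(logits), "sum =", sigmoid(logits).sum())
print("softmax:", softmax(logits), "sum =", softmax(logits).sum())
```

Sigmoid scores are not constrained to sum to 1, so several classes can score high at once. Softmax forces a single distribution over the classes, which fits a segmentation task where each pixel has exactly one label.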

Input shape: (h, w, 3) — RGB

Bonnet Architecture

Bonnet replaces UNet’s heavy 3×3 convolutions with lightweight residual blocks built from factorised 5×1 and 1×5 convolutions, and replaces the UNet decoder's upsample-and-concatenate stages with mask-guided max-unpooling. The result is approximately 100× fewer parameters than the UNet baseline, enabling real-time inference on the NVIDIA GeForce 940MX and Jetson Nano.

def bonnet(labels, h, w):
    """
    Args:
      labels (int): Number of output classes (3 for weed/crop/soil)
      h (int): Input image height in pixels (512 for BoniRob)
      w (int): Input image width in pixels (384 for BoniRob)

    Returns:
      keras.Model: Bonnet segmentation model
    """

Input shape: (h, w, 10) — 10-channel multi-spectral input (see Datasets for channel definitions).

Initial convolution (`conv_bonnet`)

A single Conv2D(16, 5×5, relu, same) followed by BatchNormalization produces the initial 16-channel feature map at full resolution.

Residual block (`residual_bonnet`)

Each residual block uses a factorised bottleneck to approximate a 5×5 convolution at low cost:

Conv2D(8, 1×1, relu): channel reduction from 16 to 8
Conv2D(8, 5×1, relu): first 1-D spatial convolution of the factorised 5×5 kernel
Conv2D(8, 1×5, relu): second 1-D spatial convolution, along the other axis
Conv2D(16, 1×1, relu): channel expansion back to 16
Add([input, output]): skip connection
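A quick parameter count shows why the factorised block is cheap. The sketch below assumes every layer has a bias term and that the block input has 16 channels, as produced by the initial convolution; `conv2d_params` is a hypothetical helper written for this example.

```python
def conv2d_params(kh, kw, c_in, c_out, bias=True):
    """Number of trainable parameters in a standard 2-D convolution."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


# The four convolutions of the residual block listed above.
factorised = (
    conv2d_params(1, 1, 16, 8)    # 1x1 channel reduction
    + conv2d_params(5, 1, 8, 8)   # 5x1 spatial convolution
    + conv2d_params(1, 5, 8, 8)   # 1x5 spatial convolution
    + conv2d_params(1, 1, 8, 16)  # 1x1 channel expansion
)

# A single dense 5x5 convolution mapping 16 channels to 16 channels.
full = conv2d_params(5, 5, 16, 16)

print(factorised)  # 936
print(full)        # 6416
```

Under these assumptions the factorised block uses roughly one seventh of the parameters of a dense 5×5 convolution with the same input and output width.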

Encoder stages

Four encoder stages each apply three consecutive residual blocks followed by MaxPooling2D(2×2, strides=2). At each pooling step, a spatial mask is computed by comparing the pre-pool feature map with the pooled feature map upsampled back to the pre-pool resolution; positions where the two agree are the ones that held the maximum of their 2×2 window. These masks are stored and reused in the decoder.

Stage	Residual blocks	Pooling
1	3	MaxPool 2×
2	3	MaxPool 2×
3	3	MaxPool 2×
4	3	MaxPool 2×
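The mask computation can be illustrated on a single 4 × 4 feature map with NumPy. This is a sketch of the idea only, not the Keras code in model.py, and the function names are hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample_nearest_2x(x):
    """Nearest-neighbour upsampling by a factor of 2 in both dimensions."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def pooling_mask(x):
    """Return the pooled map and a 0/1 mask marking where each window's maximum sat."""
    pooled = max_pool_2x2(x)
    mask = (x == upsample_nearest_2x(pooled)).astype(x.dtype)
    return pooled, mask


x = np.array([[1., 5., 2., 0.],
              [3., 4., 7., 6.],
              [9., 8., 1., 2.],
              [0., 1., 3., 4.]])
pooled, mask = pooling_mask(x)
print(pooled)
print(mask)
```

If a window contains tied maxima, this comparison marks every tied position, so the mask is not guaranteed to have exactly one active entry per window.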

Max-unpooling decoder

Each decoder stage reverses one pooling step:

UpSampling2D(2×2) — nearest-neighbour upsampling
Element-wise multiply by the stored encoder mask (unpooling_bonnet) — reactivates only the spatial positions that were active in the original pooling step
Three residual blocks — feature refinement

Four decoder stages mirror the four encoder stages, restoring the feature map to the original spatial resolution.
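The first two steps of a decoder stage amount to an upsample followed by a multiply. The NumPy sketch below uses a hand-written 2 × 2 pooled map and 4 × 4 mask as illustrative values; `unpool_2x2` is a hypothetical name and stands in for what the text calls unpooling_bonnet.

```python
import numpy as np


def unpool_2x2(pooled, mask):
    """Upsample by 2 with nearest neighbour, then keep only the masked positions."""
    upsampled = pooled.repeat(2, axis=0).repeat(2, axis=1)
    return upsampled * mask


pooled = np.array([[5., 7.],
                   [9., 4.]])
# 1 marks the position that held the maximum of each 2x2 window in the encoder.
mask = np.array([[0., 1., 0., 0.],
                 [0., 0., 1., 0.],
                 [1., 0., 0., 0.],
                 [0., 0., 0., 1.]])

restored = unpool_2x2(pooled, mask)
print(restored)
```

Each value returns to the location it was pooled from and the remaining positions stay at zero, which the following residual blocks then fill in.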

Output head

A Conv2D(labels, 1×1, relu) collapses channels to the number of classes, followed by a Reshape to (h*w, labels) and a final Softmax activation. The output tensor has shape (batch, h*w, labels), i.e. (batch, h*w, 3) for the three-class setup, which matches the flattened mask format expected by the weighted categorical cross-entropy loss.
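The loss function itself is not shown on this page, so the following is only one common formulation of a weighted categorical cross-entropy, written in NumPy to show how it consumes the flattened (batch, h*w, labels) format. The function signature, the class weights, and the example values are all assumptions made for illustration.

```python
import numpy as np


def weighted_categorical_crossentropy(y_true, y_pred, class_weights, eps=1e-7):
    """Mean per-pixel cross-entropy with a multiplicative weight per class.

    y_true: one-hot masks, shape (batch, h*w, labels)
    y_pred: softmax probabilities, same shape
    class_weights: shape (labels,)
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    per_pixel = -np.sum(y_true * np.log(y_pred) * class_weights, axis=-1)
    return per_pixel.mean()


# One image with two pixels and three classes (illustrative values).
y_true = np.array([[[1., 0., 0.],
                    [0., 0., 1.]]])
y_pred = np.array([[[0.5, 0.25, 0.25],
                    [0.25, 0.25, 0.5]]])
class_weights = np.array([2.0, 1.0, 1.0])  # hypothetical: up-weight the first class

loss = weighted_categorical_crossentropy(y_true, y_pred, class_weights)
print(loss)
```

Weighting of this kind is typically used when one class, such as soil, dominates the pixel count.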

Loading Models

Both architectures are wrapped by convenience functions in utils.py that build, print a summary, and return the Keras model:

from utils import load_unet, load_bonnet

# ── UNet ────────────────────────────────────────────────────────────────────
# 128×128 RGB input, 3 classes
# load_unet calls small_Unet with out_activation=None, then appends Reshape+Softmax
seg_model = load_unet(3, 128, 128)

# ── Bonnet ───────────────────────────────────────────────────────────────────
# 512×384 10-channel input, 3 classes
seg_model = load_bonnet(3, 512, 384)

# Load pre-trained weights
seg_model.load_weights('path/to/weights.h5')

The load_unet() wrapper additionally appends a Reshape and Softmax layer to the raw small_Unet output so that UNet and Bonnet share the same (batch, h*w, labels) output format and can be compiled with the same loss function:

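# Imports assumed for this excerpt; utils.py may import these differently.
from keras import layers
from keras.models import Model
from model import small_Unet

def load_unet(labels, h, w):
    model = small_Unet(labels, h, w, out_activation=None)
    # Flatten the spatial dimensions so the output matches Bonnet's format.
    l = layers.Reshape((h*w, labels))(model.output)
    output = layers.Activation("softmax")(l)
    unet_model = Model(inputs=model.input, outputs=output)
    return unet_model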
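Because both models return flattened predictions, turning the output for one image back into a 2-D class map takes a reshape and an argmax. The NumPy sketch below assumes the flattening is row-major, as with the Keras Reshape layer, and operates on one image with the batch dimension already removed; `decode_prediction` is a hypothetical helper, and which index corresponds to weed, crop, or soil depends on how the training masks were encoded.

```python
import numpy as np


def decode_prediction(flat_probs, h, w):
    """Convert (h*w, labels) class probabilities into an (h, w) map of class indices."""
    labels = flat_probs.shape[-1]
    return flat_probs.reshape(h, w, labels).argmax(axis=-1)


# Illustrative prediction for a 2x2 image with three classes.
flat = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.2, 0.2, 0.6],
                 [0.3, 0.4, 0.3]])
class_map = decode_prediction(flat, 2, 2)
print(class_map)
```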
