
Overview

The Trash Classification AI System uses YOLO (You Only Look Once) for real-time object detection and instance segmentation. The model is specifically trained to detect and segment three categories of waste materials.

Model Architecture

YOLO for Instance Segmentation

YOLO is a state-of-the-art deep learning model that performs object detection and segmentation in a single forward pass:

Single-Stage Detector

Unlike two-stage detectors, YOLO processes the entire image in one pass, enabling real-time performance.

Instance Segmentation

Beyond bounding boxes, the model generates pixel-precise masks for each detected object.

Multi-Class Detection

Simultaneously detects and classifies multiple objects across three waste categories.

Object Tracking

Built-in tracking maintains consistent object IDs across video frames.

Model Loading

The model is loaded using the Ultralytics YOLO library:
# From trash_classificator/segmentation/model_loader.py
import torch
from ultralytics import YOLO

class ModelLoader:
    def __init__(self, device: torch.device):
        # trash_model_path is defined in the module and points at the weights file
        self.model = YOLO(trash_model_path).to(device)

    def get_model(self) -> YOLO:
        return self.model
The model file trash_segmentation_model_v2.pt is a trained PyTorch model located in the trash_classificator/segmentation/models/ directory.

Input Specifications

Image Format

def inference(self, image: np.ndarray) -> tuple[list[Results], dict[int, str], torch.device]:
    results = self.trash_segmentation_model.track(
        image,              # Input: NumPy array (H, W, C)
        conf=0.55,          # Minimum confidence threshold
        verbose=False,      # Suppress per-frame logging
        persist=True,       # Keep tracking IDs across frames
        imgsz=640,          # Resize to 640x640
        stream=True         # Return results as a generator
    )
| Parameter | Value | Description |
| --- | --- | --- |
| Input Type | np.ndarray | NumPy array representing the image |
| Shape | (H, W, 3) | Height × Width × Channels (RGB) |
| Color Format | RGB/BGR | Compatible with OpenCV and standard formats |
| Processing Size | 640×640 | Image is resized to 640×640 for inference |
The model automatically handles image resizing and preprocessing. Input images of any size are resized to 640×640 while maintaining aspect ratio with padding.
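The resize-and-pad step (often called letterboxing) can be sketched with a few lines of arithmetic. This is a simplified illustration of the idea, not Ultralytics' exact implementation:

```python
def letterbox_dims(h: int, w: int, target: int = 640) -> tuple[int, int, int, int]:
    """Scale an (h, w) image so its longer side equals `target`,
    then report the padding needed to fill a target x target square."""
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    pad_h, pad_w = target - new_h, target - new_w  # total padding per axis
    return new_h, new_w, pad_h, pad_w

# A 1080x1920 frame scales to 360x640 and needs 280 px of vertical padding.
print(letterbox_dims(1080, 1920))  # → (360, 640, 280, 0)
```

Padding (rather than stretching) preserves object aspect ratios, which keeps masks and boxes geometrically faithful after they are mapped back to the original resolution.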

Output Specifications

Results Object

The model returns a YOLO Results object containing:
# From trash_classificator/drawing/main.py
masks = trash_track.masks.xy        # Segmentation masks (polygons)
boxes = trash_track.boxes.xyxy      # Bounding boxes [x1, y1, x2, y2]
tracks_ids = trash_track.boxes.id   # Tracking IDs (persistent)
clss = trash_track.boxes.cls        # Class IDs [0, 1, 2]

Output Components

Segmentation Masks

Format: List of NumPy arrays
Content: Polygon coordinates defining object boundaries
Shape: Variable - depends on object complexity
Usage: Used by MaskDrawer to create colored fill regions
# Example mask format
mask = np.array([[x1, y1], [x2, y2], ..., [xn, yn]])

Bounding Boxes

Format: Tensor (N × 4)
Content: Box coordinates in [x1, y1, x2, y2] format
Coordinates: Top-left (x1, y1) to bottom-right (x2, y2)
Usage: Used by BoundingBoxDrawer and TrackDrawer
# Example: 2 detected objects
boxes = torch.tensor([
    [100, 150, 300, 400],  # Object 1
    [350, 200, 500, 450]   # Object 2
])

Tracking IDs

Format: Integer tensor
Content: Unique ID for each tracked object
Persistence: IDs remain consistent across frames
Usage: Used by TrackDrawer to maintain movement history
# Example: tracking 3 objects
track_ids = [1, 2, 5]  # Object IDs (not sequential)

Class IDs

Format: Integer tensor
Content: Class ID for each detection [0, 1, 2]
Mapping: 0=cardboard/paper, 1=metal, 2=plastic
Usage: Used to determine color and label for each object
# Example: 3 objects of different types
classes = [0, 2, 1]  # Paper, Plastic, Metal
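The class-ID mapping can be expressed as a small lookup table. The labels below come from the mapping above; the helper name and the "unknown" fallback are illustrative:

```python
CLASS_LABELS = {0: "cardboard/paper", 1: "metal", 2: "plastic"}

def label_for(cls_id: int) -> str:
    # Fall back to "unknown" for any ID outside the trained classes
    return CLASS_LABELS.get(int(cls_id), "unknown")

print([label_for(c) for c in [0, 2, 1]])  # → ['cardboard/paper', 'plastic', 'metal']
```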

Confidence Thresholds

Detection Confidence

results = self.trash_segmentation_model.track(
    image, 
    conf=0.55,  # Minimum confidence threshold
    ...
)
Default Threshold: 0.55 (55%)

Only detections with confidence scores ≥ 0.55 are returned. This threshold is tuned to balance detection sensitivity with false positive reduction.

Adjusting Confidence

You can modify the threshold based on your use case:
| Use Case | Recommended Threshold | Trade-off |
| --- | --- | --- |
| High Precision | 0.70 - 0.80 | Fewer false positives, may miss some objects |
| Balanced | 0.50 - 0.60 | Good balance (current setting: 0.55) |
| High Recall | 0.30 - 0.45 | Detect more objects, more false positives |
Lowering the confidence threshold below 0.40 may result in many false detections, especially in cluttered scenes.
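The trade-off in the table can be made concrete by filtering a hypothetical set of confidence scores at different thresholds:

```python
scores = [0.92, 0.74, 0.58, 0.51, 0.43, 0.31]  # hypothetical detection confidences

def kept(scores: list[float], conf: float) -> list[float]:
    """Return only the detections at or above the confidence threshold."""
    return [s for s in scores if s >= conf]

print(len(kept(scores, 0.75)))  # high precision: 1 detection survives
print(len(kept(scores, 0.55)))  # current default: 3 detections
print(len(kept(scores, 0.35)))  # high recall: 5 detections
```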

Tracking Parameters

Persistent Tracking

results = self.trash_segmentation_model.track(
    image,
    conf=0.55,
    persist=True,    # Enable persistent tracking
    stream=True,     # Stream results for efficiency
    ...
)

Tracking Features

1. Object ID Assignment: Each detected object receives a unique tracking ID on first detection.
2. Cross-Frame Persistence: The same object maintains its ID across video frames.
3. Movement Tracking: Object centroids are recorded to visualize movement trails (up to 50 points).
4. Re-identification: If an object temporarily disappears and reappears, the tracker attempts to maintain the same ID.

Track History

The system maintains a movement history for each tracked object:
# From trash_classificator/drawing/main.py
class TrackDrawer:
    def __init__(self):
        self.track_history = defaultdict(list)
        self.thickness = 2

    def draw(self, image, tracks_ids, boxes):
        for track_id, box in zip(tracks_ids, boxes):
            track_line = self.track_history[track_id]
            centroid = (float((box[0] + box[2]) / 2), 
                       float((box[1] + box[3]) / 2))
            track_line.append(centroid)

            if len(track_line) > 50:  # Keep last 50 positions
                track_line.pop(0)
Track History Limit: 50 points

Each object's movement trail displays the last 50 centroid positions. This provides a good balance between showing recent movement and avoiding visual clutter.
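The same bounded history can be had from collections.deque with maxlen=50, which drops the oldest point automatically instead of calling pop(0). A minimal sketch of this alternative:

```python
from collections import defaultdict, deque

# Each track keeps at most 50 centroids; the deque evicts the oldest automatically.
track_history = defaultdict(lambda: deque(maxlen=50))

for i in range(60):  # append 60 centroids for one object
    track_history[1].append((float(i), float(i)))

print(len(track_history[1]))  # → 50
print(track_history[1][0])    # oldest retained centroid → (10.0, 10.0)
```

Besides removing the length check, deque eviction is O(1), whereas list.pop(0) shifts every remaining element.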

Model Parameters Summary

Inference Configuration

| Parameter | Value | Purpose |
| --- | --- | --- |
| conf | 0.55 | Minimum detection confidence |
| imgsz | 640 | Input image size (640×640) |
| persist | True | Enable cross-frame tracking |
| stream | True | Stream results for memory efficiency |
| verbose | False | Disable logging output |

Hardware Acceleration

# From trash_classificator/segmentation/device_manager.py
import torch

class DeviceManager:
    @staticmethod
    def get_device() -> torch.device:
        if torch.backends.mps.is_available():
            return torch.device("mps")      # Apple Silicon GPU
        elif torch.cuda.is_available():
            return torch.device("cuda")     # NVIDIA GPU
        else:
            return torch.device("cpu")      # CPU fallback

  • CUDA (NVIDIA GPU): Best performance for real-time processing
  • MPS (Apple Silicon): Optimized for M1/M2/M3 Macs
  • CPU (Fallback): Works on any system, slower inference

Performance Characteristics

Inference Speed

Inference speed depends on hardware:
  • NVIDIA GPU (CUDA): ~30-60 FPS (real-time)
  • Apple Silicon (MPS): ~20-40 FPS
  • CPU: ~5-15 FPS (below real-time)

Memory Usage

| Component | Typical Memory Usage |
| --- | --- |
| Model weights | ~50-100 MB |
| Input frame (640×640) | ~1.2 MB |
| Results per frame | ~1-5 MB (depends on detections) |
| Track history | ~100-500 KB (50 points × objects) |
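The input-frame figure follows directly from the tensor shape: a 640×640 RGB frame stored as uint8 takes one byte per channel value:

```python
h, w, c = 640, 640, 3                  # height, width, channels (uint8 → 1 byte each)
frame_bytes = h * w * c
print(frame_bytes)                      # → 1228800 bytes
print(round(frame_bytes / 2**20, 2))    # → 1.17 MiB, roughly the ~1.2 MB above
```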

Streaming Mode

results = self.trash_segmentation_model.track(
    image,
    stream=True  # Process results as generator
)

# Results are returned as a generator, not a list
for trash_track in results:
    # Process each frame's results
    ...
Stream Mode Benefits:
  • Reduced memory footprint for batch processing
  • Results are generated on-demand
  • Better performance when processing video streams
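The benefit comes from lazy evaluation: with stream=True the results behave like a Python generator, yielding one frame's results at a time instead of building a full list. A framework-free sketch of the pattern (the inference step is faked here):

```python
def stream_results(frames):
    """Yield one result per frame lazily, mimicking stream=True."""
    for i, frame in enumerate(frames):
        # A real pipeline would run model inference on `frame` here.
        yield {"frame": i, "detections": len(frame)}

results = stream_results(["ab", "abc", "a"])  # stand-in frames
print(next(results))  # → {'frame': 0, 'detections': 2}
print(next(results))  # → {'frame': 1, 'detections': 3}
```

Only one frame's results exist in memory at a time, which is why streaming mode scales to long videos.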

Model Training

While this documentation focuses on inference, the model was trained using:

Training Script

training/model_train.py contains the training pipeline

Model Version

Current version: trash_segmentation_model_v2.pt
To retrain the model with additional data or different classes, refer to the training module documentation.

Error Handling

No Detections

# From trash_classificator/processor.py
for trash in trash_track:
    if trash.boxes.id is None:
        return image, 'No trash detected'
When no objects meet the confidence threshold, the model returns results with boxes.id = None.
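The guard can be exercised without the full pipeline by faking the results object; types.SimpleNamespace stands in for the real ultralytics Results, and the function name is illustrative:

```python
from types import SimpleNamespace

def handle(trash_track, image="frame"):
    """Mirror the processor's guard: bail out when tracking IDs are absent."""
    for trash in trash_track:
        if trash.boxes.id is None:
            return image, "No trash detected"
    return image, "tracked"

empty = [SimpleNamespace(boxes=SimpleNamespace(id=None))]
print(handle(empty))  # → ('frame', 'No trash detected')
```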

Device Errors

If GPU acceleration is unavailable, the system falls back to CPU. Note that torch.device("cuda") does not raise on machines without CUDA, so the fallback relies on explicit availability checks (as in DeviceManager above) rather than a bare try/except:
# Fall back gracefully when CUDA is absent
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Integration Reference

For complete integration examples, see the related pipeline documentation. The YOLO model is integrated into the processing pipeline and requires no manual configuration for standard use cases.
