Overview

Reciclaje AI uses YOLOv8 (You Only Look Once version 8), a state-of-the-art real-time object detection model from Ultralytics. YOLOv8 combines speed and accuracy, making it ideal for real-time waste classification applications.

Why YOLOv8?

Real-Time Performance

YOLOv8 processes frames in milliseconds, enabling smooth real-time video analysis without lag.

High Accuracy

Advanced architecture provides precise bounding boxes and confident classifications.

Easy Integration

Ultralytics library offers simple Python API for model loading and inference.

Efficient Training

Can be trained on custom datasets with relatively small amounts of labeled data.

Model Integration

The YOLOv8 model is integrated into Reciclaje AI using the Ultralytics library:
from ultralytics import YOLO

# Load pre-trained model
model = YOLO('Modelos/best.pt')

# Run inference on a frame
results = model(frame, stream=True, verbose=False)
The best.pt file contains the trained model weights. This file is generated during the training process and contains learned patterns for detecting the 5 waste categories.

Model Architecture Components

YOLOv8 consists of three main components:
1. Backbone

CSPDarknet extracts features from input images using convolutional layers. It identifies low-level features (edges, textures) and high-level features (object shapes, patterns).
  • Processes 640×640 pixel input images (default)
  • Uses efficient cross-stage partial connections
  • Reduces computational cost while maintaining accuracy
2. Neck

Path Aggregation Network (PANet) combines features from different scales to detect objects of various sizes.
  • Merges features from multiple layers
  • Enables detection of both small and large waste items
  • Improves localization accuracy for bounding boxes
3. Head

Detection Head generates final predictions including:
  • Bounding box coordinates (x, y, width, height)
  • Object class probabilities (5 classes for Reciclaje AI)
  • Confidence scores
The head uses an anchor-free approach for faster, more accurate detections.
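After the head scores its candidate boxes, overlapping predictions for the same object are pruned with non-maximum suppression, which compares boxes by intersection-over-union (IoU). A minimal sketch of IoU for boxes in xyxy format (the helper name is illustrative, not from the project code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlapping region
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap completely
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
# Half-shifted boxes share a third of their combined area
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Predictions whose IoU with a higher-confidence box exceeds a threshold are discarded, which is why each waste item ends up with a single bounding box.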

Inference Process

When a frame is passed to the model, YOLOv8 follows this process:
# Input: 1280x720 frame from camera
ret, frame = cap.read()

# YOLOv8 performs:
# 1. Image preprocessing (resize, normalize)
# 2. Feature extraction (backbone)
# 3. Multi-scale feature fusion (neck)
# 4. Prediction generation (head)
results = model(frame, stream=True, verbose=False)

# Output: Detection results
for res in results:
    boxes = res.boxes  # Bounding box coordinates
    for box in boxes:
        x1, y1, x2, y2 = box.xyxy[0]  # Box coordinates
        cls = int(box.cls[0])          # Class ID (0-4)
        conf = math.ceil(box.conf[0])  # Confidence score

Key Inference Parameters

stream=True

Enables memory-efficient inference for video streams. The model yields results as they’re produced instead of storing all detections in memory.
results = model(frame, stream=True, verbose=False)
Essential for real-time applications where frames are processed continuously.

verbose=False

Suppresses detailed logging output during inference, improving performance by reducing I/O operations.
results = model(frame, stream=True, verbose=False)
Set to True during debugging to see detailed model information.

Input Resolution

YOLOv8 typically resizes images to 640×640 for processing. The camera captures at 1280×720, which the model handles automatically.
cap.set(3, 1280)  # Width
cap.set(4, 720)   # Height
You can adjust the input size during training for different speed/accuracy tradeoffs.
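To see how a 1280×720 frame fits the 640×640 input, the standard letterbox resize scales by the limiting dimension and pads the remainder. A sketch of that arithmetic (the function name is illustrative):

```python
def letterbox_dims(src_w, src_h, target=640):
    """Scale a frame to fit inside target x target, then compute padding."""
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return new_w, new_h, pad_w, pad_h

# A 1280x720 camera frame is scaled by 0.5 to 640x360, leaving
# 280 px of vertical padding split above and below the image
print(letterbox_dims(1280, 720))  # → (640, 360, 0, 280)
```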

Detection Output Structure

YOLOv8 returns structured detection results:
for res in results:
    boxes = res.boxes  # Boxes object containing all detections
    
    for box in boxes:
        # Bounding box in xyxy format (top-left, bottom-right)
        x1, y1, x2, y2 = box.xyxy[0]
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        
        # Class prediction (0=Metal, 1=Glass, 2=Plastic, 3=Carton, 4=Medical)
        cls = int(box.cls[0])
        
        # Confidence score (0.0 to 1.0)
        conf = math.ceil(box.conf[0])  # Confidence (ceil rounds any nonzero score up to 1)
The xyxy format represents bounding boxes as [x1, y1, x2, y2] where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
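The same box can also be expressed in the center-based xywh format that YOLO label files use; converting between the two is simple arithmetic (the helper below is illustrative, not part of the project code):

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner format (x1, y1, x2, y2) to (center_x, center_y, w, h)."""
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

# A box from (100, 50) to (300, 250) is 200x200 centered at (200, 150)
print(xyxy_to_xywh(100, 50, 300, 250))  # → (200.0, 150.0, 200, 200)
```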

Model Training Overview

While the provided code uses a pre-trained model (best.pt), here’s how YOLOv8 models are trained:
1. Dataset Preparation

Collect and label images of waste materials:
  • Capture diverse images of Metal, Glass, Plastic, Carton, and Medical waste
  • Annotate bounding boxes and class labels
  • Split into training, validation, and test sets
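The split in the last step can be sketched with plain Python, here as an 80/10/10 split (function name and filenames are illustrative):

```python
import random

def split_dataset(filenames, train=0.8, val=0.1, seed=42):
    """Shuffle image filenames and split them into train/val/test sets."""
    names = list(filenames)
    random.Random(seed).shuffle(names)  # fixed seed for reproducibility
    n_train = int(len(names) * train)
    n_val = int(len(names) * val)
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

images = [f"waste_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))  # → 80 10 10
```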
2. Model Configuration

Configure YOLOv8 for 5 classes:
# data.yaml
train: ../train/images
val: ../valid/images
nc: 5  # Number of classes
names: ['Metal', 'Glass', 'Plastic', 'Carton', 'Medical']
3. Training

Train the model using Ultralytics:
from ultralytics import YOLO

# Load base model
model = YOLO('yolov8n.pt')  # nano, small, medium, large, xlarge

# Train on custom dataset
results = model.train(
    data='data.yaml',
    epochs=100,
    imgsz=640,
    batch=16
)

# Best weights saved as 'best.pt'
4. Evaluation

Test the model on unseen data:
metrics = model.val()  # Validation metrics
results = model('test_image.jpg')  # Test inference

Model Variants

YOLOv8 comes in different sizes for various hardware capabilities:
  • YOLOv8n (Nano): fastest, good accuracy; for mobile devices and edge computing
  • YOLOv8s (Small): very fast, better accuracy; for embedded systems and Raspberry Pi
  • YOLOv8m (Medium): fast, high accuracy; for standard laptops and GPUs
  • YOLOv8l (Large): moderate speed, very high accuracy; for high-end workstations
  • YOLOv8x (XLarge): slower, highest accuracy; for maximum-accuracy scenarios
For real-time waste detection in educational settings, YOLOv8n or YOLOv8s provide the best balance of speed and accuracy on typical hardware.

Performance Optimization

GPU Acceleration

YOLOv8 automatically uses CUDA-enabled GPUs when available, significantly improving inference speed.
# Check GPU availability
import torch
print(torch.cuda.is_available())

Half Precision

Use FP16 inference for 2x speed improvement on compatible GPUs:
model = YOLO('best.pt')
results = model(frame, half=True)

Batch Processing

Process multiple frames in batches for better GPU utilization:
results = model([frame1, frame2, frame3])

Model Export

Export to optimized formats like ONNX or TensorRT:
model.export(format='onnx')
model.export(format='tensorrt')

Confidence Scoring

The model outputs a confidence score for each detection:
conf = math.ceil(box.conf[0])
print(f"Clase: {cls} Confidence: {conf}")

if conf > 0:
    # Display detection
    text = f'{clsName[cls]} {int(conf * 100)}%'
The code uses math.ceil(), which rounds any nonzero confidence up to 1, so the displayed percentage always reads 100%. In production, you may want to preserve decimal precision or set a higher threshold (e.g., conf > 0.5) to filter low-confidence detections.
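The effect is easy to see in isolation; preserving the float and applying a threshold keeps the real score (the values below are illustrative):

```python
import math

raw_conf = 0.87  # example confidence, as read from box.conf[0]

# math.ceil collapses every nonzero score to 1, so the label shows "100%"
ceiled = math.ceil(raw_conf)
print(int(ceiled * 100))  # → 100

# Keeping the float preserves the real percentage
conf = float(raw_conf)
if conf > 0.5:  # threshold filters low-confidence detections
    print(int(conf * 100))  # → 87
```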

Model Files Structure

Modelos/
└── best.pt          # Trained model weights
    ├── Model architecture
    ├── Learned weights
    ├── Class names
    └── Training configuration
The .pt file is a PyTorch checkpoint containing:
  • Neural network architecture definition
  • Trained weight parameters
  • Optimization state
  • Model metadata (class names, input size, etc.)
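Among that metadata are the class names, which map the integer IDs from box.cls back to human-readable labels (Ultralytics exposes this as model.names). The mapping for Reciclaje AI's five categories, shown here as a plain dict for illustration:

```python
# Class-ID to label mapping for the five waste categories
CLASS_NAMES = {0: "Metal", 1: "Glass", 2: "Plastic", 3: "Carton", 4: "Medical"}

cls = 2  # example class ID, as read from int(box.cls[0])
print(CLASS_NAMES[cls])  # → Plastic
```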

Advantages for Educational Use

Visual Learning

Real-time bounding boxes and labels help students understand how AI “sees” objects.

Immediate Feedback

Fast inference provides instant results, keeping students engaged.

Practical Application

Demonstrates computer vision concepts with a meaningful environmental application.

Extensible Platform

Students can experiment with model parameters, add new classes, or adjust confidence thresholds.

Common Questions

Why YOLOv8 instead of other detection models?

YOLOv8 offers the best balance of speed and accuracy for real-time applications. Alternatives like Faster R-CNN are more accurate but too slow for real-time video. YOLOv8 achieves competitive accuracy while processing 30+ frames per second.

Can the model run without a GPU?

Yes, YOLOv8 can run on CPU, though at reduced speed. The nano (n) and small (s) variants are optimized for CPU inference. Expect 5-15 FPS on modern CPUs compared to 30-60+ FPS on GPUs.

How was the best.pt file created?

The best.pt file was created by training YOLOv8 on a custom dataset of labeled waste images. The training process involved:
  1. Collecting diverse waste images
  2. Annotating objects with bounding boxes and class labels
  3. Training for multiple epochs
  4. Selecting the checkpoint with the best validation performance

Can the model be retrained or extended?

Absolutely! You can fine-tune the existing model or train from scratch with additional data. This is useful for:
  • Improving accuracy on specific waste types
  • Adding new waste categories
  • Adapting to different environmental conditions
  • Specializing for regional recycling requirements

Next Steps

How It Works

See how YOLOv8 integrates into the detection pipeline

Detection Classes

Learn about the 5 waste categories the model detects

Additional Resources

Ultralytics YOLOv8 Documentation

Official documentation for YOLOv8, including training guides, API reference, and advanced configurations.