Overview

Reciclaje AI uses YOLOv8 (You Only Look Once version 8), a state-of-the-art real-time object detection model from Ultralytics. YOLOv8 combines speed and accuracy, making it ideal for real-time waste classification applications.

Why YOLOv8?

Real-Time Performance

YOLOv8 processes frames in milliseconds, enabling smooth real-time video analysis without lag.

High Accuracy

Advanced architecture provides precise bounding boxes and confident classifications.

Easy Integration

Ultralytics library offers simple Python API for model loading and inference.

Efficient Training

Can be trained on custom datasets with relatively small amounts of labeled data.

Model Integration

The YOLOv8 model is integrated into Reciclaje AI using the Ultralytics library:
from ultralytics import YOLO

# Load pre-trained model
model = YOLO('Modelos/best.pt')

# Run inference on a frame
results = model(frame, stream=True, verbose=False)
The best.pt file contains the trained model weights. This file is generated during the training process and contains learned patterns for detecting the 5 waste categories.

Model Architecture Components

YOLOv8 consists of three main components:
1. Backbone

CSPDarknet extracts features from input images using convolutional layers. It identifies low-level features (edges, textures) and high-level features (object shapes, patterns).
  • Processes 640×640 pixel input images (default)
  • Uses efficient cross-stage partial connections
  • Reduces computational cost while maintaining accuracy
2. Neck

Path Aggregation Network (PANet) combines features from different scales to detect objects of various sizes.
  • Merges features from multiple layers
  • Enables detection of both small and large waste items
  • Improves localization accuracy for bounding boxes
3. Head

Detection Head generates final predictions including:
  • Bounding box coordinates (x, y, width, height)
  • Object class probabilities (5 classes for Reciclaje AI)
  • Confidence scores
The head uses an anchor-free approach for faster, more accurate detections.
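After the head scores its candidate boxes, overlapping predictions for the same object are pruned with non-maximum suppression, which compares boxes by intersection-over-union (IoU). A minimal sketch of IoU for boxes in xyxy format (the helper name is illustrative, not from the project code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlapping region
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes overlap completely
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0
# Half-shifted boxes share a third of their combined area
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Predictions whose IoU with a higher-confidence box exceeds a threshold are discarded, which is why each waste item ends up with a single bounding box.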

Inference Process

When a frame is passed to the model, YOLOv8 follows this process:
# Input: 1280x720 frame from camera
ret, frame = cap.read()

# YOLOv8 performs:
# 1. Image preprocessing (resize, normalize)
# 2. Feature extraction (backbone)
# 3. Multi-scale feature fusion (neck)
# 4. Prediction generation (head)
results = model(frame, stream=True, verbose=False)

# Output: Detection results
for res in results:
    boxes = res.boxes  # Bounding box coordinates
    for box in boxes:
        x1, y1, x2, y2 = box.xyxy[0]  # Box coordinates
        cls = int(box.cls[0])          # Class ID (0-4)
        conf = math.ceil(box.conf[0])  # Confidence score

Key Inference Parameters

stream=True

Enables memory-efficient inference for video streams. The model yields results as they’re produced instead of storing all detections in memory.
results = model(frame, stream=True, verbose=False)
Essential for real-time applications where frames are processed continuously.

verbose=False

Suppresses detailed logging output during inference, improving performance by reducing I/O operations.
results = model(frame, stream=True, verbose=False)
Set to True during debugging to see detailed model information.

Input Resolution

YOLOv8 typically resizes images to 640×640 for processing. The camera captures at 1280×720, which the model handles automatically.
cap.set(3, 1280)  # Width
cap.set(4, 720)   # Height
You can adjust the input size during training for different speed/accuracy tradeoffs.
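To see how a 1280×720 frame fits the 640×640 input, the standard letterbox resize scales by the limiting dimension and pads the remainder. A sketch of that arithmetic (the function name is illustrative):

```python
def letterbox_dims(src_w, src_h, target=640):
    """Scale a frame to fit inside target x target, then compute padding."""
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return new_w, new_h, pad_w, pad_h

# A 1280x720 camera frame is scaled by 0.5 to 640x360, leaving
# 280 px of vertical padding split above and below the image
print(letterbox_dims(1280, 720))  # → (640, 360, 0, 280)
```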

Detection Output Structure

YOLOv8 returns structured detection results:
for res in results:
    boxes = res.boxes  # Boxes object containing all detections
    
    for box in boxes:
        # Bounding box in xyxy format (top-left, bottom-right)
        x1, y1, x2, y2 = box.xyxy[0]
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
        
        # Class prediction (0=Metal, 1=Glass, 2=Plastic, 3=Carton, 4=Medical)
        cls = int(box.cls[0])
        
        # Confidence score (0.0 to 1.0)
        conf = math.ceil(box.conf[0])  # Confidence (ceil rounds any nonzero score up to 1)
The xyxy format represents bounding boxes as [x1, y1, x2, y2] where (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner.
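The same box can also be expressed in the center-based xywh format that YOLO label files use; converting between the two is simple arithmetic (the helper below is illustrative, not part of the project code):

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Convert corner format (x1, y1, x2, y2) to (center_x, center_y, w, h)."""
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

# A box from (100, 50) to (300, 250) is 200x200 centered at (200, 150)
print(xyxy_to_xywh(100, 50, 300, 250))  # → (200.0, 150.0, 200, 200)
```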

Model Training Overview

While the provided code uses a pre-trained model (best.pt), here’s how YOLOv8 models are trained:
1. Dataset Preparation

Collect and label images of waste materials:
  • Capture diverse images of Metal, Glass, Plastic, Carton, and Medical waste
  • Annotate bounding boxes and class labels
  • Split into training, validation, and test sets
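The split in the last step can be sketched with plain Python, here as an 80/10/10 split (function name and filenames are illustrative):

```python
import random

def split_dataset(filenames, train=0.8, val=0.1, seed=42):
    """Shuffle image filenames and split them into train/val/test sets."""
    names = list(filenames)
    random.Random(seed).shuffle(names)  # fixed seed for reproducibility
    n_train = int(len(names) * train)
    n_val = int(len(names) * val)
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

images = [f"waste_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))  # → 80 10 10
```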
2. Model Configuration

Configure YOLOv8 for 5 classes:
# data.yaml
train: ../train/images
val: ../valid/images
nc: 5  # Number of classes
names: ['Metal', 'Glass', 'Plastic', 'Carton', 'Medical']
3. Training

Train the model using Ultralytics:
from ultralytics import YOLO

# Load base model
model = YOLO('yolov8n.pt')  # nano, small, medium, large, xlarge

# Train on custom dataset
results = model.train(
    data='data.yaml',
    epochs=100,
    imgsz=640,
    batch=16
)

# Best weights saved as 'best.pt'
4. Evaluation

Test the model on unseen data:
metrics = model.val()  # Validation metrics
results = model('test_image.jpg')  # Test inference

Model Variants

YOLOv8 comes in different sizes for various hardware capabilities:
  • YOLOv8n (Nano): fastest, good accuracy; for mobile devices and edge computing
  • YOLOv8s (Small): very fast, better accuracy; for embedded systems and Raspberry Pi
  • YOLOv8m (Medium): fast, high accuracy; for standard laptops and GPUs
  • YOLOv8l (Large): moderate speed, very high accuracy; for high-end workstations
  • YOLOv8x (XLarge): slower, highest accuracy; for maximum-accuracy scenarios
For real-time waste detection in educational settings, YOLOv8n or YOLOv8s provide the best balance of speed and accuracy on typical hardware.

Performance Optimization

GPU Acceleration

YOLOv8 automatically uses CUDA-enabled GPUs when available, significantly improving inference speed.
# Check GPU availability
import torch
print(torch.cuda.is_available())

Half Precision

Use FP16 inference for 2x speed improvement on compatible GPUs:
model = YOLO('best.pt')
results = model(frame, half=True)

Batch Processing

Process multiple frames in batches for better GPU utilization:
results = model([frame1, frame2, frame3])

Model Export

Export to optimized formats like ONNX or TensorRT:
model.export(format='onnx')
model.export(format='tensorrt')

Confidence Scoring

The model outputs a confidence score for each detection:
conf = math.ceil(box.conf[0])
print(f"Clase: {cls} Confidence: {conf}")

if conf > 0:
    # Display detection
    text = f'{clsName[cls]} {int(conf * 100)}%'
The code uses math.ceil(), which rounds any nonzero confidence up to 1, so the displayed percentage always reads 100%. In production, you may want to preserve decimal precision or set a higher threshold (e.g., conf > 0.5) to filter low-confidence detections.
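The effect is easy to see in isolation; preserving the float and applying a threshold keeps the real score (the values below are illustrative):

```python
import math

raw_conf = 0.87  # example confidence, as read from box.conf[0]

# math.ceil collapses every nonzero score to 1, so the label shows "100%"
ceiled = math.ceil(raw_conf)
print(int(ceiled * 100))  # → 100

# Keeping the float preserves the real percentage
conf = float(raw_conf)
if conf > 0.5:  # threshold filters low-confidence detections
    print(int(conf * 100))  # → 87
```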

Model Files Structure

Modelos/
└── best.pt          # Trained model weights
    ├── Model architecture
    ├── Learned weights
    ├── Class names
    └── Training configuration
The .pt file is a PyTorch checkpoint containing:
  • Neural network architecture definition
  • Trained weight parameters
  • Optimization state
  • Model metadata (class names, input size, etc.)
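Among that metadata are the class names, which map the integer IDs from box.cls back to human-readable labels (Ultralytics exposes this as model.names). The mapping for Reciclaje AI's five categories, shown here as a plain dict for illustration:

```python
# Class-ID to label mapping for the five waste categories
CLASS_NAMES = {0: "Metal", 1: "Glass", 2: "Plastic", 3: "Carton", 4: "Medical"}

cls = 2  # example class ID, as read from int(box.cls[0])
print(CLASS_NAMES[cls])  # → Plastic
```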

Advantages for Educational Use

Visual Learning

Real-time bounding boxes and labels help students understand how AI “sees” objects.

Immediate Feedback

Fast inference provides instant results, keeping students engaged.

Practical Application

Demonstrates computer vision concepts with a meaningful environmental application.

Extensible Platform

Students can experiment with model parameters, add new classes, or adjust confidence thresholds.

Common Questions

Why YOLOv8 instead of other detection models?

YOLOv8 offers the best balance of speed and accuracy for real-time applications. Alternatives like Faster R-CNN are more accurate but too slow for real-time video. YOLOv8 achieves competitive accuracy while processing 30+ frames per second.

Can the model run without a GPU?

Yes, YOLOv8 can run on CPU, though at reduced speed. The nano (n) and small (s) variants are optimized for CPU inference. Expect 5-15 FPS on modern CPUs compared to 30-60+ FPS on GPUs.

How was the best.pt file created?

The best.pt file was created by training YOLOv8 on a custom dataset of labeled waste images. The training process involved:
  1. Collecting diverse waste images
  2. Annotating objects with bounding boxes and class labels
  3. Training for multiple epochs
  4. Selecting the checkpoint with the best validation performance

Can the model be retrained or extended?

Absolutely! You can fine-tune the existing model or train from scratch with additional data. This is useful for:
  • Improving accuracy on specific waste types
  • Adding new waste categories
  • Adapting to different environmental conditions
  • Specializing for regional recycling requirements

Next Steps

How It Works

See how YOLOv8 integrates into the detection pipeline

Detection Classes

Learn about the 5 waste categories the model detects

Additional Resources

Ultralytics YOLOv8 Documentation

Official documentation for YOLOv8, including training guides, API reference, and advanced configurations.