Detection vs. classification
| Task | Output | Example |
|---|---|---|
| Classification | Single class label | "cat" |
| Detection | Bounding boxes + class labels | [(x,y,w,h, "cat"), (x,y,w,h, "dog")] |
| Segmentation | Pixel-wise mask | Per-pixel class assignment |
YOLO architecture overview
YOLO divides the input image into an $S \times S$ grid. Each grid cell predicts bounding boxes and class probabilities simultaneously, in a single forward pass.
Bounding box prediction
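Each bounding box prediction consists of 5 values:
- $(x, y)$: center of the box relative to the grid cell.
- $(w, h)$: width and height relative to the full image.
- Confidence: $\Pr(\text{Object}) \times \text{IoU}$ between the predicted box and the ground-truth box, i.e. how likely the box is to contain an object and how well it fits.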
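To make the coordinate conventions above concrete, here is a minimal sketch that converts one grid-cell prediction into pixel corner coordinates. The function name `decode_box`, the 7 × 7 grid, and the 448 × 448 image are illustrative assumptions. Later YOLO versions decode boxes differently (for example, relative to anchors).

```python
def decode_box(x, y, w, h, row, col, grid_size, img_w, img_h):
    """Convert one grid-cell prediction to pixel corners (x1, y1, x2, y2).

    x, y: box center as offsets in [0, 1] inside grid cell (row, col).
    w, h: box size as fractions of the full image.
    """
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    # Center in pixels: cell origin plus the offset within the cell.
    cx = (col + x) * cell_w
    cy = (row + y) * cell_h
    # Size in pixels: fractions of the whole image, not of the cell.
    bw = w * img_w
    bh = h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Hypothetical example: center cell of a 7x7 grid on a 448x448 image.
box = decode_box(0.5, 0.5, 0.25, 0.5, row=3, col=3,
                 grid_size=7, img_w=448, img_h=448)
print(box)  # (168.0, 112.0, 280.0, 336.0)
```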
Anchor boxes
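Many YOLO versions (YOLOv2 through YOLOv5, for example) use anchor boxes: predefined box shapes (width and height priors) learned from the training data via k-means clustering. Each anchor handles objects of a specific size and shape. Some newer releases, such as YOLOv8, are anchor-free.
Multi-scale detection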
YOLOv5 and later versions detect objects at three scales, using feature maps with small, medium, and large strides. Fine-stride maps pick up small objects and coarse-stride maps pick up large ones, so the network can handle objects of very different sizes in the same image.
Running YOLO inference
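A minimal inference sketch, assuming the third-party `ultralytics` package and its pretrained `yolov8n.pt` weights. Neither is named in these notes, and the course notebook may use a different setup. The helper names `to_record` and `detect` and the image path are hypothetical.

```python
def to_record(xywh, cls_id, score, names):
    """Turn one raw detection into a small dict with a readable label."""
    x, y, w, h = xywh  # center x, center y, width, height in pixels
    return {"box": (x, y, w, h),
            "label": names[int(cls_id)],
            "score": round(score, 3)}


def detect(image_path, weights="yolov8n.pt", conf=0.25):
    """Run a pretrained detector on one image and return a list of records."""
    # Assumption: `ultralytics` is installed. It is imported lazily so the
    # helper above can be used without it.
    from ultralytics import YOLO

    model = YOLO(weights)                      # fetches weights on first use
    result = model(image_path, conf=conf)[0]   # one Results object per image
    boxes = result.boxes
    return [to_record(xywh, cls_id, score, result.names)
            for xywh, cls_id, score in zip(boxes.xywh.tolist(),
                                           boxes.cls.tolist(),
                                           boxes.conf.tolist())]


# Usage (hypothetical image path):
# for det in detect("image.jpg"):
#     print(det["label"], det["score"], det["box"])
```

The `conf` argument discards low-confidence boxes before they are returned; raising it trades recall for precision.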
Object tracking
YOLO can be combined with tracking algorithms (DeepSORT, ByteTrack) to assign persistent IDs across video frames. The detector supplies boxes for each frame, and the tracker associates them over time.
Evaluation metrics
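Intersection over Union (IoU)
IoU measures the overlap between a predicted box $B_p$ and a ground-truth box $B_{gt}$: $\text{IoU} = |B_p \cap B_{gt}| / |B_p \cup B_{gt}|$.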
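As a minimal sketch of the computation, assuming boxes in corner format `(x1, y1, x2, y2)`, a convention the notes do not fix:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (it may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes that share half their area: 50 / 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```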
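A prediction counts as correct when $\text{IoU} \geq 0.5$ (PASCAL VOC). COCO instead averages results over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
Precision and Recall
Precision is the fraction of predicted boxes that are correct, $TP / (TP + FP)$. Recall is the fraction of ground-truth objects that are found, $TP / (TP + FN)$.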
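A minimal sketch of how the two metrics can be computed for one image and one class. Each prediction, taken in descending confidence order, is matched greedily to the best still-unmatched ground-truth box. This matching rule is a simplification chosen for illustration, not the exact PASCAL VOC or COCO protocol, and the function names and toy boxes are assumptions.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions: list of (box, score); ground_truth: list of boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp   # unmatched or duplicate predictions
    fn = len(ground_truth) - tp  # objects the detector missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9),    # good match for the first object
         ((50, 50, 60, 60), 0.8),  # background, false positive
         ((0, 0, 9, 9), 0.6)]      # duplicate of the first object
print(precision_recall(preds, gt))  # precision 1/3, recall 1/2
```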
Mean Average Precision (mAP)
mAP is the primary benchmark for detection models. It averages the area under the precision-recall curve (AP) across all object categories: $\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AP}_i$, where $N$ is the number of categories.
Anomaly detection
Anomaly detection in computer vision identifies out-of-distribution samples such as defective products, unusual events, or unseen object types. Common approaches:
- Reconstruction-based: autoencoders trained on normal data; high reconstruction error signals anomalies.
- Feature distribution: fit a Gaussian on embeddings from normal samples; Mahalanobis distance detects outliers.
- One-class classification: models like PatchCore or PaDiM trained exclusively on normal images.
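The feature-distribution approach in the list above can be sketched on toy data. This example assumes 2-D embeddings so the covariance matrix can be inverted by hand. Real embeddings have far more dimensions and would normally be handled with a linear-algebra library. The function names, sample points, and threshold of 3.0 are illustrative assumptions.

```python
import math


def fit_gaussian_2d(points):
    """Return the mean and the (sxx, sxy, syy) covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a point from the fitted Gaussian."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Toy "embeddings" of normal samples only.
normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian_2d(normal)

threshold = 3.0  # assumption: tuned on held-out normal data in practice
for sample in [(1.0, 1.0), (3.0, 3.0)]:
    score = mahalanobis_2d(sample, mean, cov)
    print(sample, round(score, 3), "anomaly" if score > threshold else "normal")
```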
When to use anomaly detection vs. supervised detection
Use anomaly detection when:
- Defect types are unknown or highly variable.
- You only have access to “normal” samples during training.
- The defect rate is extremely low (few labeled examples).
Use supervised detection when:
- Defect categories are well-defined and labeled data is available.
- You need bounding box localization with class labels.
CLIP for zero-shot detection
CLIP (Contrastive Language-Image Pretraining) learns a joint embedding space for images and text. This enables zero-shot detection: describe an object in natural language and find it without any labeled images.
Resources
YOLO Example Notebook
Complete YOLO object detection example from the course.
Anomaly Detection Examples
Colab notebook with anomaly detection techniques applied to visual inspection.
CLIP Notebook
CLIP zero-shot classification and retrieval examples.
Video: YOLO Lecture (2021)
Recorded lecture on YOLO object detection and tracking.
Exercise E06 covers YOLO-based mask detection. The full exercise and solution notebooks are distributed via the course Canvas page. The course repository contains related data files and project templates.
