
Overview

Face detection is the first critical step in the recognition pipeline. Before we can identify who someone is, we need to locate where their face appears in the image. Iris uses YuNet, a lightweight CNN-based face detector distributed with the OpenCV model zoo, loaded as an ONNX model and executed through OpenCV's DNN module.
Model: face_detection_yunet_2023mar.onnx (March 2023 version)

YuNet Architecture

YuNet is a face detector designed specifically for:
  • Speed: Real-time detection on CPU
  • Accuracy: 96%+ detection rate on standard benchmarks
  • Small footprint: Works efficiently on edge devices
  • Multi-scale detection: Finds faces of varying sizes

Technical Specifications

  • Input size: 320x320 pixels by default; set dynamically to each image's dimensions
  • Detection threshold: 0.9 confidence score
  • NMS threshold: 0.3 (Non-Maximum Suppression)
  • Max faces: 5000 candidates per image

Initialization

The detector is loaded when the FaceEngine initializes (face.rs:10-13):
face.rs
let detector = objdetect::FaceDetectorYN::create(
    "face_detection_yunet_2023mar.onnx", 
    "", 
    core::Size::new(320, 320), 
    0.9,   // score_threshold
    0.3,   // nms_threshold
    5000,  // top_k
    0,     // backend_id (default)
    0      // target_id (default)
)?;

Parameter Breakdown

score_threshold (0.9): Only faces detected with 90%+ confidence are considered valid. This high threshold reduces false positives but may miss faces in challenging conditions (heavy occlusion, extreme angles).
nms_threshold (0.3): Non-Maximum Suppression removes overlapping bounding boxes. A threshold of 0.3 means that when two boxes overlap by more than 30% (measured as intersection-over-union), the lower-confidence box is suppressed, keeping only the highest-confidence detection.
top_k (5000): The maximum number of candidate detections kept before NMS. This is more than enough headroom for group photos with many faces.
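The suppression criterion is easy to make concrete. The IoU helper below is an illustrative sketch (YuNet's actual NMS lives inside OpenCV); the box layout mirrors the detector's `x, y, width, height` output.

```rust
/// Axis-aligned box in YuNet's output layout: x, y, width, height.
#[derive(Clone, Copy)]
struct BBox { x: f32, y: f32, w: f32, h: f32 }

/// Intersection-over-union of two boxes; 0.0 when they don't overlap.
fn iou(a: BBox, b: BBox) -> f32 {
    let ix = (a.x + a.w).min(b.x + b.w) - a.x.max(b.x);
    let iy = (a.y + a.h).min(b.y + b.h) - a.y.max(b.y);
    if ix <= 0.0 || iy <= 0.0 {
        return 0.0;
    }
    let inter = ix * iy;
    let union = a.w * a.h + b.w * b.h - inter;
    inter / union
}

fn main() {
    let a = BBox { x: 0.0, y: 0.0, w: 100.0, h: 100.0 };
    let b = BBox { x: 50.0, y: 0.0, w: 100.0, h: 100.0 };
    // Overlap is 50x100 = 5000; union is 10000 + 10000 - 5000 = 15000.
    // IoU ~ 0.333, above the 0.3 threshold, so the weaker box is suppressed.
    println!("IoU = {:.3}", iou(a, b));
}
```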

Detection Process

The get_embedding function in face.rs:21-39 handles face detection:

Step 1: Dynamic Input Sizing

face.rs
det.set_input_size(img.size())?;
Unlike fixed-size CNNs, YuNet adapts to the input image dimensions. This preserves aspect ratio and avoids distortion from aggressive resizing.
Larger images take longer to process. For optimal performance, resize images to ~640px on the longest side before sending to the API.
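The recommended pre-resize is simple to compute client-side. The helper below is a hypothetical sketch, not part of the Iris API; it caps the longest side while preserving aspect ratio.

```rust
/// Compute target dimensions that cap the longest side at `max_side`
/// while preserving aspect ratio (no-op if already small enough).
fn fit_longest_side(w: u32, h: u32, max_side: u32) -> (u32, u32) {
    let longest = w.max(h);
    if longest <= max_side {
        return (w, h);
    }
    let scale = max_side as f64 / longest as f64;
    (
        ((w as f64 * scale).round() as u32).max(1),
        ((h as f64 * scale).round() as u32).max(1),
    )
}

fn main() {
    // A 1920x1080 frame shrinks to 640x360 before detection.
    println!("{:?}", fit_longest_side(1920, 1080, 640));
    // Images already within the cap are left untouched.
    println!("{:?}", fit_longest_side(480, 640, 640));
}
```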

Step 2: Run Detection

face.rs
let mut faces = Mat::default();
det.detect(img, &mut faces)?;
The detect() method returns a Mat where:
  • Each row represents one detected face
  • Columns contain: [x, y, width, height, x_re, y_re, x_le, y_le, x_n, y_n, x_rm, y_rm, x_lm, y_lm, confidence]

Output Format

// faces.row(0) contains:
// [0-3]   Bounding box: x, y, width, height
// [4-5]   Right eye center: x_re, y_re
// [6-7]   Left eye center: x_le, y_le
// [8-9]   Nose tip: x_n, y_n
// [10-11] Right mouth corner: x_rm, y_rm
// [12-13] Left mouth corner: x_lm, y_lm
// [14]    Confidence score
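Decoding a row into a typed struct makes downstream code less error-prone. The sketch below uses illustrative names and plain `f32` slices; the actual code reads the OpenCV Mat directly.

```rust
/// One decoded YuNet detection row (15 f32 values, in the order
/// documented above). Struct and field names are illustrative.
#[derive(Debug, PartialEq)]
struct Detection {
    bbox: [f32; 4],          // x, y, width, height
    right_eye: (f32, f32),
    left_eye: (f32, f32),
    nose: (f32, f32),
    right_mouth: (f32, f32),
    left_mouth: (f32, f32),
    confidence: f32,
}

/// Returns None if the row is too short to be a valid detection.
fn parse_row(row: &[f32]) -> Option<Detection> {
    if row.len() < 15 {
        return None;
    }
    Some(Detection {
        bbox: [row[0], row[1], row[2], row[3]],
        right_eye: (row[4], row[5]),
        left_eye: (row[6], row[7]),
        nose: (row[8], row[9]),
        right_mouth: (row[10], row[11]),
        left_mouth: (row[12], row[13]),
        confidence: row[14],
    })
}

fn main() {
    let row = [10.0, 20.0, 80.0, 80.0, 30.0, 40.0, 70.0, 40.0,
               50.0, 60.0, 35.0, 75.0, 65.0, 75.0, 0.97];
    let det = parse_row(&row).unwrap();
    println!("confidence = {}", det.confidence);
}
```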

Face Selection Strategy

Iris uses a single-face strategy (face.rs:30-37):
face.rs
if faces.rows() > 0 {
    let face_data = faces.row(0)?;
    let mut aligned = Mat::default();
    rec.align_crop(img, &face_data, &mut aligned)?;
    let mut feature = Mat::default();
    rec.feature(&aligned, &mut feature)?;
    return Ok(Some(feature.clone()));
}
Ok(None)
When multiple faces are detected, only the first face (typically the largest/most confident) is used for recognition.

Why Only One Face?

This design choice simplifies the API:
  • Predictable behavior: one input image = one embedding
  • Performance: no need to process every face in group photos
  • Use case alignment: most applications verify a single person's identity
For multi-face scenarios, clients should crop individual faces before sending to the API.
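A client-side crop can be derived directly from the detection rows documented earlier. This helper is a hypothetical sketch: it pads each detected box by a margin (so alignment landmarks survive the crop) and clamps it to the image bounds.

```rust
/// Expand a detected box (x, y, w, h) by `margin` (fraction of the box
/// size) and clamp it to the image, returning integer crop coordinates
/// as (x, y, width, height).
fn crop_rect(
    bbox: (f32, f32, f32, f32),
    margin: f32,
    img_w: u32,
    img_h: u32,
) -> (u32, u32, u32, u32) {
    let (x, y, w, h) = bbox;
    let (pad_x, pad_y) = (w * margin, h * margin);
    let x0 = (x - pad_x).max(0.0) as u32;
    let y0 = (y - pad_y).max(0.0) as u32;
    let x1 = ((x + w + pad_x) as u32).min(img_w);
    let y1 = ((y + h + pad_y) as u32).min(img_h);
    (x0, y0, x1 - x0, y1 - y0)
}

fn main() {
    // 100x100 face near the top-left corner of a 640x480 image, 20% margin:
    // the padding is clamped so the crop never leaves the frame.
    println!("{:?}", crop_rect((10.0, 10.0, 100.0, 100.0), 0.2, 640, 480));
}
```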

Face Alignment

Before recognition, the detected face undergoes geometric normalization:
face.rs
rec.align_crop(img, &face_data, &mut aligned)?;
This process:
  1. Rotates the face to be upright based on eye positions
  2. Scales to a standard size (112x112 pixels for SFace)
  3. Crops to remove background and focus on facial features
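Step 1 boils down to measuring the roll angle of the line between the eyes. The sketch below shows that trigonometry only; the actual rotation, scaling, and cropping are all performed by OpenCV's `align_crop`.

```rust
/// Roll angle (degrees) of the line from the right eye to the left eye,
/// using the landmark coordinates YuNet returns. Rotating the image by
/// the negative of this angle would bring the eyes level.
fn eye_roll_degrees(right_eye: (f32, f32), left_eye: (f32, f32)) -> f32 {
    let dy = left_eye.1 - right_eye.1;
    let dx = left_eye.0 - right_eye.0;
    dy.atan2(dx).to_degrees()
}

fn main() {
    // One eye 10 px lower than the other across a 100 px span:
    // roughly a 5.7 degree head tilt.
    println!("{:.1}", eye_roll_degrees((100.0, 200.0), (200.0, 210.0)));
}
```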

Why Alignment Matters

  • Rotation invariance: handles tilted heads and angled photos
  • Consistent scale: normalizes faces captured from various distances
  • Feature focus: removes distracting background elements
  • Better embeddings: leads to more accurate recognition results

Edge Cases

No Face Detected

When faces.rows() == 0, the function returns Ok(None) instead of an error. The caller in main.rs:82-89 handles this gracefully:
main.rs
if let Ok(Some(emb)) = get_embedding(&target_img, det, rec) {
    target_embedding = Some(emb);
}

let Some(t_emb) = target_embedding else {
    return Json(CompareResponse { matches: vec![] });
};
No face in target = no matches returned.

Partial Occlusion

YuNet handles partial face occlusion (sunglasses, masks, hands) reasonably well, but if more than 50% of the face is occluded, detection may fail or produce low-quality landmarks for alignment.

Profile Views

The model is trained primarily on frontal faces (±45° yaw). Extreme profile views (side faces) may:
  • Not be detected
  • Produce poor alignment
  • Result in low-quality embeddings

Performance Characteristics

Speed

On modern CPUs:
  • Small images (320x240): ~10-20ms
  • Medium images (640x480): ~30-50ms
  • Large images (1920x1080): ~100-150ms
Detection time scales with image resolution. Downscale high-resolution images before processing for faster response times.

Memory Usage

Per detection:
  • Model weights: ~1.2 MB (loaded once at startup)
  • Input image: Depends on resolution (e.g., 640x480x3 = ~900 KB)
  • Output faces Mat: Minimal (15 floats × number of faces)
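These figures are straightforward arithmetic, sketched below for a hypothetical 640x480 BGR frame:

```rust
fn main() {
    // 8-bit BGR image buffer: width * height * 3 bytes.
    let image_bytes = 640u64 * 480 * 3;
    println!("input image: {} KB", image_bytes / 1024); // 900 KB

    // Output faces Mat: 15 f32 values (4 bytes each) per detected face.
    let faces = 12u64;
    println!("faces Mat: {} bytes", faces * 15 * 4);
}
```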

Debugging Detection Issues

If faces aren’t being detected:
1. Check image quality: ensure a minimum resolution of 160x160 pixels and clear facial features.
2. Verify face visibility: at least the eyes and nose should be visible and unoccluded.
3. Test angles: keep the face within ±45° rotation on all axes.
4. Check lighting: avoid extreme shadows or overexposure that obscure features.

Model Comparison

| Model      | Speed      | Accuracy | Size   | Best For                      |
|------------|------------|----------|--------|-------------------------------|
| YuNet      | ⚡⚡⚡ Fast | 96%      | 1.2 MB | Production APIs, edge devices |
| MTCNN      | ⚡⚡ Medium | 97%      | 2.3 MB | High accuracy needs           |
| RetinaFace | ⚡ Slow    | 98%      | 27 MB  | Research, offline processing  |
YuNet strikes the best balance for real-time API use cases.
