Overview
Face detection is the first critical step in the recognition pipeline. Before we can identify who someone is, we need to locate where their face appears in the image. Iris uses YuNet, a lightweight CNN-based face detector developed by OpenCV, running on ONNX Runtime for maximum performance.

Model: face_detection_yunet_2023mar.onnx (March 2023 version)

YuNet Architecture
YuNet (You Only Need One Network) is specifically designed for:
- Speed: Real-time detection on CPU
- Accuracy: 96%+ detection rate on standard benchmarks
- Small footprint: Works efficiently on edge devices
- Multi-scale detection: Finds faces of varying sizes
Technical Specifications
| Setting | Value |
|---|---|
| Input size | 320x320 pixels (dynamically resized per image) |
| Detection threshold | 0.9 confidence score |
| NMS threshold | 0.3 (Non-Maximum Suppression) |
| Max faces | 5000 per image |
Initialization
The detector is loaded when the FaceEngine initializes (face.rs:10-13):
face.rs
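The snippet itself was not captured in this export. As a hedged sketch of what the initialization likely looks like, assuming the opencv crate's `FaceDetectorYN` bindings (variable names are illustrative, not taken from face.rs):

```rust
use opencv::{core::Size, objdetect::FaceDetectorYN};

// Sketch only: load YuNet once at startup with the parameters
// documented below (score 0.9, NMS 0.3, top_k 5000).
let detector = FaceDetectorYN::create(
    "face_detection_yunet_2023mar.onnx",
    "",                  // no separate config file for ONNX models
    Size::new(320, 320), // placeholder; updated per image before detect()
    0.9,                 // score_threshold
    0.3,                 // nms_threshold
    5000,                // top_k
    0,                   // backend_id (default)
    0,                   // target_id (default)
)?;
```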
Parameter Breakdown
score_threshold: 0.9
Only faces detected with 90%+ confidence are considered valid. This high threshold reduces false positives but may miss faces in challenging conditions (heavy occlusion, extreme angles).
nms_threshold: 0.3
Non-Maximum Suppression removes overlapping bounding boxes. A threshold of 0.3 means that when two boxes overlap by more than 30% (IoU), the lower-confidence box is discarded, keeping only the highest-confidence detection.
top_k: 5000
Maximum number of candidate detections before NMS. This handles group photos with many faces.
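To make the NMS behavior concrete, here is a small self-contained sketch of greedy IoU-based suppression at threshold 0.3. This is illustrative only: in Iris, OpenCV performs NMS inside the detector.

```rust
// Illustrative sketch (not the Iris implementation) of greedy
// IoU-based Non-Maximum Suppression.
#[derive(Clone, Copy, Debug)]
struct Det { x: f32, y: f32, w: f32, h: f32, score: f32 }

fn iou(a: Det, b: Det) -> f32 {
    // Intersection-over-union of two axis-aligned boxes.
    let x1 = a.x.max(b.x);
    let y1 = a.y.max(b.y);
    let x2 = (a.x + a.w).min(b.x + b.w);
    let y2 = (a.y + a.h).min(b.y + b.h);
    let inter = (x2 - x1).max(0.0) * (y2 - y1).max(0.0);
    inter / (a.w * a.h + b.w * b.h - inter)
}

fn nms(mut dets: Vec<Det>, thresh: f32) -> Vec<Det> {
    // Sort by descending confidence, then greedily keep boxes that
    // do not overlap an already-kept box by more than `thresh` IoU.
    dets.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    let mut kept: Vec<Det> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(*k, d) <= thresh) {
            kept.push(d);
        }
    }
    kept
}

fn main() {
    let dets = vec![
        Det { x: 10.0, y: 10.0, w: 100.0, h: 100.0, score: 0.95 },
        Det { x: 15.0, y: 12.0, w: 100.0, h: 100.0, score: 0.91 }, // overlaps first
        Det { x: 300.0, y: 40.0, w: 80.0, h: 80.0, score: 0.93 },  // separate face
    ];
    println!("{}", nms(dets, 0.3).len()); // two boxes survive
}
```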
Detection Process
The get_embedding function in face.rs:21-39 handles face detection:
Step 1: Dynamic Input Sizing
face.rs
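The code itself is not reproduced in this export. As an illustrative sketch (variable names assumed, not from face.rs), resizing the detector to each image's dimensions with the opencv crate might look like:

```rust
// Sketch only: YuNet must be told the exact size of the frame it
// will process, so the input size is updated per image.
let size = opencv::core::Size::new(img.cols(), img.rows());
detector.set_input_size(size)?;
```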
Step 2: Run Detection
face.rs
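Again as a hedged sketch (names assumed, not the actual face.rs code), running detection with the opencv crate might look like:

```rust
// Sketch only: detect() fills a Mat with one row per detected face.
let mut faces = opencv::core::Mat::default();
detector.detect(&img, &mut faces)?;
if faces.rows() == 0 {
    return Ok(None); // no face found; the caller handles this case
}
```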
Output Format
The detect() method returns a Mat where:
- Each row represents one detected face
- Columns contain:
[x, y, width, height, x_re, y_re, x_le, y_le, x_n, y_n, x_rm, y_rm, x_lm, y_lm, confidence]
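For illustration (not the actual face.rs code), unpacking one such 15-float row into named fields could look like this; the field order follows the column layout above:

```rust
// Illustrative sketch: one YuNet detection row decoded into a struct.
#[derive(Debug, PartialEq)]
struct Face {
    bbox: [f32; 4],          // x, y, width, height
    right_eye: (f32, f32),
    left_eye: (f32, f32),
    nose: (f32, f32),
    right_mouth: (f32, f32),
    left_mouth: (f32, f32),
    confidence: f32,
}

fn parse_row(row: &[f32; 15]) -> Face {
    Face {
        bbox: [row[0], row[1], row[2], row[3]],
        right_eye: (row[4], row[5]),
        left_eye: (row[6], row[7]),
        nose: (row[8], row[9]),
        right_mouth: (row[10], row[11]),
        left_mouth: (row[12], row[13]),
        confidence: row[14],
    }
}

fn main() {
    let row = [40.0, 30.0, 120.0, 150.0, 70.0, 80.0, 130.0, 80.0,
               100.0, 120.0, 75.0, 150.0, 125.0, 150.0, 0.97];
    let face = parse_row(&row);
    println!("conf = {}", face.confidence); // prints "conf = 0.97"
}
```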
Face Selection Strategy
Iris uses a single-face strategy (face.rs:30-37):
face.rs
When multiple faces are detected, only the first face (typically the largest/most confident) is used for recognition.
Why Only One Face?
This design choice simplifies the API:
✅ Predictable behavior: One input image = one embedding
✅ Performance: No need to process every face in group photos
✅ Use case alignment: Most applications verify a single person's identity

For multi-face scenarios, clients should crop individual faces before sending them to the API.
Face Alignment
Before recognition, the detected face undergoes geometric normalization:
face.rs
- Rotates the face to be upright based on eye positions
- Scales to a standard size (112x112 pixels for SFace)
- Crops to remove background and focus on facial features
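To make the rotation step concrete, here is a small illustrative function (not from face.rs) that recovers the in-plane roll angle from the two eye landmarks; rotating by the negative of this angle levels the eyes:

```rust
// Illustrative sketch: the roll angle used to make a face upright is
// the angle of the line between the two eye landmarks.
fn roll_angle(right_eye: (f32, f32), left_eye: (f32, f32)) -> f32 {
    let dy = left_eye.1 - right_eye.1;
    let dx = left_eye.0 - right_eye.0;
    dy.atan2(dx) // radians; 0.0 when the eyes are already level
}

fn main() {
    // Eyes on the same horizontal line: no rotation needed.
    println!("{}", roll_angle((60.0, 80.0), (120.0, 80.0))); // prints "0"
    // Left eye 10 px lower than the right: a small tilt to correct.
    let a = roll_angle((60.0, 80.0), (120.0, 90.0));
    println!("{:.3} degrees", a.to_degrees()); // ≈ 9.462 degrees
}
```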
Why Alignment Matters
- Rotation invariance: Handles tilted heads and angled photos
- Consistent scale: Normalizes faces from various distances
- Feature focus: Removes distracting background elements
- Better embeddings: Leads to more accurate recognition results
Edge Cases
No Face Detected
When faces.rows() == 0, the function returns Ok(None) instead of an error. The caller in main.rs:82-89 handles this gracefully:
main.rs
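The handler code is not reproduced in this export. A hedged sketch of the shape it likely takes (identifiers and response details are assumptions, not from main.rs): a missing face becomes a client-facing message rather than a server error.

```rust
// Sketch only: the Option distinguishes "no face" from real failures.
match engine.get_embedding(&image) {
    Ok(Some(embedding)) => {
        // Proceed with recognition using the embedding.
    }
    Ok(None) => {
        // No face detected: report it to the client gracefully.
    }
    Err(e) => {
        // An actual detection/inference failure.
    }
}
```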
Partial Occlusion
YuNet handles partial face occlusion (sunglasses, masks, hands) reasonably well, though heavily occluded faces may fall below the 0.9 confidence threshold.

Profile Views
The model is trained primarily on frontal faces (±45° yaw). Extreme profile views (side faces) may:
- Not be detected
- Produce poor alignment
- Result in low-quality embeddings
Performance Characteristics
Speed
On modern CPUs:
- Small images (320x240): ~10-20ms
- Medium images (640x480): ~30-50ms
- Large images (1920x1080): ~100-150ms
Memory Usage
Per detection:
- Model weights: ~1.2 MB (loaded once at startup)
- Input image: Depends on resolution (e.g., 640x480x3 = ~900 KB)
- Output faces Mat: Minimal (15 floats × number of faces)
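The figures above are easy to sanity-check:

```rust
// Quick arithmetic check of the memory estimates above.
fn main() {
    let input = 640 * 480 * 3;           // 8-bit BGR frame, bytes
    println!("{} KB", input / 1024);     // prints "900 KB"
    let per_face = 15 * 4;               // 15 f32 values per detection row
    println!("{} bytes/face", per_face); // prints "60 bytes/face"
}
```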
Debugging Detection Issues
If faces aren't being detected, check the conditions YuNet is sensitive to: extreme profile views (beyond ±45° yaw), heavy occlusion, and very small faces can all push confidence below the 0.9 score_threshold.

Model Comparison
| Model | Speed | Accuracy | Size | Best For |
|---|---|---|---|---|
| YuNet | ⚡⚡⚡ Fast | 96% | 1.2 MB | Production APIs, edge devices |
| MTCNN | ⚡⚡ Medium | 97% | 2.3 MB | High accuracy needs |
| RetinaFace | ⚡ Slow | 98% | 27 MB | Research, offline processing |