Overview
Face detection is the first critical step in the recognition pipeline. Before we can identify who someone is, we need to locate where their face appears in the image. Iris uses YuNet, a lightweight CNN-based face detector developed by OpenCV, running on ONNX Runtime for maximum performance.

Model: face_detection_yunet_2023mar.onnx (March 2023 version)

YuNet Architecture
YuNet (You Only Need One Network) is specifically designed for:
- Speed: Real-time detection on CPU
- Accuracy: 96%+ detection rate on standard benchmarks
- Small footprint: Works efficiently on edge devices
- Multi-scale detection: Finds faces of varying sizes
Technical Specifications
| Setting | Value |
|---|---|
| Input size | 320x320 pixels (dynamically resized per image) |
| Detection threshold | 0.9 confidence score |
| NMS threshold | 0.3 (Non-Maximum Suppression) |
| Max faces | 5000 per image |
Initialization
The detector is loaded when the FaceEngine initializes (face.rs:10-13):
face.rs
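The snippet itself was not captured in this export. As a hedged sketch of what the initialization likely looks like, assuming the opencv crate's `FaceDetectorYN` bindings (variable names are illustrative, not taken from face.rs):

```rust
use opencv::{core::Size, objdetect::FaceDetectorYN};

// Sketch only: load YuNet once at startup with the parameters
// documented below (score 0.9, NMS 0.3, top_k 5000).
let detector = FaceDetectorYN::create(
    "face_detection_yunet_2023mar.onnx",
    "",                  // no separate config file for ONNX models
    Size::new(320, 320), // placeholder; updated per image before detect()
    0.9,                 // score_threshold
    0.3,                 // nms_threshold
    5000,                // top_k
    0,                   // backend_id (default)
    0,                   // target_id (default)
)?;
```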
Parameter Breakdown
score_threshold: 0.9
Only faces detected with 90%+ confidence are considered valid. This high threshold reduces false positives but may miss faces in challenging conditions (heavy occlusion, extreme angles).
nms_threshold: 0.3
Non-Maximum Suppression removes overlapping bounding boxes. A threshold of 0.3 means that when two boxes overlap by more than 30% (IoU), the lower-confidence box is discarded, keeping only the highest-confidence detection.
top_k: 5000
Maximum number of candidate detections before NMS. This handles group photos with many faces.
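To make the NMS behavior concrete, here is a small self-contained sketch of greedy IoU-based suppression at threshold 0.3. This is illustrative only: in Iris, OpenCV performs NMS inside the detector.

```rust
// Illustrative sketch (not the Iris implementation) of greedy
// IoU-based Non-Maximum Suppression.
#[derive(Clone, Copy, Debug)]
struct Det { x: f32, y: f32, w: f32, h: f32, score: f32 }

fn iou(a: Det, b: Det) -> f32 {
    // Intersection-over-union of two axis-aligned boxes.
    let x1 = a.x.max(b.x);
    let y1 = a.y.max(b.y);
    let x2 = (a.x + a.w).min(b.x + b.w);
    let y2 = (a.y + a.h).min(b.y + b.h);
    let inter = (x2 - x1).max(0.0) * (y2 - y1).max(0.0);
    inter / (a.w * a.h + b.w * b.h - inter)
}

fn nms(mut dets: Vec<Det>, thresh: f32) -> Vec<Det> {
    // Sort by descending confidence, then greedily keep boxes that
    // do not overlap an already-kept box by more than `thresh` IoU.
    dets.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    let mut kept: Vec<Det> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(*k, d) <= thresh) {
            kept.push(d);
        }
    }
    kept
}

fn main() {
    let dets = vec![
        Det { x: 10.0, y: 10.0, w: 100.0, h: 100.0, score: 0.95 },
        Det { x: 15.0, y: 12.0, w: 100.0, h: 100.0, score: 0.91 }, // overlaps first
        Det { x: 300.0, y: 40.0, w: 80.0, h: 80.0, score: 0.93 },  // separate face
    ];
    println!("{}", nms(dets, 0.3).len()); // two boxes survive
}
```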
Detection Process
The get_embedding function in face.rs:21-39 handles face detection:
Step 1: Dynamic Input Sizing
face.rs
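The code itself is not reproduced in this export. As an illustrative sketch (variable names assumed, not from face.rs), resizing the detector to each image's dimensions with the opencv crate might look like:

```rust
// Sketch only: YuNet must be told the exact size of the frame it
// will process, so the input size is updated per image.
let size = opencv::core::Size::new(img.cols(), img.rows());
detector.set_input_size(size)?;
```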
Step 2: Run Detection
face.rs
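Again as a hedged sketch (names assumed, not the actual face.rs code), running detection with the opencv crate might look like:

```rust
// Sketch only: detect() fills a Mat with one row per detected face.
let mut faces = opencv::core::Mat::default();
detector.detect(&img, &mut faces)?;
if faces.rows() == 0 {
    return Ok(None); // no face found; the caller handles this case
}
```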
Output Format
The detect() method returns a Mat where:
- Each row represents one detected face
- Columns contain:
[x, y, width, height, x_re, y_re, x_le, y_le, x_n, y_n, x_rm, y_rm, x_lm, y_lm, confidence]
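For illustration (not the actual face.rs code), unpacking one such 15-float row into named fields could look like this; the field order follows the column layout above:

```rust
// Illustrative sketch: one YuNet detection row decoded into a struct.
#[derive(Debug, PartialEq)]
struct Face {
    bbox: [f32; 4],          // x, y, width, height
    right_eye: (f32, f32),
    left_eye: (f32, f32),
    nose: (f32, f32),
    right_mouth: (f32, f32),
    left_mouth: (f32, f32),
    confidence: f32,
}

fn parse_row(row: &[f32; 15]) -> Face {
    Face {
        bbox: [row[0], row[1], row[2], row[3]],
        right_eye: (row[4], row[5]),
        left_eye: (row[6], row[7]),
        nose: (row[8], row[9]),
        right_mouth: (row[10], row[11]),
        left_mouth: (row[12], row[13]),
        confidence: row[14],
    }
}

fn main() {
    let row = [40.0, 30.0, 120.0, 150.0, 70.0, 80.0, 130.0, 80.0,
               100.0, 120.0, 75.0, 150.0, 125.0, 150.0, 0.97];
    let face = parse_row(&row);
    println!("conf = {}", face.confidence); // prints "conf = 0.97"
}
```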
Face Selection Strategy
Iris uses a single-face strategy (face.rs:30-37):
face.rs
When multiple faces are detected, only the first face (typically the largest/most confident) is used for recognition.
Why Only One Face?
This design choice simplifies the API:
✅ Predictable behavior: One input image = one embedding
✅ Performance: No need to process every face in group photos
✅ Use case alignment: Most applications verify a single person's identity

For multi-face scenarios, clients should crop individual faces before sending them to the API.
Face Alignment
Before recognition, the detected face undergoes geometric normalization:
face.rs
- Rotates the face to be upright based on eye positions
- Scales to a standard size (112x112 pixels for SFace)
- Crops to remove background and focus on facial features
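To make the rotation step concrete, here is a small illustrative function (not from face.rs) that recovers the in-plane roll angle from the two eye landmarks; rotating by the negative of this angle levels the eyes:

```rust
// Illustrative sketch: the roll angle used to make a face upright is
// the angle of the line between the two eye landmarks.
fn roll_angle(right_eye: (f32, f32), left_eye: (f32, f32)) -> f32 {
    let dy = left_eye.1 - right_eye.1;
    let dx = left_eye.0 - right_eye.0;
    dy.atan2(dx) // radians; 0.0 when the eyes are already level
}

fn main() {
    // Eyes on the same horizontal line: no rotation needed.
    println!("{}", roll_angle((60.0, 80.0), (120.0, 80.0))); // prints "0"
    // Left eye 10 px lower than the right: a small tilt to correct.
    let a = roll_angle((60.0, 80.0), (120.0, 90.0));
    println!("{:.3} degrees", a.to_degrees()); // ≈ 9.462 degrees
}
```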
Why Alignment Matters
- Rotation invariance: Handles tilted heads and angled photos
- Consistent scale: Normalizes faces from various distances
- Feature focus: Removes distracting background elements
- Better embeddings: Leads to more accurate recognition results
Edge Cases
No Face Detected
When faces.rows() == 0, the function returns Ok(None) instead of an error. The caller in main.rs:82-89 handles this gracefully:
main.rs
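The handler code is not reproduced in this export. A hedged sketch of the shape it likely takes (identifiers and response details are assumptions, not from main.rs): a missing face becomes a client-facing message rather than a server error.

```rust
// Sketch only: the Option distinguishes "no face" from real failures.
match engine.get_embedding(&image) {
    Ok(Some(embedding)) => {
        // Proceed with recognition using the embedding.
    }
    Ok(None) => {
        // No face detected: report it to the client gracefully.
    }
    Err(e) => {
        // An actual detection/inference failure.
    }
}
```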
Partial Occlusion
YuNet handles partial face occlusion (sunglasses, masks, hands) reasonably well, though heavily occluded faces may fall below the 0.9 confidence threshold.

Profile Views
The model is trained primarily on frontal faces (±45° yaw). Extreme profile views (side faces) may:
- Not be detected
- Produce poor alignment
- Result in low-quality embeddings
Performance Characteristics
Speed
On modern CPUs:
- Small images (320x240): ~10-20ms
- Medium images (640x480): ~30-50ms
- Large images (1920x1080): ~100-150ms
Memory Usage
Per detection:
- Model weights: ~1.2 MB (loaded once at startup)
- Input image: Depends on resolution (e.g., 640x480x3 = ~900 KB)
- Output faces Mat: Minimal (15 floats × number of faces)
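The figures above are easy to sanity-check:

```rust
// Quick arithmetic check of the memory estimates above.
fn main() {
    let input = 640 * 480 * 3;           // 8-bit BGR frame, bytes
    println!("{} KB", input / 1024);     // prints "900 KB"
    let per_face = 15 * 4;               // 15 f32 values per detection row
    println!("{} bytes/face", per_face); // prints "60 bytes/face"
}
```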
Debugging Detection Issues
If faces aren't being detected, check the conditions YuNet is sensitive to: extreme profile views (beyond ±45° yaw), heavy occlusion, and very small faces can all push confidence below the 0.9 score_threshold.

Model Comparison
| Model | Speed | Accuracy | Size | Best For |
|---|---|---|---|---|
| YuNet | ⚡⚡⚡ Fast | 96% | 1.2 MB | Production APIs, edge devices |
| MTCNN | ⚡⚡ Medium | 97% | 2.3 MB | High accuracy needs |
| RetinaFace | ⚡ Slow | 98% | 27 MB | Research, offline processing |