Facial analysis encompasses a family of computer vision tasks centered on the human face: detecting faces in images, verifying identity, and estimating social attributes like age, gender, and emotion. Modern deep learning pipelines can combine these tasks in a single system, typically by sharing the detection, alignment, and embedding stages.

Facial analysis tasks

| Task | Description | Output |
| --- | --- | --- |
| Detection | Locate faces in an image | Bounding boxes |
| Alignment | Normalize face crop to canonical pose | Aligned face image |
| Recognition | Match a face to a known identity | Identity or embedding |
| Verification | Decide if two faces are the same person | Yes / No + confidence |
| Attribute analysis | Estimate age, gender, emotion | Labels or continuous values |

Face recognition pipeline

A robust face recognition system follows four stages:
1. Detection

Detect all faces in the input image. Common detectors: MTCNN, RetinaFace, BlazeFace. Each returns a bounding box and facial landmark coordinates.
2. Alignment

Use the five-point landmarks (two eyes, nose, two mouth corners) to apply a similarity transform, producing a standardized 112×112 crop. Alignment is critical — misaligned faces significantly hurt recognition accuracy.
3. Embedding

Pass the aligned crop through a deep network (ResNet-50 or ResNet-100 backbone) to produce a compact feature vector, typically 512-dimensional. Similar faces produce embeddings close together in $\ell_2$ distance.
4. Matching

Compare the query embedding against a gallery of known embeddings using cosine similarity or $\ell_2$ distance:

$$\text{similarity}(\mathbf{f}_1, \mathbf{f}_2) = \frac{\mathbf{f}_1 \cdot \mathbf{f}_2}{\|\mathbf{f}_1\| \|\mathbf{f}_2\|}$$

If the similarity exceeds a threshold $\tau$, the identities match.
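The matching step can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names `cosine_similarity` and `identify`, the toy 3-dimensional embeddings, and the threshold value 0.5 are all assumptions made for the example. Real systems use embeddings of around 512 dimensions and tune the threshold on validation data.

```python
import math

def cosine_similarity(f1, f2):
    """Cosine similarity between two embedding vectors (sequences of floats)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm1 = math.sqrt(sum(a * a for a in f1))
    norm2 = math.sqrt(sum(b * b for b in f2))
    return dot / (norm1 * norm2)

def identify(query, gallery, tau=0.5):
    """Return (name, score) for the best gallery match.

    gallery maps identity names to embeddings. If the best score does not
    exceed the threshold tau (a hypothetical value here), name is None.
    """
    best_name, best_score = None, -1.0
    for name, emb in gallery.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_name, best_score = name, score
    if best_score > tau:
        return best_name, best_score
    return None, best_score

# Toy gallery with made-up 3-d embeddings (illustration only)
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 1.0, 0.2],
}
query = [0.8, 0.2, 0.1]
name, score = identify(query, gallery)
print(name, round(score, 3))
```

For L2-normalized embeddings the two metrics rank candidates identically, because the squared $\ell_2$ distance equals $2 - 2 \cdot \text{similarity}$.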

Deep face recognition: ArcFace and AdaFace

Classical softmax training separates identities but does not enforce tight clustering of same-identity embeddings. Margin-based loss functions add a penalty to the target class during training, which pulls each embedding closer to its class center and widens the gap between classes.

ArcFace loss

ArcFace adds an angular margin $m$ to the target class angle:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{s \cos(\theta_{y_i} + m)}}{e^{s \cos(\theta_{y_i} + m)} + \sum_{j \neq y_i} e^{s \cos \theta_j}}$$

where $s$ is a scaling factor and $\theta_{y_i}$ is the angle between the embedding and the target class weight vector.
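To make the loss concrete, here is a dependency-free sketch that evaluates it for a small batch. It is written in plain Python for readability, not speed, and is only an illustration: the helper names `l2_normalize` and `arcface_loss` and the toy 2-dimensional inputs are assumptions, and the defaults `s=64` and `m=0.5` are commonly used values rather than settings taken from this page.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def arcface_loss(embeddings, class_weights, labels, s=64.0, m=0.5):
    """Mean ArcFace loss over a batch.

    embeddings:    list of N feature vectors
    class_weights: list of C class weight vectors (one per identity)
    labels:        list of N target class indices y_i
    s, m:          scale factor and angular margin
    """
    weights = [l2_normalize(w) for w in class_weights]
    total = 0.0
    for emb, y in zip(embeddings, labels):
        emb = l2_normalize(emb)
        # cos(theta_j) for every class j, clamped so acos stays well-defined
        cosines = [max(-1.0, min(1.0, sum(a * b for a, b in zip(emb, w))))
                   for w in weights]
        logits = [s * c for c in cosines]
        # Add the margin m to the target class angle only
        logits[y] = s * math.cos(math.acos(cosines[y]) + m)
        # Numerically stable negative log-softmax of the target logit
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[y]
    return total / len(labels)

# Toy batch: two samples, two identities, made-up 2-d features
embeddings = [[1.0, 0.8], [0.8, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
with_margin = arcface_loss(embeddings, class_weights, labels, m=0.5)
no_margin = arcface_loss(embeddings, class_weights, labels, m=0.0)
print(f"m=0.5: {with_margin:.4f}  m=0.0: {no_margin:.6f}")
```

With `m=0.0` the function reduces to ordinary softmax cross-entropy on normalized features, so comparing the two values shows how the margin penalizes samples that sit close to the decision boundary. Production implementations also handle the case where $\theta_{y_i} + m$ exceeds $\pi$, which this sketch omits.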

AdaFace

AdaFace introduces an adaptive margin that scales with image quality, using the feature norm as a proxy for quality. Low-quality images (blurry, occluded) receive a smaller margin, so the loss places less emphasis on samples that may be impossible to recognize.
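# Simplified AdaFace-based face recognition
# Assumes net.py and face_alignment/ from the AdaFace repository are importable
import numpy as np
import torch
import net  # AdaFace model definition
from face_alignment import align  # alignment helper

# Load pretrained AdaFace model
model = net.build_model('ir_50')
statedict = torch.load('adaface_ir50_ms1mv2.ckpt',
                       map_location='cpu')['state_dict']
# Checkpoint keys carry a 'model.' prefix (as in the repository's
# inference example); strip it so the keys match the bare network
statedict = {key[len('model.'):]: val for key, val in statedict.items()
             if key.startswith('model.')}
model.load_state_dict(statedict)
model.eval()

def get_embedding(img_path):
    """Return 512-d L2-normalized embedding for a face image."""
    aligned = align.get_aligned_face(img_path)  # detect + align, PIL RGB image
    if aligned is None:
        raise ValueError(f"No face detected in {img_path}")
    # The released model expects BGR channel order scaled to [-1, 1]
    bgr = (np.array(aligned)[:, :, ::-1] / 255.0 - 0.5) / 0.5
    tensor = torch.from_numpy(bgr.transpose(2, 0, 1).copy()).unsqueeze(0).float()
    with torch.no_grad():
        emb, _ = model(tensor)  # second output is the feature norm
    return torch.nn.functional.normalize(emb, p=2, dim=1)

emb1 = get_embedding('person_a.jpg')
emb2 = get_embedding('person_b.jpg')
# Dot product of unit vectors equals cosine similarity
similarity = (emb1 * emb2).sum().item()
print(f"Cosine similarity: {similarity:.4f}")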

Social attribute analysis

Beyond identity, we can estimate social attributes from face crops:

Age estimation

Age is typically framed as a regression or ordinal regression problem. The model predicts a continuous age value from the face embedding:

$$\hat{a} = f_{\text{age}}(\mathbf{e})$$

where $\mathbf{e}$ is the face embedding. Mean absolute error (MAE) is the standard evaluation metric.
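For reference, MAE averages the absolute difference between predicted and true ages, $\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{a}_i - a_i|$. A small sketch of the computation follows; the function name `mean_absolute_error` and the sample values are made up for illustration.

```python
def mean_absolute_error(predicted_ages, true_ages):
    """MAE in years between predicted and ground-truth ages."""
    if len(predicted_ages) != len(true_ages):
        raise ValueError("inputs must have the same length")
    errors = [abs(p - t) for p, t in zip(predicted_ages, true_ages)]
    return sum(errors) / len(errors)

# Made-up predictions and labels, for illustration only
predicted = [23.5, 41.0, 30.5, 68.0]
actual = [25, 38, 30, 70]
print(f"MAE: {mean_absolute_error(predicted, actual):.2f} years")
# prints "MAE: 1.75 years"
```

An MAE of 1.75 means the predictions are off by 1.75 years on average, regardless of whether they are too high or too low.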

Gender classification

Binary classification (male/female) on top of face embeddings. Modern models achieve >95% accuracy on benchmark datasets, though they can exhibit demographic bias — an important ethical consideration.

Emotion recognition

Seven basic emotion categories (angry, disgust, fear, happy, neutral, sad, surprised) are predicted from face crops, often using lightweight CNNs fine-tuned on the AffectNet or RAF-DB datasets.
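The final prediction step can be sketched as a softmax over seven class scores. This is only an illustration under stated assumptions: the label order in `EMOTIONS`, the function names, and the example logits are hypothetical, and each dataset defines its own class index order, so the list must match the one used during training.

```python
import math

# Assumed label order; check the training dataset's own index mapping
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprised"]

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_emotion(logits):
    """Map seven raw classifier scores to (label, probability)."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one score per emotion category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]

# Hypothetical scores, standing in for the output of a CNN classifier head
logits = [0.1, -1.2, 0.3, 2.5, 1.0, -0.4, 0.8]
label, prob = predict_emotion(logits)
print(label, round(prob, 3))
```

Reporting the probability alongside the label makes it possible to reject low-confidence predictions instead of forcing every face into one of the seven categories.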
Social attribute models carry significant ethical risks, including demographic bias and potential misuse. When applying these methods, consider fairness, consent, and the limitations of automated attribute inference.

Resources

AdaFace Basic Example

Step-by-step Colab notebook for face recognition using AdaFace.

Exercise E07: Facial Analysis

Hands-on exercise covering face detection, alignment, and recognition.

VisionColab: Facial Analysis

Collection of facial analysis examples from the course repository.

Video: Facial Analysis Lecture (2021)

Recorded lecture on facial analysis techniques and applications.
