Real-Time LSC Sign Language Recognition via Webcam

Signia’s sign recognition feature lets deaf or non-speaking users communicate by signing in front of their webcam and receiving an instant text translation. The system captures video frames in the browser, extracts hand landmark coordinates using MediaPipe, and classifies the gesture sequence with a RandomForest model trained on sign videos uploaded through the admin panel. The result is a low-latency, server-side prediction that runs without any GPU requirement.

How It Works

User opens /reconocimientos/camara/

The camara view renders the recognition interface (usuarios/reconocimiento.html), which activates the browser’s webcam. Users with discapacidad='sordo' or 'mudo' are automatically redirected here after login.

Browser captures video frames

JavaScript captures frames from the <video> element at regular intervals. Frames are encoded as base64 data URLs before being sent to the server, or — in the faster landmark pipeline — processed locally first by MediaPipe JS.

MediaPipe JS extracts hand landmarks client-side

The preferred path uses MediaPipe’s JavaScript HandLandmarker running directly in the browser (via WASM). It detects up to 2 hands and extracts 21 3D landmarks per hand, producing a flat array of up to 126 floats ([x0,y0,z0, ..., x20,y20,z20] × 2 hands). This offloads the most expensive CV work to the client.

Landmarks sent to /reconocimientos/predecir_landmarks/

The browser POSTs the accumulated landmark sequence (a list of up to 30 frames, each 126 floats) as JSON to the predecir_landmarks endpoint. This avoids sending raw image data over the network and skips server-side MediaPipe processing entirely.

Server normalizes and runs RandomForest prediction

The server normalizes each frame to its hand centroid (translation-invariant features), resamples the sequence to exactly 30 frames, builds a feature vector of flattened positions + frame deltas + magnitudes, and calls modelo.predict().

Result shown to user

The endpoint returns {"seña": "HOLA", "confianza": 94.3}. The frontend displays the detected sign and confidence score in real time.

Prediction Endpoints

The recognition app exposes two complementary endpoints depending on where landmark extraction happens:

predecir_landmarks (recommended)
predecir (server-side MediaPipe)

URL: POST /reconocimientos/predecir_landmarks/

The client sends pre-computed landmarks from MediaPipe JS, bypassing server-side computer vision entirely. This is the primary recognition path: it is faster and more scalable than sending raw frames.

{
  "secuencia": [
    [0.42, 0.31, -0.02, ...],
    [0.43, 0.30, -0.01, ...]
  ]
}

Each row contains 126 floats: 21 landmarks × 3 coordinates × 2 hands. If only one hand is present, the second 63 values are 0.0. The server normalizes each row to its centroid, resamples to 30 frames, and runs inference.
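The row layout described above can be sketched as follows. This is a minimal illustration, not Signia's client code (the real client is MediaPipe JS in the browser). The helper names aplanar_mano and construir_fila are hypothetical, and placing the first detected hand in the first 63 slots is an assumption.

```python
import json
from typing import List, Optional, Sequence, Tuple

Punto = Tuple[float, float, float]  # (x, y, z) for one landmark

PUNTOS_POR_MANO = 21
FLOATS_POR_MANO = PUNTOS_POR_MANO * 3   # 63
FLOATS_POR_FRAME = FLOATS_POR_MANO * 2  # 126


def aplanar_mano(mano: Optional[Sequence[Punto]]) -> List[float]:
    """Flatten one hand to 63 floats; a missing hand becomes 63 zeros."""
    if mano is None:
        return [0.0] * FLOATS_POR_MANO
    if len(mano) != PUNTOS_POR_MANO:
        raise ValueError(f"expected {PUNTOS_POR_MANO} landmarks, got {len(mano)}")
    return [float(c) for punto in mano for c in punto]


def construir_fila(mano1: Optional[Sequence[Punto]],
                   mano2: Optional[Sequence[Punto]] = None) -> List[float]:
    """Build one 126-float row: hand 1 followed by hand 2 (zero-padded if absent)."""
    return aplanar_mano(mano1) + aplanar_mano(mano2)


def construir_payload(filas: Sequence[Sequence[float]]) -> str:
    """Serialize the rows as the JSON body expected under the 'secuencia' key."""
    return json.dumps({"secuencia": [list(f) for f in filas]})
```

A one-handed frame therefore still produces a full-width row, so every frame in the sequence has the same length regardless of how many hands were detected.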

URL: POST /reconocimientos/predecir/

The client sends a list of raw base64-encoded JPEG frames. The server decodes each frame, downscales it to a maximum width of 320 px, runs the server-side HandLandmarker, extracts landmarks, normalizes to centroid, and then follows the same RandomForest path.

{
  "frames": ["data:image/jpeg;base64,...", "..."]
}

This path requires fewer client-side dependencies but uses significantly more server CPU and bandwidth. It is provided as a fallback for environments where MediaPipe WASM cannot run.
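The first two steps of this fallback path (decoding the data URL and choosing the downscaled size) might look like the sketch below. It uses only the standard library and stops before actual image decoding, since the source does not say which imaging library the server uses. The function names are hypothetical, and preserving the aspect ratio and never upscaling are assumptions.

```python
import base64
from typing import Tuple

MAX_ANCHO = 320  # maximum width in pixels, as stated above


def decodificar_data_url(data_url: str) -> bytes:
    """Return the raw image bytes from 'data:image/jpeg;base64,<payload>'."""
    cabecera, separador, carga = data_url.partition(",")
    if not separador or not cabecera.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(carga)


def dimensiones_reducidas(ancho: int, alto: int,
                          max_ancho: int = MAX_ANCHO) -> Tuple[int, int]:
    """Scale (ancho, alto) down to max_ancho, keeping the aspect ratio.

    Frames that are already narrow enough are returned unchanged.
    """
    if ancho <= max_ancho:
        return ancho, alto
    escala = max_ancho / ancho
    return max_ancho, max(1, round(alto * escala))
```

For a typical 640 × 480 webcam frame this yields 320 × 240, a quarter of the pixels the detector would otherwise process.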

Both endpoints return the same response shape:

{
  "seña": "GRACIAS",
  "confianza": 87.5
}

If fewer than 5 valid frames are received, both endpoints return {"seña": "", "confianza": 0} without running inference.
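A guard of this kind could be sketched as below. The source does not define what makes a frame "valid"; this sketch assumes a frame is valid when it has 126 values and at least one of them is non-zero. The names es_frame_valido and respuesta_si_insuficiente are hypothetical.

```python
from typing import Optional, Sequence

MIN_FRAMES = 5
FLOATS_POR_FRAME = 126


def es_frame_valido(frame: Sequence[float]) -> bool:
    """Assumed validity rule: correct width and at least one non-zero value."""
    return len(frame) == FLOATS_POR_FRAME and any(v != 0.0 for v in frame)


def respuesta_si_insuficiente(secuencia: Sequence[Sequence[float]]) -> Optional[dict]:
    """Return the empty response when there are too few valid frames, else None.

    A None result means the caller can proceed to normalization and inference.
    """
    validos = [f for f in secuencia if es_frame_valido(f)]
    if len(validos) < MIN_FRAMES:
        return {"seña": "", "confianza": 0}
    return None
```

Returning early here keeps the classifier from being asked to label a sequence that carries almost no motion information.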

MediaPipe HandLandmarker

The server-side HandLandmarker is configured with:

HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=LANDMARKER_PATH),
    running_mode=RunningMode.IMAGE,
    num_hands=2,
    min_hand_detection_confidence=0.3,
    min_tracking_confidence=0.3,
)

It detects up to 2 hands simultaneously, with a low detection confidence threshold of 0.3 to increase recall in varied lighting conditions. Each detected hand yields 21 3D landmarks (x, y, z), giving a maximum of 42 points (126 floats) per frame.

Thread Safety

HandLandmarker is not thread-safe. Sharing a single instance across Django worker threads causes deadlocks and dropped frames. Signia solves this with threading.local():

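import threading

from mediapipe.tasks.python.vision import HandLandmarker

# _mp_options is assumed to be the HandLandmarkerOptions instance configured above.
_thread_local = threading.local()

def _get_detector():
    # Create the detector lazily, once per thread, and cache it on the thread-local.
    if not hasattr(_thread_local, 'detector'):
        _thread_local.detector = HandLandmarker.create_from_options(_mp_options)
    return _thread_local.detector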

Each Django worker thread gets its own independent HandLandmarker instance, created on first use and reused for the lifetime of that thread.
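The per-thread behaviour can be checked without MediaPipe installed. The sketch below swaps the real detector for a hypothetical stand-in class, DetectorFalso, and shows that repeated calls in one thread return the same object while different threads get different ones. The fabrica parameter and ids_por_hilo helper exist only for this illustration.

```python
import threading

_thread_local = threading.local()


class DetectorFalso:
    """Hypothetical stand-in for HandLandmarker, used only for this demo."""


def _get_detector(fabrica=DetectorFalso):
    # Same lazy pattern as above: one instance per thread, created on first use.
    if not hasattr(_thread_local, "detector"):
        _thread_local.detector = fabrica()
    return _thread_local.detector


def ids_por_hilo(n_hilos: int = 4) -> dict:
    """Run n_hilos threads; record each thread's detector and whether it was reused."""
    resultados = {}
    lock = threading.Lock()

    def trabajo(indice: int) -> None:
        primero = _get_detector()
        segundo = _get_detector()  # second call must hit the cached instance
        with lock:
            resultados[indice] = (primero, primero is segundo)

    hilos = [threading.Thread(target=trabajo, args=(i,)) for i in range(n_hilos)]
    for h in hilos:
        h.start()
    for h in hilos:
        h.join()
    return resultados
```

One consequence of this design is that memory use grows with the number of worker threads, since each thread holds its own loaded model.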

RandomForest Classifier

The classifier is a sklearn.ensemble.RandomForestClassifier trained with:

RandomForestClassifier(
    n_estimators=500,
    max_depth=None,
    min_samples_leaf=1,
    min_samples_split=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42,
    n_jobs=-1,
)

Feature Engineering

Prediction uses a three-part feature vector constructed by construir_features():

Component	Shape	Description
Positions	30 × 126 flattened	Centroid-normalized landmark coordinates across all 30 frames
Deltas	29 × 126 flattened	Frame-to-frame differences, capturing motion direction
Magnitudes	29 values	L2 norm of each delta row, capturing movement speed
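The three components in the table can be assembled as in the sketch below. This is an illustration of the described layout, not the actual construir_features() source. The function name construir_features_boceto and the concatenation order (positions, then deltas, then magnitudes) are assumptions.

```python
import numpy as np

N_FRAMES = 30
N_FLOATS = 126


def construir_features_boceto(secuencia) -> np.ndarray:
    """Build positions + deltas + magnitudes from a (30, 126) sequence."""
    sec = np.asarray(secuencia, dtype=np.float64)
    if sec.shape != (N_FRAMES, N_FLOATS):
        raise ValueError(f"expected shape {(N_FRAMES, N_FLOATS)}, got {sec.shape}")

    posiciones = sec.ravel()                     # 30 * 126 = 3780 values
    deltas = np.diff(sec, axis=0)                # (29, 126) frame-to-frame differences
    magnitudes = np.linalg.norm(deltas, axis=1)  # (29,) L2 norm of each delta row

    # 3780 + 29 * 126 + 29 = 7463 features in total
    return np.concatenate([posiciones, deltas.ravel(), magnitudes])
```

Under these assumptions the classifier sees 7,463 features per sign, and a perfectly still hand produces zeros in the entire delta and magnitude portion.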

Centroid Normalization

Before building features, each frame’s landmarks are shifted by the hand’s centroid, making predictions invariant to screen position:

def _normalizar_landmarks_centroide(puntos):
    mano1 = np.array(puntos[:63]).reshape(21, 3)
    centroide1 = mano1.mean(axis=0)
    mano1 = mano1 - centroide1
    # Second hand normalized separately if present
    ...
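Since the excerpt above elides the second hand, here is one possible complete version for illustration. It is a sketch, not the project's code: the name normalizar_centroide_boceto is hypothetical, and treating an all-zero block as "hand absent" is an assumption based on the zero-padding described for the request payload.

```python
import numpy as np

FLOATS_POR_MANO = 63


def normalizar_centroide_boceto(puntos) -> np.ndarray:
    """Center each present hand on its own centroid; leave absent hands as zeros."""
    fila = np.asarray(puntos, dtype=np.float64)
    if fila.shape != (2 * FLOATS_POR_MANO,):
        raise ValueError(f"expected 126 values, got shape {fila.shape}")

    salida = np.zeros_like(fila)
    for inicio in (0, FLOATS_POR_MANO):
        mano = fila[inicio:inicio + FLOATS_POR_MANO].reshape(21, 3)
        if np.any(mano):  # assumed: an all-zero block means no hand was detected
            mano = mano - mano.mean(axis=0)  # subtract the (x, y, z) centroid
        salida[inicio:inicio + FLOATS_POR_MANO] = mano.ravel()
    return salida
```

Because each hand is centered independently, moving a hand across the frame leaves its features unchanged, but the relative position of the two hands is not preserved by this variant.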

Sequence Normalization

Incoming sequences of any length are resampled to exactly 30 frames using linear interpolation (np.interp), so the classifier always receives a fixed-size input regardless of how fast the user signed.
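A minimal sketch of this resampling step is shown below. It applies np.interp to each of the 126 columns over a normalized time axis. The function name remuestrear_boceto and the handling of a single-frame input (repeating the frame) are assumptions.

```python
import numpy as np


def remuestrear_boceto(secuencia, n_destino: int = 30) -> np.ndarray:
    """Linearly resample a (T, D) sequence to (n_destino, D)."""
    sec = np.asarray(secuencia, dtype=np.float64)
    if sec.ndim != 2 or sec.shape[0] == 0:
        raise ValueError("expected a non-empty 2D sequence")
    if sec.shape[0] == 1:
        # Nothing to interpolate between: repeat the only frame (assumed behaviour).
        return np.repeat(sec, n_destino, axis=0)

    t_origen = np.linspace(0.0, 1.0, sec.shape[0])
    t_destino = np.linspace(0.0, 1.0, n_destino)
    # np.interp works on 1D data, so interpolate each coordinate column separately.
    columnas = [np.interp(t_destino, t_origen, sec[:, j]) for j in range(sec.shape[1])]
    return np.stack(columnas, axis=1)
```

The first and last frames are preserved exactly, so both a 12-frame and a 45-frame recording of the same sign map onto the same 30-step timeline.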

Training Data Augmentation

During training, each uploaded sign video is augmented into up to 8 variations to improve generalization. The following techniques are used:

Gaussian Noise

Small (σ=0.008) and larger (σ=0.018) noise levels simulate natural hand tremor.

Scale Variation

Uniform scaling between 0.93–1.07× simulates varying camera distances.

Speed Variation

Sequences are resampled to a random length (20–45 frames) to handle different signing speeds.

Horizontal Mirror

X-coordinates are flipped (x → 1.0 - x) to handle left- and right-handed signers.

Translation

Random X/Y offsets (±0.08) simulate the signer not being centered in the frame.

Temporal Reversal

The frame sequence is reversed to improve robustness to symmetrical gestures.
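The six techniques above can be sketched together as follows, using the parameter values quoted in each subsection. This is an illustration under several assumptions: the function name aumentar_boceto is hypothetical, the transforms are applied to raw (T, 126) coordinates before centroid normalization, and for simplicity the sketch does not mask the zero-padding of an absent second hand, which a real implementation would likely need to do. It produces seven variants, one per technique plus a second noise level.

```python
import numpy as np


def _remuestrear(sec: np.ndarray, n: int) -> np.ndarray:
    """Linearly resample a (T, D) array to n frames, column by column."""
    t0 = np.linspace(0.0, 1.0, sec.shape[0])
    t1 = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t1, t0, sec[:, j]) for j in range(sec.shape[1])], axis=1)


def aumentar_boceto(secuencia, rng: np.random.Generator) -> dict:
    """Return a dict of augmented copies of one (T, 126) landmark sequence."""
    sec = np.asarray(secuencia, dtype=np.float64)
    x_idx = np.arange(0, sec.shape[1], 3)  # x is every third value: 0, 3, 6, ...
    y_idx = x_idx + 1

    variantes = {
        # Gaussian noise at the two quoted sigma levels
        "ruido_bajo": sec + rng.normal(0.0, 0.008, sec.shape),
        "ruido_alto": sec + rng.normal(0.0, 0.018, sec.shape),
        # Uniform scale factor in [0.93, 1.07]
        "escala": sec * rng.uniform(0.93, 1.07),
        # Random length in 20..45 frames (upper bound of integers() is exclusive)
        "velocidad": _remuestrear(sec, int(rng.integers(20, 46))),
        # Temporal reversal
        "reversa": sec[::-1].copy(),
    }

    espejo = sec.copy()
    espejo[:, x_idx] = 1.0 - espejo[:, x_idx]  # horizontal mirror: x -> 1.0 - x
    variantes["espejo"] = espejo

    traslacion = sec.copy()
    dx, dy = rng.uniform(-0.08, 0.08, size=2)  # one X/Y offset for the whole clip
    traslacion[:, x_idx] += dx
    traslacion[:, y_idx] += dy
    variantes["traslacion"] = traslacion

    return variantes
```

Passing a seeded generator such as np.random.default_rng(42) makes the augmented set reproducible between training runs.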

Request Throttling

To prevent a single active session from saturating the server at 30+ requests per second, the detectar_mano endpoint enforces a 120 ms minimum interval per session key:

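import threading
import time

# Despite the _MS suffix, the value is in seconds, matching time.monotonic().
_THROTTLE_MS = 0.12  # 120 ms → ~8 fps maximum
# Module-level state (assumed definitions): a lock and a per-session timestamp map.
_throttle_lock = threading.Lock()
_throttle_last = {}  # session key → time of the last accepted request

def _puede_detectar(session_key: str) -> bool:
    ahora = time.monotonic()
    with _throttle_lock:
        ultima = _throttle_last.get(session_key, 0)
        if ahora - ultima < _THROTTLE_MS:
            return False
        _throttle_last[session_key] = ahora
        return True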

Throttled requests return {"hay_mano": false, "throttled": true} immediately. Stale entries older than 60 seconds are pruned when the dictionary grows beyond 500 keys.
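The pruning rule described above could be implemented along these lines. The source states only the two thresholds (60 seconds, 500 keys), so the function name podar_entradas, its signature, and the expectation that the caller already holds the throttle lock are all assumptions.

```python
from typing import Dict

_MAX_CLAVES = 500       # prune only once the dict grows beyond this many keys
_EDAD_MAXIMA_S = 60.0   # entries older than this many seconds are stale


def podar_entradas(throttle_last: Dict[str, float], ahora: float) -> int:
    """Delete stale session entries in place; return how many were removed.

    Assumed to be called while the throttle lock is held.
    """
    if len(throttle_last) <= _MAX_CLAVES:
        return 0
    # Collect keys first: deleting while iterating over a dict raises RuntimeError.
    caducadas = [k for k, t in throttle_last.items() if ahora - t > _EDAD_MAXIMA_S]
    for clave in caducadas:
        del throttle_last[clave]
    return len(caducadas)
```

Checking the size first keeps the common case cheap: the full scan only runs when the dictionary has actually grown large.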

Hand Presence Detection

The detectar_mano endpoint (POST /reconocimientos/detectar_mano/) is a lightweight presence check that runs at up to 8 fps. It decodes a single base64 frame, runs the HandLandmarker, and returns a boolean:

{ "hay_mano": true }

The frontend uses this endpoint to decide when to start accumulating frames for a full prediction — avoiding unnecessary landmark extraction and prediction calls when no hands are visible.

Disability Routing

The redirigir_por_discapacidad() function in usuarios/views.py automatically routes users to the recognition interface based on their profile:

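from django.shortcuts import redirect

def redirigir_por_discapacidad(user):
    # Deaf and non-speaking users sign, so they go to the recognition interface.
    if user.discapacidad in ['sordo', 'mudo']:
        return redirect('reconocimiento')
    # All other users are sent to the 'traduccion' route.
    return redirect('traduccion')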

Users with discapacidad='sordo' (deaf) are routed to /reconocimiento/ because they communicate by signing; the recognition feature translates their signs into text that hearing users can read. Users marked 'mudo' (non-speaking) follow the same path for the same reason. This routing is applied on login, post-OTP verification, and after OAuth disability selection.

The RandomForest model must be trained through the admin panel before the recognition endpoints will work. If reconocimientos/modelo/model_seq.pkl does not exist on disk, both predecir and predecir_landmarks return a 503 response:

{ "error": "Modelo no entrenado aún. Entrena desde el panel admin." }

Upload at least one labeled VideoSeña record per sign class through /admin-videos/ and trigger training before enabling the recognition interface.
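The missing-model check can be sketched as below. To stay runnable without Django, the function returns a (status, body) pair rather than a JsonResponse. The name comprobar_modelo is hypothetical, and resolving the path relative to the working directory is an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple

RUTA_MODELO = Path("reconocimientos/modelo/model_seq.pkl")
MENSAJE_SIN_MODELO = "Modelo no entrenado aún. Entrena desde el panel admin."


def comprobar_modelo(ruta: Path = RUTA_MODELO) -> Optional[Tuple[int, dict]]:
    """Return (503, error body) when the model file is missing, else None."""
    if not ruta.is_file():
        return 503, {"error": MENSAJE_SIN_MODELO}
    return None
```

In a Django view the returned pair would map onto something like JsonResponse(body, status=status), with a None result meaning the request can continue to inference.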
