Signia’s sign recognition feature lets deaf or non-speaking users communicate by signing in front of their webcam and receiving an instant text translation. The system captures video frames in the browser, extracts hand landmark coordinates using MediaPipe, and classifies the gesture sequence with a RandomForest model trained on sign videos uploaded through the admin panel. The result is a low-latency, server-side prediction that runs without any GPU requirement.

How It Works

1. User opens /reconocimientos/camara/

The camara view renders the recognition interface (usuarios/reconocimiento.html), which activates the browser’s webcam. Users with discapacidad='sordo' or 'mudo' are automatically redirected here after login.

2. Browser captures video frames

JavaScript captures frames from the <video> element at regular intervals. Frames are encoded as base64 data URLs before being sent to the server, or, in the faster landmark pipeline, processed locally first by MediaPipe JS.

3. MediaPipe JS extracts hand landmarks client-side

The preferred path uses MediaPipe’s JavaScript HandLandmarker running directly in the browser (via WASM). It detects up to 2 hands and extracts 21 3D landmarks per hand, producing a flat array of up to 126 floats ([x0,y0,z0, ..., x20,y20,z20] × 2 hands). This offloads the most expensive CV work to the client.

4. Landmarks sent to /reconocimientos/predecir_landmarks/

The browser POSTs the accumulated landmark sequence (a list of up to 30 frames, each 126 floats) as JSON to the predecir_landmarks endpoint. This avoids sending raw image data over the network and skips server-side MediaPipe processing entirely.

5. Server normalizes and runs RandomForest prediction

The server normalizes each frame to its hand centroid (translation-invariant features), resamples the sequence to exactly 30 frames, builds a feature vector of flattened positions + frame deltas + magnitudes, and calls modelo.predict().

6. Result shown to user

The endpoint returns {"seña": "HOLA", "confianza": 94.3}. The frontend displays the detected sign and confidence score in real time.

Prediction Endpoints

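The recognition app exposes two complementary endpoints, depending on where landmark extraction happens: predecir, which accepts base64-encoded frames and extracts landmarks on the server, and predecir_landmarks, which accepts landmarks already extracted in the browser. Both endpoints return the same response shape: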
{
  "seña": "GRACIAS",
  "confianza": 87.5
}
If fewer than 5 valid frames are received, both endpoints return {"seña": "", "confianza": 0} without running inference.
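The minimum-frame guard described above could look roughly like the following sketch. The names responder_sketch and es_frame_valido are hypothetical, and the definition of a "valid" frame (126 floats, at least one non-zero) is an assumption; the source only states the threshold of 5 and the empty response.

```python
from typing import Callable, Sequence, Tuple

MIN_FRAMES_VALIDOS = 5  # threshold stated in the text

def es_frame_valido(frame: Sequence[float]) -> bool:
    # Assumption: a frame counts as valid when it carries 126 floats
    # and at least one of them is non-zero (some hand was detected).
    return len(frame) == 126 and any(v != 0.0 for v in frame)

def responder_sketch(
    secuencia: Sequence[Sequence[float]],
    clasificar: Callable[[list], Tuple[str, float]],
) -> dict:
    """Return the documented response shape, skipping inference on short input."""
    validos = [f for f in secuencia if es_frame_valido(f)]
    if len(validos) < MIN_FRAMES_VALIDOS:
        # Too little signal: answer immediately without running the model.
        return {"seña": "", "confianza": 0}
    etiqueta, confianza = clasificar(validos)
    return {"seña": etiqueta, "confianza": round(confianza, 1)}
```

Here clasificar stands in for whatever wraps the trained model; passing it as a parameter keeps the guard testable without loading a model from disk.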

MediaPipe HandLandmarker

The server-side HandLandmarker is configured with:
HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=LANDMARKER_PATH),
    running_mode=RunningMode.IMAGE,
    num_hands=2,
    min_hand_detection_confidence=0.3,
    min_tracking_confidence=0.3,
)
It detects up to 2 hands simultaneously, with a low detection confidence threshold of 0.3 to increase recall in varied lighting conditions. Each detected hand yields 21 3D landmarks (x, y, z), giving a maximum of 42 points (126 floats) per frame.
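To make the 126-float layout concrete, here is a minimal sketch of flattening up to two detected hands into one frame vector. The function name aplanar_manos is hypothetical, and zero-padding an absent hand is an assumption; the source only says a frame holds "up to" 126 floats.

```python
from typing import List, Sequence, Tuple

Punto = Tuple[float, float, float]

PUNTOS_POR_MANO = 21
FLOATS_POR_FRAME = 2 * PUNTOS_POR_MANO * 3  # 2 hands x 21 landmarks x (x, y, z) = 126

def aplanar_manos(manos: Sequence[Sequence[Punto]]) -> List[float]:
    """Flatten up to two hands into [x0, y0, z0, ..., x20, y20, z20] per hand."""
    plano: List[float] = []
    for mano in manos[:2]:  # ignore anything beyond two hands
        if len(mano) != PUNTOS_POR_MANO:
            raise ValueError("each hand must have exactly 21 landmarks")
        for x, y, z in mano:
            plano.extend((x, y, z))
    # Assumption: a missing hand is represented by zeros so every frame
    # has the same fixed length.
    plano.extend([0.0] * (FLOATS_POR_FRAME - len(plano)))
    return plano
```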

Thread Safety

HandLandmarker is not thread-safe. Sharing a single instance across Django worker threads causes deadlocks and dropped frames. Signia solves this with threading.local():
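import threading
from mediapipe.tasks.python.vision import HandLandmarker

# One storage namespace per thread; attributes set here are invisible to other threads.
_thread_local = threading.local()

def _get_detector():
    # Lazily create this thread's detector on first use, then reuse it.
    if not hasattr(_thread_local, 'detector'):
        _thread_local.detector = HandLandmarker.create_from_options(_mp_options)
    return _thread_local.detector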
Each Django worker thread gets its own independent HandLandmarker instance, created on first use and reused for the lifetime of that thread.
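The per-thread behavior can be checked without MediaPipe installed. The sketch below swaps HandLandmarker for a hypothetical stand-in class, DetectorFalso, and confirms that each thread sees one instance of its own.

```python
import threading

_thread_local = threading.local()

class DetectorFalso:
    """Hypothetical stand-in for HandLandmarker, used only for this demonstration."""

def _get_detector():
    # Same lazy pattern as above, with the stand-in class.
    if not hasattr(_thread_local, "detector"):
        _thread_local.detector = DetectorFalso()
    return _thread_local.detector

def detectores_por_hilo(n_hilos: int = 4) -> list:
    """Call _get_detector() twice in each thread and collect both results."""
    pares = []
    lock = threading.Lock()

    def trabajo():
        primero = _get_detector()
        segundo = _get_detector()
        with lock:
            # Keep references so the objects stay alive for comparison.
            pares.append((primero, segundo))

    hilos = [threading.Thread(target=trabajo) for _ in range(n_hilos)]
    for h in hilos:
        h.start()
    for h in hilos:
        h.join()
    return pares
```

Within a thread the two calls return the same object; across threads the objects differ, which is the isolation the real detector needs.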

RandomForest Classifier

The classifier is a sklearn.ensemble.RandomForestClassifier trained with:
RandomForestClassifier(
    n_estimators=500,
    max_depth=None,
    min_samples_leaf=1,
    min_samples_split=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42,
    n_jobs=-1,
)

Feature Engineering

Prediction uses a three-part feature vector constructed by construir_features():
| Component | Shape | Description |
| --- | --- | --- |
| Positions | 30 × 126, flattened | Centroid-normalized landmark coordinates across all 30 frames |
| Deltas | 29 × 126, flattened | Frame-to-frame differences; captures motion direction |
| Magnitudes | 29 values | L2 norm of each delta row; captures movement speed |
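As an illustration of the three components, the following NumPy sketch builds a vector with the shapes listed above. It is not the project's construir_features(); the name construir_features_sketch and the concatenation order (positions, then deltas, then magnitudes) are assumptions.

```python
import numpy as np

N_FRAMES = 30
N_FLOATS = 126

def construir_features_sketch(secuencia) -> np.ndarray:
    """Build positions + deltas + magnitudes from a (30, 126) sequence."""
    sec = np.asarray(secuencia, dtype=np.float64)
    if sec.shape != (N_FRAMES, N_FLOATS):
        raise ValueError(f"expected shape {(N_FRAMES, N_FLOATS)}, got {sec.shape}")
    deltas = np.diff(sec, axis=0)                # (29, 126): frame-to-frame differences
    magnitudes = np.linalg.norm(deltas, axis=1)  # (29,): L2 norm of each delta row
    # Assumed order: positions, deltas, magnitudes.
    return np.concatenate([sec.ravel(), deltas.ravel(), magnitudes])
```

With these shapes the vector has 30 × 126 + 29 × 126 + 29 = 7463 entries.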

Centroid Normalization

Before building features, each frame’s landmarks are shifted by the hand’s centroid, making predictions invariant to screen position:
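import numpy as np

def _normalizar_landmarks_centroide(puntos):
    # First hand: floats 0..62 are 21 landmarks × (x, y, z)
    mano1 = np.array(puntos[:63]).reshape(21, 3)
    centroide1 = mano1.mean(axis=0)  # mean x, y, z over the 21 landmarks
    mano1 = mano1 - centroide1       # translate so the centroid sits at the origin
    # Second hand normalized separately if present
    ...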
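Since the excerpt above elides the second hand, here is a self-contained sketch of per-hand centroid normalization. The name normalizar_centroide_sketch is hypothetical, and treating an all-zero block of 63 floats as "hand absent" is an assumption, not something the source confirms.

```python
import numpy as np

def normalizar_centroide_sketch(puntos) -> np.ndarray:
    """Shift each hand's 21 landmarks so that hand's centroid is at the origin."""
    frame = np.asarray(puntos, dtype=np.float64)
    if frame.shape != (126,):
        raise ValueError("expected a flat frame of 126 floats")
    salida = np.zeros(126)
    for inicio in (0, 63):  # first hand, then second hand
        mano = frame[inicio:inicio + 63].reshape(21, 3)
        if not mano.any():
            # Assumption: an all-zero block means the hand was not detected;
            # leave it as zeros instead of normalizing.
            continue
        salida[inicio:inicio + 63] = (mano - mano.mean(axis=0)).ravel()
    return salida
```

Shifting every landmark of a hand by the same offset leaves the output unchanged, which is the translation invariance the section describes.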

Sequence Normalization

Incoming sequences of any length are resampled to exactly 30 frames using linear interpolation (np.interp), so the classifier always receives a fixed-size input regardless of how fast the user signed.
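A minimal sketch of this resampling with np.interp follows. The function name remuestrear_sketch and the choice to interpolate each of the 126 columns independently over a normalized time axis are assumptions about how the step might be implemented.

```python
import numpy as np

def remuestrear_sketch(secuencia, n_objetivo: int = 30) -> np.ndarray:
    """Linearly resample a (frames, floats) sequence to n_objetivo frames."""
    sec = np.asarray(secuencia, dtype=np.float64)
    if sec.ndim != 2 or sec.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D sequence")
    n = sec.shape[0]
    if n == 1:
        # A single frame has nothing to interpolate between: repeat it.
        return np.repeat(sec, n_objetivo, axis=0)
    t_orig = np.linspace(0.0, 1.0, n)            # original frame times
    t_nuevo = np.linspace(0.0, 1.0, n_objetivo)  # target frame times
    columnas = [np.interp(t_nuevo, t_orig, sec[:, j]) for j in range(sec.shape[1])]
    return np.stack(columnas, axis=1)
```

The first and last frames are preserved exactly, and both shorter and longer inputs map to the same fixed length.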

Training Data Augmentation

During training, each uploaded sign video is augmented into up to 8 variations per sample to improve generalization:

Gaussian Noise

Small (σ=0.008) and larger (σ=0.018) noise levels simulate natural hand tremor.

Scale Variation

Uniform scaling between 0.93–1.07× simulates varying camera distances.

Speed Variation

Sequences are resampled to a random length (20–45 frames) to handle different signing speeds.

Horizontal Mirror

X-coordinates are flipped (x → 1.0 - x) to handle left- and right-handed signers.

Translation

Random X/Y offsets (±0.08) simulate the signer not being centered in the frame.

Temporal Reversal

The frame sequence is reversed to improve robustness to symmetrical gestures.
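The augmentations listed above can be sketched in NumPy as follows. This is an illustration using the stated parameter values, not the project's training code: the name aumentar_sketch, scaling about the coordinate origin, and drawing one random factor or offset per sequence are all assumptions. The sketch also ignores zero-padded absent hands, which a real pipeline would likely need to leave untouched.

```python
import numpy as np

def aumentar_sketch(secuencia, rng: np.random.Generator) -> dict:
    """Return one variant per augmentation for a (frames, 126) sequence."""
    sec = np.asarray(secuencia, dtype=np.float64)
    n_frames, n_floats = sec.shape
    puntos = sec.reshape(n_frames, n_floats // 3, 3)  # (frames, landmarks, xyz)

    variantes = {
        # Gaussian noise at the two stated levels.
        "ruido_leve": sec + rng.normal(0.0, 0.008, sec.shape),
        "ruido_fuerte": sec + rng.normal(0.0, 0.018, sec.shape),
        # Uniform scale in [0.93, 1.07], applied about the origin (assumption).
        "escala": sec * rng.uniform(0.93, 1.07),
        # Temporal reversal of the frame order.
        "reversa": sec[::-1].copy(),
    }

    # Horizontal mirror: x -> 1.0 - x, other axes unchanged.
    espejo = puntos.copy()
    espejo[..., 0] = 1.0 - espejo[..., 0]
    variantes["espejo"] = espejo.reshape(n_frames, n_floats)

    # Translation: one random X and Y offset within ±0.08 for the whole sequence.
    desplazada = puntos.copy()
    desplazada[..., 0] += rng.uniform(-0.08, 0.08)
    desplazada[..., 1] += rng.uniform(-0.08, 0.08)
    variantes["traslacion"] = desplazada.reshape(n_frames, n_floats)

    # Speed variation: resample to a random length between 20 and 45 frames.
    n_nuevo = int(rng.integers(20, 46))
    t_orig = np.linspace(0.0, 1.0, n_frames)
    t_nuevo = np.linspace(0.0, 1.0, n_nuevo)
    variantes["velocidad"] = np.stack(
        [np.interp(t_nuevo, t_orig, sec[:, j]) for j in range(n_floats)], axis=1
    )
    return variantes
```

Passing a seeded np.random.default_rng keeps the augmented set reproducible between training runs.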

Request Throttling

To prevent a single active session from saturating the server at 30+ requests per second, the detectar_mano endpoint enforces a 120 ms minimum interval per session key:
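import threading
import time

# Despite the _MS suffix the value is in seconds, matching time.monotonic().
_THROTTLE_MS = 0.12  # 120 ms → ~8 fps maximum
# Module-level state (definitions assumed here; the excerpt relies on both names).
_throttle_lock = threading.Lock()  # guards _throttle_last across worker threads
_throttle_last = {}                # session key → time of last accepted request

def _puede_detectar(session_key: str) -> bool:
    """Return True if this session may run detection now, recording the time."""
    ahora = time.monotonic()
    with _throttle_lock:
        ultima = _throttle_last.get(session_key, 0)
        if ahora - ultima < _THROTTLE_MS:
            return False
        _throttle_last[session_key] = ahora
        return True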
Throttled requests return {"hay_mano": false, "throttled": true} immediately. Stale entries older than 60 seconds are pruned when the dictionary grows beyond 500 keys.
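The pruning rule could be implemented along these lines. The function name podar_obsoletas and the constant names are hypothetical; only the 60-second age and the 500-key threshold come from the text. The caller is assumed to hold the throttle lock while calling it.

```python
_MAX_CLAVES = 500       # prune only once the dictionary exceeds this size
_EDAD_MAXIMA_S = 60.0   # entries older than this many seconds are stale

def podar_obsoletas(ultimas: dict, ahora: float) -> int:
    """Remove stale session entries in place and return how many were removed."""
    if len(ultimas) <= _MAX_CLAVES:
        return 0  # small dictionary: skip the scan entirely
    # Collect keys first; deleting while iterating over a dict raises RuntimeError.
    obsoletas = [k for k, t in ultimas.items() if ahora - t > _EDAD_MAXIMA_S]
    for k in obsoletas:
        del ultimas[k]
    return len(obsoletas)
```

Gating the scan on dictionary size keeps the common path cheap, since most requests only touch a single key.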

Hand Presence Detection

The detectar_mano endpoint (POST /reconocimientos/detectar_mano/) is a lightweight presence check that runs at up to 8 fps. It decodes a single base64 frame, runs the HandLandmarker, and returns a boolean:
{ "hay_mano": true }
The frontend uses this endpoint to decide when to start accumulating frames for a full prediction — avoiding unnecessary landmark extraction and prediction calls when no hands are visible.
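Decoding the incoming base64 frame might start with a helper like the one below. The name decodificar_data_url is hypothetical, and accepting a bare base64 string without the data: prefix is an assumption; turning the resulting bytes into an image for HandLandmarker is left out.

```python
import base64

def decodificar_data_url(data_url: str) -> bytes:
    """Extract the raw image bytes from a base64 data URL."""
    cabecera, separador, carga = data_url.partition(",")
    if not separador:
        # Assumption: no comma means the client sent bare base64 with no prefix.
        carga = cabecera
    elif not cabecera.startswith("data:") or ";base64" not in cabecera:
        raise ValueError("not a base64 data URL")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(carga, validate=True)
```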

Disability Routing

The redirigir_por_discapacidad() function in usuarios/views.py automatically routes users to the recognition interface based on their profile:
def redirigir_por_discapacidad(user):
    if user.discapacidad in ['sordo', 'mudo']:
        return redirect('reconocimiento')
    else:
        return redirect('traduccion')
Users with discapacidad='sordo' (deaf) are routed to /reconocimiento/ because they communicate by signing; the recognition feature translates their signs into text that hearing users can read. Users marked 'mudo' (non-speaking) follow the same path for the same reason. This routing is applied on login, post-OTP verification, and after OAuth disability selection.

The RandomForest model must be trained through the admin panel before the recognition endpoints will work. If reconocimientos/modelo/model_seq.pkl does not exist on disk, both predecir and predecir_landmarks return a 503 response:
{ "error": "Modelo no entrenado aún. Entrena desde el panel admin." }
Upload at least one labeled VideoSeña record per sign class through /admin-videos/ and trigger training before enabling the recognition interface.
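The missing-model check could be expressed as a small framework-independent helper, sketched below. The name comprobar_modelo and the (status, body) return convention are hypothetical; in the Django views the body would presumably be wrapped in a JSON response with the same status code.

```python
from pathlib import Path

# Path taken from the text; resolved relative to the working directory here.
RUTA_MODELO = Path("reconocimientos/modelo/model_seq.pkl")

def comprobar_modelo(ruta: Path = RUTA_MODELO):
    """Return (status, body): 503 with the documented error if the model is missing."""
    if not ruta.is_file():
        return 503, {"error": "Modelo no entrenado aún. Entrena desde el panel admin."}
    return 200, None
```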
