RandomForest Sign Recognition Model: Training and Inference

Signia’s hand-sign recognition pipeline uses a scikit-learn RandomForestClassifier to map sequences of MediaPipe hand landmarks to LSC sign labels. The model is trained on video files uploaded through the admin panel, enriched with eight data augmentation variations per source video, and then combined with any previously accumulated dataset so that each training run is incremental rather than destructive to historical signs. This page documents the model files, feature engineering pipeline, training procedure, and the thread-safety guarantees that govern inference.

Model and Data Files

Path	Description
`reconocimientos/modelo/model_seq.pkl`	Serialized `RandomForestClassifier` — loaded at Django startup
`reconocimientos/modelo/encoder_seq.pkl`	Serialized `LabelEncoder` — maps integer predictions back to sign label strings
`reconocimientos/datos/X_seq.npy`	Accumulated feature matrix (NumPy array, `object` dtype to handle variable feature lengths across training runs)
`reconocimientos/datos/y_seq.npy`	Accumulated label array corresponding to `X_seq.npy`
`reconocimientos/datos/hand_landmarker.task`	MediaPipe HandLandmarker model file used by the server-side detector during training

If model_seq.pkl or encoder_seq.pkl is missing when the Django process starts, the module sets modelo = None and encoder = None. Every call to /reconocimientos/predecir/ or /reconocimientos/predecir_landmarks/ will return HTTP 503 until a model is trained via the admin panel.
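
The startup behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the helper names `cargar_modelo` and `estado_http` are hypothetical, and the use of the standard `pickle` module is an assumption (the `.pkl` files could equally be written with joblib).

```python
import os
import pickle


def cargar_modelo(model_path, encoder_path):
    """Return (modelo, encoder), or (None, None) if either file is missing."""
    if not (os.path.exists(model_path) and os.path.exists(encoder_path)):
        return None, None
    # Assumption: both artifacts were serialized with the standard pickle module.
    with open(model_path, "rb") as f_model, open(encoder_path, "rb") as f_enc:
        return pickle.load(f_model), pickle.load(f_enc)


def estado_http(modelo, encoder):
    """Status code a prediction endpoint would answer with (hypothetical helper)."""
    return 503 if modelo is None or encoder is None else 200


modelo, encoder = cargar_modelo("no_existe/model_seq.pkl", "no_existe/encoder_seq.pkl")
```

With both files absent, `modelo` and `encoder` are `None`, so `estado_http(modelo, encoder)` yields 503, matching the behaviour documented for the prediction endpoints.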

Model Hyperparameters

The classifier is instantiated with the following fixed hyperparameters in reconocimientos/views.py:

from sklearn.ensemble import RandomForestClassifier

nuevo_modelo = RandomForestClassifier(
    n_estimators=500,
    max_depth=None,
    min_samples_leaf=1,
    min_samples_split=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42,
    n_jobs=-1,
)

class_weight='balanced' compensates for classes with fewer training samples. n_jobs=-1 uses all available CPU cores during fitting and prediction.
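
To make the effect of class_weight='balanced' concrete, the sketch below reproduces the weighting rule scikit-learn documents for that mode, n_samples / (n_classes × count of each class), in plain Python. The function name `pesos_balanceados` and the sign labels are illustrative only.

```python
from collections import Counter


def pesos_balanceados(etiquetas):
    """Per-class weight: n_samples / (n_classes * samples_in_class)."""
    conteos = Counter(etiquetas)
    n_muestras = len(etiquetas)
    n_clases = len(conteos)
    return {clase: n_muestras / (n_clases * k) for clase, k in conteos.items()}


# Hypothetical dataset: 8 samples of one sign, 2 of another.
pesos = pesos_balanceados(["hola"] * 8 + ["gracias"] * 2)
# pesos == {"hola": 0.625, "gracias": 2.5}
```

The under-represented sign receives four times the weight of the frequent one, so errors on it cost more when the trees are split.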

Training Process

Training is triggered by a POST request to /reconocimientos/admin-videos/entrenar/ from the admin panel. The entire process runs in a background daemon thread to avoid blocking the HTTP response.

Upload sign videos

Navigate to /reconocimientos/admin-videos/ and upload one or more sign videos using the Subir video de seña form. Each upload creates a VideoSeña record in the database with a label (the sign name) and the video file stored under media/video_señas/.

Trigger training

Click Entrenar modelo in the admin panel. The server spawns a daemon thread and immediately returns {"ok": true}. Poll /reconocimientos/estado-entrenamiento/ to check {"activo": true|false}.
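
A client that waits for training to finish might poll the status endpoint roughly as below. This is a sketch under assumptions: `esperar_fin_entrenamiento` is a hypothetical helper, and `consultar_estado` stands for any callable that performs the GET request to /reconocimientos/estado-entrenamiento/ and returns the decoded JSON as a dict.

```python
import time


def esperar_fin_entrenamiento(consultar_estado, intervalo_s=2.0, max_intentos=300):
    """Poll until the status reports activo == False.

    Returns True if training finished within max_intentos polls, else False.
    """
    for _ in range(max_intentos):
        if not consultar_estado().get("activo", False):
            return True
        time.sleep(intervalo_s)
    return False


# Simulated responses in place of real HTTP calls.
respuestas = iter([{"activo": True}, {"activo": True}, {"activo": False}])
terminado = esperar_fin_entrenamiento(lambda: next(respuestas), intervalo_s=0)
```

Passing the fetch function in as a parameter keeps the polling loop independent of the HTTP client and easy to exercise without a running server.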

Extract landmarks with MediaPipe

For each VideoSeña, the thread opens the video file with OpenCV, reads every frame, and passes each frame through the thread-local MediaPipe HandLandmarker. Frames where no hand is detected are skipped. If fewer than 5 frames have detectable hands, the video is skipped entirely.

Centroid normalization

Each frame’s raw landmark list (up to 126 floats — 21 landmarks × 3 coordinates × 2 hands) is passed to _normalizar_landmarks_centroide(). The function subtracts the centroid of each hand’s 21 points, making the features invariant to where on-screen the hands appear.
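
The per-hand centroid subtraction can be sketched with NumPy as follows. This is not the body of _normalizar_landmarks_centroide(); the function name `normalizar_centroide` is hypothetical, and two layout details are assumptions: the 126 floats are ordered hand by hand as 21 (x, y, z) triples, and a hand that was not detected is stored as an all-zero block.

```python
import numpy as np


def normalizar_centroide(frame, n_manos=2, n_puntos=21):
    """Subtract each hand's centroid from its 21 (x, y, z) points."""
    manos = np.asarray(frame, dtype=float).reshape(n_manos, n_puntos, 3).copy()
    for mano in manos:  # each `mano` is a view into `manos`
        if np.any(mano):  # assumption: all zeros means "hand not detected"
            mano -= mano.mean(axis=0)
    return manos.reshape(-1)


rng = np.random.default_rng(0)
frame = rng.random(126)
desplazamiento = np.tile([0.3, -0.2, 0.1], 42)  # same shift applied to every landmark
igual = np.allclose(
    normalizar_centroide(frame),
    normalizar_centroide(frame + desplazamiento),
)
```

`igual` is expected to be `True`: shifting every landmark by the same offset shifts the centroid by that offset too, so the difference cancels out.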

Generate 8 augmentation variations

aumentar_secuencia() takes the normalized frame sequence and produces 8 variations. See the Data Augmentation section below.

Normalize to 30 frames

Each variation is passed through normalizar_secuencia(), which uses numpy.interp to linearly resample the sequence to exactly FRAMES_OBJETIVO = 30 frames regardless of the original video length.
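
Linear resampling with numpy.interp can be illustrated like this. The sketch assumes the sequence is a 2-D array of shape (frames, features) and interpolates each feature column independently; `remuestrear` is a hypothetical name, not the project's normalizar_secuencia().

```python
import numpy as np

FRAMES_OBJETIVO = 30


def remuestrear(secuencia, n_frames=FRAMES_OBJETIVO):
    """Linearly resample a (frames, features) array to n_frames rows."""
    secuencia = np.asarray(secuencia, dtype=float)
    t_origen = np.linspace(0.0, 1.0, num=len(secuencia))
    t_destino = np.linspace(0.0, 1.0, num=n_frames)
    columnas = [
        np.interp(t_destino, t_origen, secuencia[:, j])
        for j in range(secuencia.shape[1])
    ]
    return np.stack(columnas, axis=1)


corta = np.arange(10 * 126, dtype=float).reshape(10, 126)  # a 10-frame video
fija = remuestrear(corta)  # shape (30, 126)
```

The first and last frames are preserved; intermediate frames are blends of their two nearest neighbours in the source sequence.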

Build feature vectors

construir_features() concatenates three parts into a single feature vector per sample: flattened normalized positions (30 × 126 floats), frame-to-frame deltas (29 × 126 floats), and delta magnitudes (29 floats). With full 126-float frames this gives 3 780 + 3 654 + 29 = 7 463 floats.
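
A possible shape for this step is sketched below. It is an illustration rather than the real construir_features(): the name `construir_vector` is hypothetical, and taking the "delta magnitude" to be the Euclidean norm of each frame-to-frame delta is an assumption.

```python
import numpy as np


def construir_vector(secuencia):
    """Concatenate positions, frame deltas and delta magnitudes."""
    secuencia = np.asarray(secuencia, dtype=float)   # (30, 126)
    deltas = np.diff(secuencia, axis=0)              # (29, 126)
    # Assumption: magnitude = Euclidean norm of each delta row.
    magnitudes = np.linalg.norm(deltas, axis=1)      # (29,)
    return np.concatenate([secuencia.ravel(), deltas.ravel(), magnitudes])


vector = construir_vector(np.zeros((30, 126)))
# len(vector) == 30*126 + 29*126 + 29 == 7463
```

The deltas and their magnitudes give the forest direct access to motion information that it would otherwise have to infer from pairs of position features.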

Combine with accumulated dataset

If X_seq.npy and y_seq.npy already exist, the new samples are appended to the historical data. Feature lengths are reconciled by zero-padding shorter vectors to match the widest vector in the combined set.
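
The zero-padding reconciliation can be sketched as below; `rellenar_con_ceros` is a hypothetical helper name, and right-padding (zeros appended at the end) is an assumption about where the padding goes.

```python
import numpy as np


def rellenar_con_ceros(vectores):
    """Right-pad every vector with zeros to the width of the longest one."""
    ancho = max(len(v) for v in vectores)
    matriz = np.zeros((len(vectores), ancho), dtype=float)
    for i, v in enumerate(vectores):
        matriz[i, :len(v)] = v
    return matriz


X = rellenar_con_ceros([[1.0, 2.0, 3.0], [4.0]])
# X == [[1, 2, 3], [4, 0, 0]]
```

The result is a regular 2-D float array, which is what a scikit-learn estimator expects as its feature matrix.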

Train RandomForest

A new RandomForestClassifier is fitted on the full combined dataset with a fresh LabelEncoder. Training uses all CPU cores (n_jobs=-1).

Save model and dataset

model_seq.pkl, encoder_seq.pkl, X_seq.npy, and y_seq.npy are written to disk. The in-memory modelo and encoder module globals are updated atomically so that inference immediately picks up the new model.

Delete processed videos

All VideoSeña records are deleted from the database and their corresponding files are removed from disk.

Training permanently deletes every VideoSeña record and file after processing. This is by design — the knowledge is encoded in X_seq.npy/y_seq.npy, not the source videos. If you need to retain source videos for audit or re-use, back them up before triggering a training run.

Feature Extraction Pipeline

Video frames
    └─ OpenCV decode → BGR→RGB → MediaPipe HandLandmarker
          └─ up to 2 hands × 21 landmarks × 3 coords = 126 floats/frame
                └─ _normalizar_landmarks_centroide()   [subtract per-hand centroid]
                      └─ normalizar_secuencia(n=30)     [linear interp → 30 frames]
                            └─ construir_features()
                                  ├─ flattened positions:  30 × 126 = 3 780 floats
                                  ├─ frame deltas:         29 × 126 = 3 654 floats
                                  └─ delta magnitudes:     29       = 29   floats
                                        → final feature vector

The constant FRAMES_OBJETIVO = 30 is defined at the top of reconocimientos/views.py and is used consistently during both training and inference.

Data Augmentation

aumentar_secuencia() returns a list of 8 NumPy arrays from a single source sequence:

#	Variation	Implementation
1	Original	Source sequence unchanged
2	Gaussian noise (small)	`+ np.random.normal(0, 0.008, shape)` — simulates natural hand tremor
3	Scale	`× uniform(0.93, 1.07)` — simulates distance variation from the camera
4	Speed variation	Resample to a random length of 20–45 frames — simulates signing at different speeds
5	Horizontal mirror	Invert every X coordinate (`x = 1.0 - x`) — simulates left-handed signers
6	Random translation	Add `dx, dy ∈ uniform(-0.08, 0.08)` to X and Y — simulates hand position shift
7	Stronger Gaussian noise	`+ np.random.normal(0, 0.018, shape)` — robustness against intense tremor
8	Temporal reverse	`sequence[::-1]` — useful for temporal symmetry in symmetric signs
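
The eight variations in the table can be sketched as follows. This is an approximation of aumentar_secuencia(), not its source: the function names are hypothetical, a `numpy.random.Generator` is used instead of the global `np.random` functions so runs can be seeded, and the layout assumption is that each frame stores x at indices 0, 3, 6, … and y at indices 1, 4, 7, ….

```python
import numpy as np


def _remuestrear(seq, n):
    """Linearly resample a (frames, features) array to n rows."""
    t0 = np.linspace(0.0, 1.0, num=len(seq))
    t1 = np.linspace(0.0, 1.0, num=n)
    return np.stack(
        [np.interp(t1, t0, seq[:, j]) for j in range(seq.shape[1])], axis=1
    )


def aumentar(seq, rng=None):
    """Return 8 variations of seq, in the order of the table above."""
    if rng is None:
        rng = np.random.default_rng()
    seq = np.asarray(seq, dtype=float)

    espejo = seq.copy()
    espejo[:, 0::3] = 1.0 - espejo[:, 0::3]           # invert every x coordinate

    traslacion = seq.copy()
    dx, dy = rng.uniform(-0.08, 0.08, size=2)
    traslacion[:, 0::3] += dx
    traslacion[:, 1::3] += dy

    return [
        seq.copy(),                                    # 1. original
        seq + rng.normal(0, 0.008, seq.shape),         # 2. small noise
        seq * rng.uniform(0.93, 1.07),                 # 3. scale
        _remuestrear(seq, int(rng.integers(20, 46))),  # 4. speed: 20 to 45 frames
        espejo,                                        # 5. horizontal mirror
        traslacion,                                    # 6. random translation
        seq + rng.normal(0, 0.018, seq.shape),         # 7. stronger noise
        seq[::-1].copy(),                              # 8. temporal reverse
    ]


original = np.random.default_rng(1).random((12, 126))
variantes = aumentar(original, rng=np.random.default_rng(0))
```

Only variation 4 changes the number of frames; the later resampling to FRAMES_OBJETIVO brings it back to the common length.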

Incremental Training

Each training run merges new videos with the existing accumulated dataset:

# Simplified excerpt from the training thread in reconocimientos/views.py
if os.path.exists(DATASET_X_PATH) and os.path.exists(DATASET_Y_PATH):
    X_prev = np.load(DATASET_X_PATH, allow_pickle=True).tolist()
    y_prev = np.load(DATASET_Y_PATH, allow_pickle=True).tolist()
    # X_prev_norm is X_prev after feature-length reconciliation
    # (zero-padding); that step is elided from this excerpt.
    X_data = X_prev_norm + X_nuevos   # historical + new
    y_data = list(y_prev) + y_nuevos

This means you can add new signs incrementally: signs already in the accumulated dataset are retained and re-trained alongside the new data. To remove a sign from the model, use the DELETE /reconocimientos/sena/<nombre>/ endpoint, which filters that sign's samples out of X_seq.npy/y_seq.npy and then retrains automatically on the remaining data.
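
The filtering half of sign removal can be sketched as below. The helper name `quitar_sena` and the sample labels are hypothetical; the real endpoint also reloads the arrays from disk and retrains, which is omitted here.

```python
def quitar_sena(X, y, nombre):
    """Drop every sample whose label equals nombre; keep X and y aligned."""
    conservados = [(x, etiqueta) for x, etiqueta in zip(X, y) if etiqueta != nombre]
    X_filtrado = [x for x, _ in conservados]
    y_filtrado = [etiqueta for _, etiqueta in conservados]
    return X_filtrado, y_filtrado


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = ["hola", "gracias", "hola"]
X_nuevo, y_nuevo = quitar_sena(X, y, "hola")
# X_nuevo == [[0.3, 0.4]], y_nuevo == ["gracias"]
```

Filtering features and labels through the same zipped pass avoids the two lists drifting out of alignment.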

Effectiveness Calculation

The admin panel displays a per-sign effectiveness percentage. This is not cross-validation accuracy — it is computed from the leaf purity of the trained RandomForest estimators and then linearly mapped to the range 55 %–99 %:

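def _calcular_senas_entrenadas():
    # Elided in this excerpt: initialization of pureza_por_clase, the
    # per-class averaging that yields pureza, and the computation of
    # pmin and rango.
    for est in modelo.estimators_:
        tree = est.tree_
        values = tree.value[:, 0, :]     # (n_nodes, n_classes)
        es_hoja = tree.children_left == -1   # leaves have no left child
        for nodo_idx in np.where(es_hoja)[0]:
            clase_ganadora = int(np.argmax(values[nodo_idx]))
            # Purity of a leaf = share of its samples in the winning class
            pureza_por_clase[clase_ganadora] += (
                values[nodo_idx][clase_ganadora] / values[nodo_idx].sum()
            )
    # Map raw pureza_media → [55, 99] range
    efectividad = 55 + ((pureza - pmin) / rango) * 44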

A sign with 99 % effectiveness has near-perfect leaf purity across all estimators. A sign at 55 % is the least pure class relative to the others in the current model — it may need more training examples.
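
The final mapping step can be tried in isolation with the sketch below. It assumes `rango` is the difference between the highest and lowest per-sign purity, which is consistent with the description above but not shown in the excerpt. The function name `mapear_efectividad`, the sample purities, and the handling of the case where all signs share the same purity are all assumptions.

```python
def mapear_efectividad(pureza_por_sena, minimo=55.0, maximo=99.0):
    """Min-max map per-sign purity onto the [minimo, maximo] range."""
    pmin = min(pureza_por_sena.values())
    rango = max(pureza_por_sena.values()) - pmin
    if rango == 0:
        # Assumption: with identical purities, report the top of the range.
        return {sena: maximo for sena in pureza_por_sena}
    return {
        sena: minimo + ((p - pmin) / rango) * (maximo - minimo)
        for sena, p in pureza_por_sena.items()
    }


efectividad = mapear_efectividad({"hola": 0.6, "gracias": 0.8, "adios": 1.0})
# "hola" maps to 55, "adios" to 99, "gracias" to roughly 77
```

Because the mapping is relative, the percentages describe how signs rank against each other within one model; they are not comparable across training runs.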

Thread Safety

MediaPipe HandLandmarker is not thread-safe. reconocimientos/views.py uses threading.local() to maintain one detector instance per worker thread. The _get_detector() helper creates a new HandLandmarker the first time any given thread calls it, then reuses that instance for all subsequent calls on the same thread.

_thread_local = threading.local()

def _get_detector():
    if not hasattr(_thread_local, 'detector'):
        _thread_local.detector = HandLandmarker.create_from_options(_mp_options)
    return _thread_local.detector

The _entrenando boolean flag prevents two concurrent training runs from interfering with each other:

@csrf_exempt
@require_http_methods(["POST"])
def entrenar_modelo(request):
    global _entrenando
    if _entrenando:
        return JsonResponse({'ok': False, 'error': 'Ya hay un entrenamiento en curso'})
    # spawn daemon thread ...
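
The excerpt above does not show how the flag is set or cleared. One way to make the check-and-set step safe when two requests arrive at the same moment is to guard it with a lock, as in the sketch below. This is a variant for illustration, not the code in reconocimientos/views.py; `_entrenando_lock`, `intentar_iniciar` and `marcar_fin` are hypothetical names.

```python
import threading

_entrenando = False
_entrenando_lock = threading.Lock()


def intentar_iniciar():
    """Atomically claim the training slot; return False if already taken."""
    global _entrenando
    with _entrenando_lock:
        if _entrenando:
            return False
        _entrenando = True
        return True


def marcar_fin():
    """Release the training slot (call from a finally block in the thread)."""
    global _entrenando
    with _entrenando_lock:
        _entrenando = False


primero = intentar_iniciar()   # True: slot was free
segundo = intentar_iniciar()   # False: a run is already active
marcar_fin()
tercero = intentar_iniciar()   # True again after the run ends
```

Like the plain boolean, this guard only covers a single process; with several Gunicorn workers each process would hold its own flag.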

Because training runs inside a daemon thread in the same Gunicorn worker process, restarting or replacing the worker during an active training run will silently abort it. Check {"activo": false} from the status endpoint before redeploying.
