The endpoints on this page are the JSON interfaces that Signia’s frontend JavaScript calls during live webcam capture and text translation. All recognition endpoints sit under the /reconocimientos/ URL prefix and are decorated with @csrf_exempt, so no CSRF token is required when calling them directly. Rate limiting is enforced at the session level rather than via token buckets: the hand-detection endpoint uses an in-process throttle (~8 fps per session key), while the prediction endpoints rely on the Django worker pool for concurrency control. All endpoints expect and return UTF-8 JSON unless noted otherwise.
The recognition model must be trained before /predecir/ and /predecir_landmarks/ will respond. If the model files are absent, both endpoints return HTTP 503. Train the model via the admin panel or the Admin API before calling these endpoints.
Sign Recognition
Predict from raw frames
POST /reconocimientos/predecir/
The server decodes each submitted frame and runs MediaPipe HandLandmarker on it using the current worker thread’s thread-local detector instance. Only frames in which at least one hand is detected contribute to the predicted sequence. A minimum of 5 valid (hand-containing) frames is required; with fewer, the endpoint returns an empty result rather than an error.
Request body — application/json
Array of base64-encoded image strings. Each element is a JPEG or PNG frame captured from the webcam. The data:image/...;base64, prefix is accepted and stripped automatically.
Response
The recognised LSC sign gloss in uppercase (e.g. "HOLA"). Empty string when fewer than 5 hand frames were detected.
Confidence score as a percentage (0–100, one decimal place). 0 when the result is empty.
| Status | Condition |
|---|---|
| 503 | Model not yet trained |
| 405 | Request method is not POST |
| 500 | Unhandled exception during inference |
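The prefix handling described for the request frames can be sketched as follows. This is a minimal illustration and not Signia's actual code: the helper name `decode_frame` is hypothetical, and only the Python standard library is used.

```python
import base64


def decode_frame(frame: str) -> bytes:
    """Return raw image bytes from a base64 string, with or without a data-URL prefix."""
    # A data URL looks like "data:image/jpeg;base64,<payload>"; keep only the payload.
    if frame.startswith("data:"):
        _, _, frame = frame.partition(",")
    return base64.b64decode(frame)


# Both forms decode to the same bytes (the payload here is a stand-in, not a real image).
payload = base64.b64encode(b"\xff\xd8\xff stand-in bytes").decode("ascii")
plain = decode_frame(payload)
prefixed = decode_frame("data:image/jpeg;base64," + payload)
```

Because the prefix is optional, a client can pass the output of a canvas `toDataURL()` call unchanged or strip the prefix itself.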
Predict from pre-computed landmarks
POST /reconocimientos/predecir_landmarks/
Each frame in secuencia must contain 126 floats representing two hands: 21 landmarks × 3 coordinates (x, y, z) × 2 hands. If only one hand is visible, pad the second hand’s 63 values with zeros. The server normalises each frame to its centroid and resamples the sequence to 30 frames before running the RandomForest classifier.
A minimum of 5 frames in secuencia is required; fewer returns an empty result.
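The frame layout and preprocessing described above can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the server's code: the function names are hypothetical, the centroid step is shown for a single hand only (the source does not say how zero-padded hands are treated), and nearest-index selection stands in for whatever resampling method the server actually uses.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

VALUES_PER_HAND = 21 * 3                # 21 landmarks x (x, y, z) = 63
VALUES_PER_FRAME = VALUES_PER_HAND * 2  # two hands = 126


def pack_frame(hand1: Sequence[Point], hand2: Optional[Sequence[Point]] = None) -> List[float]:
    """Flatten one or two hands into 126 floats, zero-padding a missing second hand."""
    flat = [float(v) for point in hand1 for v in point]
    if hand2 is None:
        flat.extend([0.0] * VALUES_PER_HAND)
    else:
        flat.extend(float(v) for point in hand2 for v in point)
    if len(flat) != VALUES_PER_FRAME:
        raise ValueError(f"expected {VALUES_PER_FRAME} values, got {len(flat)}")
    return flat


def center_on_centroid(points: Sequence[Point]) -> List[Point]:
    """Translate points so that their mean (centroid) sits at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(p[0] - cx, p[1] - cy, p[2] - cz) for p in points]


def resample(frames: List[List[float]], target: int = 30) -> List[List[float]]:
    """Select `target` frames at evenly spaced indices (assumed method)."""
    if not frames:
        return []
    if target < 2:
        raise ValueError("target must be at least 2")
    last = len(frames) - 1
    return [frames[round(i * last / (target - 1))] for i in range(target)]
```

A client only needs `pack_frame`; the other two functions illustrate what the server is described as doing after the request arrives.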
Request body — application/json
Array of landmark frames. Each element is an array of 126 floats: 21 landmarks × 3 coordinates (x, y, z) for hand 1, followed by the corresponding 63 values for hand 2. If only one hand is present, fill the second hand’s 63 positions with 0.0.
Response
Recognised LSC sign gloss (e.g. "GRACIAS"). Empty string if the sequence is too short or recognition fails.
Classifier confidence as a percentage (0–100, one decimal place).
| Status | Condition |
|---|---|
| 503 | Model not yet trained |
| 405 | Request method is not POST |
| 500 | Unhandled exception during inference |
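A request to this endpoint might be assembled as in the sketch below. The base URL is an assumption (a local Django development server), the helper name `build_landmark_request` is hypothetical, and `secuencia` is taken to be the top-level JSON key, as the description above suggests. The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_landmark_request(secuencia, base_url=BASE_URL):
    """Build (but do not send) a POST request for /reconocimientos/predecir_landmarks/."""
    if any(len(frame) != 126 for frame in secuencia):
        raise ValueError("every frame must contain exactly 126 floats")
    body = json.dumps({"secuencia": secuencia}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/reconocimientos/predecir_landmarks/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Five zero-filled frames: the documented minimum sequence length.
request = build_landmark_request([[0.0] * 126 for _ in range(5)])

# To send it, call urllib.request.urlopen(request). That call raises
# urllib.error.HTTPError for the 503, 405, and 500 statuses in the table.
```

No CSRF token header is set because the recognition endpoints are described as CSRF-exempt.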
Detect hand presence
/reconocimientos/detectar_mano/
When a session exceeds the throttle limit, the endpoint returns {"hay_mano": false, "throttled": true} immediately without running MediaPipe.
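One way such an in-process, per-session throttle could work is sketched below. It is an assumption-laden illustration, not the project's implementation: the class and function names are hypothetical, the 8 calls per second limit comes from the "~8 fps per session key" figure on this page, and the detector is passed in as a plain callable.

```python
import time
from typing import Callable, Dict


class SessionThrottle:
    """Allow at most `max_fps` calls per second for each session key."""

    def __init__(self, max_fps: float = 8.0, clock: Callable[[], float] = time.monotonic):
        self.min_interval = 1.0 / max_fps  # 0.125 s at 8 fps
        self.clock = clock
        # Plain dict with no locking; a multi-threaded worker may need a lock.
        self._last_seen: Dict[str, float] = {}

    def allow(self, session_key: str) -> bool:
        now = self.clock()
        last = self._last_seen.get(session_key)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_seen[session_key] = now
        return True


def detect_hand_response(throttle: SessionThrottle, session_key: str,
                         run_detector: Callable[[], bool]) -> dict:
    """Return the throttled payload early; otherwise run the expensive detector."""
    if not throttle.allow(session_key):
        return {"hay_mano": False, "throttled": True}
    return {"hay_mano": bool(run_detector())}
```

Because the state lives in process memory, each worker process would keep its own counters under this design, which is consistent with the page describing the throttle as in-process.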
Request body — multipart/form-data
A single webcam frame as a base64-encoded string. The data:image/...;base64, prefix is accepted and stripped automatically.
Response
hay_mano: true if MediaPipe detected at least one hand in the frame; false otherwise.
throttled: present and set to true only when the response was short-circuited by the rate limiter. Absent on normal (non-throttled) responses.
Text and Audio Translation
Translate text or audio to LSC
/traductor/
Accepts a Spanish word, phrase, or audio clip and renders the translator page (traductor.html) populated with the corresponding LSC sign video sequence. Internally, the endpoint passes the input through the LSC grammar layer (lsc_grammar.py), which calls the Groq API to reorder tokens into LSC gloss order, then looks up each token in the translator video library.
This endpoint returns a full HTML page, not a JSON object. It is designed for browser form submissions and progressive enhancement. For programmatic LSC lookup, use the admin video library endpoints to manage the vocabulary and drive your own rendering.
multipart/form-data
palabra: A Spanish word or phrase to translate into LSC sign videos. Mutually exclusive with audio.
audio: A recorded audio clip in WebM format. The server transcribes it with Whisper (faster-whisper, base model, Spanish) and then processes the transcript identically to palabra. Mutually exclusive with palabra.
Response
The rendered traduccion/traductor.html template. The template context includes:
| Variable | Type | Description |
|---|---|---|
| resultados | list | VideoTraductor model instances whose signs matched the LSC token sequence |
| tokens_lsc | list | Ordered list of LSC gloss tokens as determined by the grammar layer |
| faltantes | list | Tokens for which no matching video was found in the database |
| lsc_metadata | dict | Sentence type, suggested facial expression, and grammar notes from the AI |
| modelo_usado | string | Identifier of the Groq model that processed the request, or "fallback" |
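The relationship between tokens_lsc, resultados, and faltantes can be illustrated with a small lookup. This is a simplified sketch: a plain dictionary stands in for the VideoTraductor table, the function name `match_tokens` and the file names are hypothetical, and the real view presumably queries the database instead.

```python
from typing import Dict, List, Tuple


def match_tokens(tokens_lsc: List[str], library: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split gloss tokens into matched videos (resultados) and unmatched tokens (faltantes)."""
    resultados: List[str] = []
    faltantes: List[str] = []
    for token in tokens_lsc:
        video = library.get(token)
        if video is None:
            faltantes.append(token)   # no video for this gloss
        else:
            resultados.append(video)  # keep videos in gloss order
    return resultados, faltantes


# Hypothetical library mapping gloss -> video file name.
library = {"HOLA": "hola.mp4", "GRACIAS": "gracias.mp4"}
resultados, faltantes = match_tokens(["HOLA", "AMIGO", "GRACIAS"], library)
```

Keeping the matched videos in token order matters because the grammar layer has already arranged the tokens in LSC gloss order.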