The endpoints on this page are the JSON interfaces that Signia’s frontend JavaScript calls during live webcam capture and text translation. All recognition endpoints sit under the /reconocimientos/ URL prefix and are decorated with @csrf_exempt, so no CSRF token is required when calling them directly. Rate-limiting is enforced at the session level rather than via token buckets: the hand-detection endpoint uses an in-process throttle (~8 fps per session key), while the prediction endpoints rely on the Django worker pool for concurrency control. All endpoints expect and return UTF-8 JSON unless noted otherwise.
The recognition model must be trained before /predecir/ and /predecir_landmarks/ will respond. If the model files are absent, both endpoints return HTTP 503. Train the model via the admin panel or the Admin API before calling these endpoints.

Sign Recognition

Predict from raw frames

This endpoint runs MediaPipe server-side on each frame, which is significantly slower than /predecir_landmarks/. Prefer /predecir_landmarks/ for production use.
POST /reconocimientos/predecir/
Accepts a sequence of raw camera frames encoded as base64 strings. The server decodes each frame, resizes it to at most 320 px wide, and runs MediaPipe’s HandLandmarker on it using a detector instance local to the current worker thread. Only frames in which at least one hand is detected contribute to the predicted sequence. At least 5 valid (hand-containing) frames are required; with fewer, the endpoint returns an empty result rather than an error.
Request body: application/json
frames (array, required)
Array of base64-encoded image strings. Each element is a JPEG or PNG frame captured from the webcam. The data:image/...;base64, prefix is accepted and stripped automatically.
Response
seña (string)
The recognized LSC sign gloss in uppercase (e.g. "HOLA"). Empty string when fewer than 5 hand frames were detected.
confianza (number)
Confidence score as a percentage (0–100, one decimal place). 0 when the result is empty.
Example response
{ "seña": "HOLA", "confianza": 94.3 }
Empty result (insufficient hand frames)
{ "seña": "", "confianza": 0 }
Error responses
| Status | Condition |
| --- | --- |
| 503 | Model not yet trained |
| 405 | Request method is not POST |
| 500 | Unhandled exception during inference |
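A caller usually needs to tell apart a recognized sign, the documented empty result, and the error statuses listed above. The following is a minimal sketch of one way to do that. The function name interpretPrediction and the outcome labels are hypothetical, and treating any 2xx status as success is an assumption, since the success status code is not stated here.

```javascript
// Hypothetical helper (not part of Signia): classifies a response from
// /reconocimientos/predecir/ given the HTTP status and the parsed JSON body.
function interpretPrediction(status, body) {
  if (status === 503) return { outcome: 'model_not_trained' };
  if (status === 405) return { outcome: 'wrong_method' };
  // Assumption: any other non-2xx status is treated as a server-side failure.
  if (status < 200 || status >= 300) return { outcome: 'server_error', status };
  // Documented empty result: fewer than 5 hand-containing frames.
  if (!body || body['seña'] === '') return { outcome: 'no_sign' };
  return { outcome: 'sign', sign: body['seña'], confidence: body.confianza };
}
```

Keeping the empty result separate from the error outcomes lets the caller simply keep capturing frames when no sign was recognized, instead of surfacing an error to the user.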

Predict from pre-computed landmarks

POST /reconocimientos/predecir_landmarks/
The preferred inference endpoint. The client computes hand landmarks in the browser using MediaPipe JS and sends only the numeric arrays, which eliminates server-side image decoding and MediaPipe processing. This is substantially faster and reduces server CPU load.
Each element of secuencia must contain 126 floats representing two hands: 21 landmarks × 3 coordinates (x, y, z) × 2 hands. If only one hand is visible, pad the second hand’s 63 values with zeros. The server normalizes each frame to its centroid and resamples the sequence to 30 frames before running the RandomForest classifier. At least 5 frames are required in secuencia; with fewer, the endpoint returns an empty result.
Request body: application/json
secuencia (array, required)
Array of landmark frames. Each element is an array of 126 floats: 21 landmarks × 3 coordinates (x, y, z) for hand 1 followed by the same 63 values for hand 2. If only one hand is present, fill the second hand’s 63 positions with 0.0.
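The 126-float layout described above can be sketched as a small packing function. This is an illustration under stated assumptions: buildLandmarkFrame is a hypothetical name, each hand is assumed to arrive as an array of 21 objects with x, y, and z properties, the coordinates are assumed to be interleaved per landmark (x, y, z, x, y, z, ...), and the first detected hand is assumed to be "hand 1". None of those details are confirmed here.

```javascript
const LANDMARKS_PER_HAND = 21;
const VALUES_PER_HAND = LANDMARKS_PER_HAND * 3; // 63
const VALUES_PER_FRAME = VALUES_PER_HAND * 2;   // 126

// Hypothetical helper: packs 0-2 detected hands into one 126-float frame.
// `hands` is an array of hands; each hand is an array of 21 {x, y, z} points.
function buildLandmarkFrame(hands) {
  // Start from all zeros so a missing hand is already zero-padded.
  const frame = new Array(VALUES_PER_FRAME).fill(0.0);
  hands.slice(0, 2).forEach((hand, h) => {
    if (hand.length !== LANDMARKS_PER_HAND) {
      throw new Error(`expected 21 landmarks, got ${hand.length}`);
    }
    hand.forEach((point, i) => {
      // Assumed layout: x, y, z interleaved per landmark, hand 1 then hand 2.
      const base = h * VALUES_PER_HAND + i * 3;
      frame[base] = point.x;
      frame[base + 1] = point.y;
      frame[base + 2] = point.z;
    });
  });
  return frame;
}
```

Calling this once per captured frame and pushing the results into an array yields a value suitable for secuencia. Because fewer than 5 frames produces an empty result, a client can skip the request until at least 5 frames are buffered.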
Response
seña (string)
Recognised LSC sign gloss (e.g. "GRACIAS"). Empty string if the sequence is too short or recognition fails.
confianza (number)
Classifier confidence as a percentage (0–100, one decimal place).
Example response
{ "seña": "GRACIAS", "confianza": 87.6 }
Error responses
| Status | Condition |
| --- | --- |
| 503 | Model not yet trained |
| 405 | Request method is not POST |
| 500 | Unhandled exception during inference |
JavaScript example
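// landmarksArray: array of frames, each an array of 126 floats (see secuencia)
const response = await fetch('/reconocimientos/predecir_landmarks/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ secuencia: landmarksArray })
});
// 503 (model not trained), 405, and 500 are signalled through the status code
if (!response.ok) {
  throw new Error(`Prediction failed with HTTP ${response.status}`);
}
const data = await response.json();
console.log(data.seña, data.confianza);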

Detect hand presence

POST /reconocimientos/detectar_mano/
A lightweight endpoint that reports whether a hand is visible in a single frame. Signia’s frontend polls this endpoint to decide when to start buffering frames for a full prediction. To avoid saturating the server, requests are throttled to approximately 8 per second per session (120 ms minimum interval). Requests that arrive faster than this limit immediately receive {"hay_mano": false, "throttled": true} without running MediaPipe.
Request body: multipart/form-data
frame (string, required)
A single webcam frame as a base64-encoded string. The data:image/...;base64, prefix is accepted and stripped automatically.
Response
hay_mano (boolean)
true if MediaPipe detected at least one hand in the frame; false otherwise.
throttled (boolean)
Present and set to true only when the response was short-circuited by the rate limiter. Absent on normal (non-throttled) responses.
Normal response
{ "hay_mano": true }
Throttled response
{ "hay_mano": false, "throttled": true }
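Since requests sent faster than the 120 ms interval only produce throttled responses, a client can save round trips by pacing itself. The sketch below shows one possible client-side gate, plus a helper that keeps a throttled reply from being mistaken for "no hand". The names createPollGate and classifyDetection are hypothetical and do not come from Signia’s frontend.

```javascript
const MIN_INTERVAL_MS = 120; // documented server-side minimum interval

// Hypothetical client-side gate: the returned function reports whether enough
// time has passed since the last request that was allowed through.
function createPollGate(minIntervalMs = MIN_INTERVAL_MS) {
  let lastSent = -Infinity;
  return function shouldSend(nowMs) {
    if (nowMs - lastSent < minIntervalMs) return false;
    lastSent = nowMs;
    return true;
  };
}

// Hypothetical helper: maps a parsed response body to one of three states.
function classifyDetection(body) {
  // A throttled reply carries hay_mano: false, but MediaPipe never ran.
  if (body.throttled === true) return 'throttled';
  return body.hay_mano ? 'hand' : 'no_hand';
}
```

In a capture loop, shouldSend(performance.now()) can guard each call to the endpoint, and a 'throttled' result can simply be ignored so that only a real 'no_hand' result stops frame buffering.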

Text and Audio Translation

Translate text or audio to LSC

POST /traductor/
Accepts either written Spanish text or a recorded audio clip and returns an HTML page (traductor.html) populated with the corresponding LSC sign video sequence. Internally the endpoint passes the input through the LSC grammar layer (lsc_grammar.py), which calls the Groq API to reorder tokens into LSC gloss order, then looks up each token in the translator video library.
This endpoint returns a full HTML page, not a JSON object. It is designed for browser form submissions and progressive enhancement. For programmatic LSC lookup, use the admin video library endpoints to manage the vocabulary and drive your own rendering.
Request body: multipart/form-data
palabra (string)
A Spanish word or phrase to translate into LSC sign videos. Mutually exclusive with audio.
audio (file)
A recorded audio clip in WebM format. The server transcribes it with Whisper (faster-whisper, base model, Spanish) and then processes the transcript identically to palabra. Mutually exclusive with palabra.
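Because palabra and audio are mutually exclusive, it can help to enforce that before the request is sent. The following sketch builds the multipart body with the standard FormData API. The helper name buildTraductorForm and the upload filename are hypothetical, and rejecting a request with neither field is a client-side choice made here, since the server’s behavior in that case is not documented.

```javascript
// Hypothetical helper: builds the multipart body for POST /traductor/.
// Pass either `palabra` (string) or `audio` (a Blob of WebM audio), not both.
function buildTraductorForm({ palabra, audio } = {}) {
  const hasText = typeof palabra === 'string' && palabra.trim() !== '';
  const hasAudio = audio != null;
  // True when both or neither are supplied; both cases are rejected here.
  if (hasText === hasAudio) {
    throw new Error('Provide exactly one of palabra or audio');
  }
  const form = new FormData();
  if (hasText) {
    form.append('palabra', palabra);
  } else {
    // The filename is an assumption; only the WebM format is documented.
    form.append('audio', audio, 'audio.webm');
  }
  return form;
}
```

The resulting form can be passed as the body of a fetch call with method 'POST'. Leaving the Content-Type header unset lets the browser supply the multipart boundary, and because the endpoint returns HTML, the reply should be read with response.text() rather than response.json().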
Response
An HTML page rendered from the traduccion/traductor.html template. The template context includes:
| Variable | Type | Description |
| --- | --- | --- |
| resultados | list | VideoTraductor model instances whose signs matched the LSC token sequence |
| tokens_lsc | list | Ordered list of LSC gloss tokens as determined by the grammar layer |
| faltantes | list | Tokens for which no matching video was found in the database |
| lsc_metadata | dict | Sentence type, suggested facial expression, and grammar notes from the AI |
| modelo_usado | string | Identifier of the Groq model that processed the request, or "fallback" |
When the Groq API is unavailable, the LSC grammar layer falls back to a rule-based token extractor. The modelo_usado field in the template context will be "fallback" in that case.
