LSC Sign Recognition and Translation API Endpoints

The endpoints on this page are the JSON interfaces that Signia’s frontend JavaScript calls during live webcam capture and text translation. All recognition endpoints sit under the /reconocimientos/ URL prefix and are decorated with @csrf_exempt, so no CSRF token is required when calling them directly. Rate limiting is enforced per session rather than via token buckets, and only on the hand-detection endpoint, which uses an in-process throttle of about 8 requests per second per session key. The prediction endpoints are not throttled this way and rely on the Django worker pool for concurrency control. All endpoints expect and return UTF-8 JSON unless noted otherwise.

The recognition model must be trained before /predecir/ and /predecir_landmarks/ will respond. If the model files are absent, both endpoints return HTTP 503. Train the model via the admin panel or the Admin API before calling these endpoints.

Sign Recognition

Predict from raw frames

This endpoint runs MediaPipe server-side on each frame, which is significantly slower than /predecir_landmarks/. Prefer /predecir_landmarks/ for production use.

POST /reconocimientos/predecir/

Accepts a sequence of raw camera frames encoded as base64 strings. The server decodes each frame, resizes it to at most 320 px wide, and runs MediaPipe’s HandLandmarker on it, using a detector instance that is local to the current worker thread. Only frames in which at least one hand is detected contribute to the predicted sequence. A minimum of 5 valid (hand-containing) frames is required; fewer returns an empty result rather than an error.

Request body: application/json

frames

array

required

Array of base64-encoded image strings. Each element is a JPEG or PNG frame captured from the webcam. The data:image/...;base64, prefix is accepted and stripped automatically.

Response

seña

string

The recognized LSC sign gloss in uppercase (e.g. "HOLA"). Empty string when fewer than 5 hand frames were detected.

confianza

number

Confidence score as a percentage (0–100, one decimal place). 0 when the result is empty.

Example response

{ "seña": "HOLA", "confianza": 94.3 }

Empty result (insufficient hand frames)

{ "seña": "", "confianza": 0 }

Error responses

Status	Condition
`503`	Model not yet trained
`405`	Request method is not POST
`500`	Unhandled exception during inference
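The status codes and the empty-result convention above can be handled in one place on the client. The following is a minimal sketch, not Signia’s own frontend code: the helper names `interpretPrediction` and `predictFromFrames`, the `kind` labels, and the `baseUrl` parameter are all hypothetical, and it assumes a successful call answers with a 2xx status.

```javascript
// Hypothetical helper: maps an HTTP status and a parsed JSON body from
// /reconocimientos/predecir/ to a small tagged result object.
function interpretPrediction(status, body) {
  if (status === 503) return { kind: 'model_not_trained' };
  if (status === 405) return { kind: 'wrong_method' };
  if (status >= 500) return { kind: 'server_error' };
  // Assumption: success is reported with a 2xx status.
  if (status < 200 || status >= 300) return { kind: 'unexpected_status', status };
  // An empty or missing gloss means too few hand frames, not a failure.
  if (!body || !body['seña']) return { kind: 'no_sign' };
  return { kind: 'sign', gloss: body['seña'], confidence: body.confianza };
}

// Sends base64 frames; each may keep its data:image/...;base64, prefix.
async function predictFromFrames(frames, baseUrl = '') {
  const response = await fetch(`${baseUrl}/reconocimientos/predecir/`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ frames }),
  });
  // Error responses may not carry JSON, so parse defensively.
  const body = await response.json().catch(() => null);
  return interpretPrediction(response.status, body);
}
```

Keeping the interpretation separate from the network call makes the empty-result case easy to distinguish from a genuine error in calling code.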

Predict from pre-computed landmarks

POST /reconocimientos/predecir_landmarks/

The preferred inference endpoint. The client computes hand landmarks in the browser using MediaPipe JS and sends only the numeric arrays, eliminating server-side image decoding and MediaPipe processing entirely. This is substantially faster and reduces server CPU load.

Each element of secuencia must contain 126 floats representing two hands: 21 landmarks × 3 coordinates (x, y, z) × 2 hands. If only one hand is visible, pad the second hand’s 63 values with zeros. The server normalises each frame to its centroid and resamples the sequence to 30 frames before running the RandomForest classifier. A minimum of 5 frames in secuencia is required; fewer returns an empty result.

Request body: application/json

secuencia

array

required

Array of landmark frames. Each element is an array of 126 floats: 21 landmarks × 3 coordinates (x, y, z) for hand 1, followed by 63 values in the same layout for hand 2. If only one hand is present, fill the second hand’s 63 positions with 0.0.
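The 126-float layout and the zero padding described above can be sketched as a small flattening helper. This is illustrative only: `toFrameVector` is a hypothetical name, and the input shape (an array of hands, each an array of 21 objects with x, y, z) is assumed to match what the browser-side MediaPipe hand landmarker reports. Which detected hand counts as "hand 1" is not specified here, so the sketch simply uses array order.

```javascript
const VALUES_PER_HAND = 21 * 3;               // 63 floats per hand
const VALUES_PER_FRAME = VALUES_PER_HAND * 2; // 126 floats per frame

// Flattens up to two hands into one 126-float frame.
// Slots for a missing hand stay at 0.0 (zero padding).
function toFrameVector(hands) {
  const frame = new Array(VALUES_PER_FRAME).fill(0.0);
  hands.slice(0, 2).forEach((hand, h) => {
    if (hand.length !== 21) {
      throw new Error(`expected 21 landmarks, got ${hand.length}`);
    }
    hand.forEach((point, i) => {
      const base = h * VALUES_PER_HAND + i * 3;
      frame[base] = point.x;
      frame[base + 1] = point.y;
      frame[base + 2] = point.z;
    });
  });
  return frame;
}
```

Calling `toFrameVector` once per captured frame and collecting the results yields an array suitable for the secuencia field.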

Response

seña

string

Recognised LSC sign gloss (e.g. "GRACIAS"). Empty string if the sequence is too short or recognition fails.

confianza

number

Classifier confidence as a percentage (0–100, one decimal place).

Example response

{ "seña": "GRACIAS", "confianza": 87.6 }

Error responses

Status	Condition
`503`	Model not yet trained
`405`	Request method is not POST
`500`	Unhandled exception during inference
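For intuition about the preprocessing this endpoint is described as performing (centroid normalisation, then resampling to 30 frames), here is a rough sketch. It is not the server’s implementation: the function names are hypothetical, the centroid is assumed to be the mean over all 42 landmark slots (zero padding included), and linear interpolation is an assumed resampling method.

```javascript
// Subtracts the frame's mean x, y and z from every landmark.
// Assumption: the centroid is taken over all landmark slots in the frame.
function centerOnCentroid(frame) {
  const n = frame.length / 3;
  const centroid = [0, 0, 0];
  for (let i = 0; i < frame.length; i++) centroid[i % 3] += frame[i] / n;
  return frame.map((v, i) => v - centroid[i % 3]);
}

// Resamples a sequence of frames to `target` frames by linearly
// interpolating between neighbouring frames (an assumed method).
function resample(sequence, target = 30) {
  if (sequence.length === 0) return [];
  if (sequence.length === 1) {
    return Array.from({ length: target }, () => sequence[0].slice());
  }
  const out = [];
  for (let t = 0; t < target; t++) {
    const pos = (t * (sequence.length - 1)) / (target - 1);
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, sequence.length - 1);
    const w = pos - lo;
    out.push(sequence[lo].map((v, i) => v * (1 - w) + sequence[hi][i] * w));
  }
  return out;
}
```

Clients do not need to run either step; the sketch only illustrates why sequences of different lengths can be sent to the same endpoint.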

JavaScript example

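// landmarksArray: array of at least 5 frames, each an array of 126 floats
const response = await fetch('/reconocimientos/predecir_landmarks/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ secuencia: landmarksArray })
});
if (!response.ok) {
  // 503: model not trained; 405: wrong method; 500: inference error
  throw new Error(`Prediction failed with HTTP ${response.status}`);
}
const data = await response.json();
console.log(data.seña, data.confianza);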
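Because a sequence shorter than 5 frames yields an empty result rather than an error, it can help to check the payload before sending it. The sketch below is one way to do that; `checkSecuencia` is a hypothetical helper, and rejecting non-finite numbers is an added assumption rather than a documented server rule.

```javascript
const MIN_FRAMES = 5;     // minimum sequence length stated for this endpoint
const FRAME_LENGTH = 126; // 21 landmarks x 3 coordinates x 2 hands

// Returns null when the payload looks sendable, otherwise a short reason.
function checkSecuencia(secuencia) {
  if (!Array.isArray(secuencia)) return 'secuencia must be an array';
  if (secuencia.length < MIN_FRAMES) {
    return `need at least ${MIN_FRAMES} frames, got ${secuencia.length}`;
  }
  const bad = secuencia.findIndex(
    (f) => !Array.isArray(f) || f.length !== FRAME_LENGTH || !f.every(Number.isFinite)
  );
  return bad === -1 ? null : `frame ${bad} is not an array of ${FRAME_LENGTH} finite numbers`;
}
```

A caller might keep buffering frames while `checkSecuencia` returns a reason, and only issue the request once it returns null.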

Detect hand presence

POST /reconocimientos/detectar_mano/

A lightweight endpoint that reports whether a hand is visible in a single frame. Signia’s frontend polls this endpoint to decide when to start buffering frames for a full prediction.

To avoid saturating the server, responses are throttled to approximately 8 requests per second per session (120 ms minimum interval). Requests that arrive faster than this limit receive {"hay_mano": false, "throttled": true} immediately without running MediaPipe.

Request body: multipart/form-data

frame

string

required

A single webcam frame as a base64-encoded string. The data:image/...;base64, prefix is accepted and stripped automatically.

Response

hay_mano

boolean

true if MediaPipe detected at least one hand in the frame; false otherwise.

throttled

boolean

Present and set to true only when the response was short-circuited by the rate limiter. Absent on normal (non-throttled) responses.

Normal response

{ "hay_mano": true }

Throttled response

{ "hay_mano": false, "throttled": true }
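A client can avoid most throttled replies by pacing its own requests to the 120 ms interval and by not reading a throttled reply as "no hand". The sketch below shows one possible approach; `createSendGate`, `readDetection`, `detectHand`, and the `baseUrl` parameter are hypothetical names, not part of Signia’s frontend.

```javascript
const MIN_INTERVAL_MS = 120; // server-side minimum interval per session

// Returns a function that answers "may I send now?" and, when the
// answer is yes, records the current time as the last send.
function createSendGate(minIntervalMs = MIN_INTERVAL_MS, now = Date.now) {
  let lastSent = -Infinity;
  return function maySend() {
    const t = now();
    if (t - lastSent < minIntervalMs) return false;
    lastSent = t;
    return true;
  };
}

// A throttled reply carries no detection information, so report null
// instead of false; otherwise report the hay_mano flag.
function readDetection(body) {
  if (body.throttled === true) return null;
  return body.hay_mano === true;
}

// Posts one base64 frame as multipart/form-data in the `frame` field.
async function detectHand(frameBase64, baseUrl = '') {
  const form = new FormData();
  form.append('frame', frameBase64);
  const response = await fetch(`${baseUrl}/reconocimientos/detectar_mano/`, {
    method: 'POST',
    body: form,
  });
  return readDetection(await response.json());
}
```

In a capture loop, the gate would be checked before each call to `detectHand`, and a null result would leave the previous hand-presence state unchanged.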

Text and Audio Translation

Translate text or audio to LSC

POST /traductor/

Accepts either written Spanish text or a recorded audio clip and returns an HTML page, rendered from the traduccion/traductor.html template, populated with the corresponding LSC sign video sequence. Internally the endpoint passes the input through the LSC grammar layer (lsc_grammar.py), which calls the Groq API to reorder tokens into LSC gloss order, then looks up each token in the translator video library.

This endpoint returns a full HTML page, not a JSON object. It is designed for browser form submissions and progressive enhancement. For programmatic LSC lookup, use the admin video library endpoints to manage the vocabulary and drive your own rendering.

Request body: multipart/form-data

palabra

string

A Spanish word or phrase to translate into LSC sign videos. Mutually exclusive with audio.

audio

file

A recorded audio clip in WebM format. The server transcribes it with Whisper (faster-whisper, base model, Spanish) and then processes the transcript identically to palabra. Mutually exclusive with palabra.

Response

An HTML page rendering the traduccion/traductor.html template. The template context includes:

Variable	Type	Description
`resultados`	list	`VideoTraductor` model instances whose signs matched the LSC token sequence
`tokens_lsc`	list	Ordered list of LSC gloss tokens as determined by the grammar layer
`faltantes`	list	Tokens for which no matching video was found in the database
`lsc_metadata`	dict	Sentence type, suggested facial expression, and grammar notes from the AI grammar layer
`modelo_usado`	string	Identifier of the Groq model that processed the request, or `"fallback"`

When the Groq API is unavailable, the LSC grammar layer falls back to a rule-based token extractor. The modelo_usado field in the template context will be "fallback" in that case.
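Since palabra and audio are mutually exclusive, a client can enforce that before submitting the form. The following sketch assumes a runtime with global FormData, Blob, and fetch (a browser or a recent Node.js); `buildTraductorForm`, `translate`, the `baseUrl` parameter, and the `clip.webm` file name are hypothetical. Whether /traductor/ requires a CSRF token is not stated on this page, so the sketch does not send one.

```javascript
// Builds the multipart body for /traductor/ with exactly one of
// `palabra` (string) or `audio` (a Blob holding a WebM recording).
function buildTraductorForm({ palabra, audio } = {}) {
  const hasText = typeof palabra === 'string' && palabra.trim() !== '';
  const hasAudio = audio != null;
  if (hasText === hasAudio) {
    throw new Error('provide exactly one of palabra or audio');
  }
  const form = new FormData();
  if (hasText) {
    form.append('palabra', palabra.trim());
  } else {
    form.append('audio', audio, 'clip.webm'); // placeholder file name
  }
  return form;
}

// Submits the form and returns the response as text, because this
// endpoint answers with an HTML page rather than JSON.
async function translate(input, baseUrl = '') {
  const response = await fetch(`${baseUrl}/traductor/`, {
    method: 'POST',
    body: buildTraductorForm(input),
  });
  return response.text();
}
```

For example, `translate({ palabra: 'hola' })` would resolve to the rendered HTML, which a caller could insert into the page or parse for the video elements.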
