Translate Spanish Text and Audio into LSC Sign Videos

Signia’s text-to-signs pipeline bridges spoken Spanish and Colombian Sign Language by combining speech transcription, AI-powered grammar reordering, and a database of recorded sign videos. When a user submits a sentence — whether typed or spoken — the system restructures it according to LSC’s Subject–Object–Verb grammar and plays back the matching sign videos in sequence, enabling fluid, linguistically accurate communication.

How It Works

Submit text or audio at /traductor/

Users can either type a Spanish sentence into the text field or record audio directly in the browser. Both inputs converge into the same processing pipeline.

Audio transcription with faster-whisper

If the user submits audio, the uploaded .webm file is passed to a lazily loaded faster-whisper model (base, device="cpu", compute_type="int8"). Whisper transcribes the audio with language='es' and beam_size=5, producing a plain-text Spanish sentence. The temporary file is deleted from disk after transcription.

LSC grammar conversion via Groq

The transcribed or typed text is sent to lsc_grammar.convertir_a_lsc(), which calls the Groq API with a detailed LSC linguistic system prompt. The model reorders the input into LSC’s SOV structure and returns a structured JSON list of gloss tokens.

Video lookup for each LSC token

tokens_para_busqueda() extracts the word tokens from the Groq response, filtering out non-lexical markers like facial expressions [EF:...] and aspect markers [ASP:...]. Each token is looked up in the video model using a case-insensitive exact match on the nombre field. Multi-word expressions (e.g. CON_GUSTO) are tried first as compound phrases before falling back to individual tokens.

Videos played in LSC sequence

Matched video objects are collected in LSC order and their URLs passed to the template as urls_videos. The frontend plays them sequentially, beginning with the base idle animation and then each sign in order.

LSC Grammar Conversion

LSC uses SOV (Subject–Object–Verb) word order, which differs fundamentally from Spanish’s SVO structure. The lsc_grammar.py module handles this reordering through a 15-module linguistic system prompt grounded in Alejandro Oviedo’s (2001) LSC research and the INSOR/Caro y Cuervo dictionary. Key grammatical transformations applied:

| Rule | Description | Example |
| --- | --- | --- |
| SOV order | Subject first, then object, then verb | `Yo como arroz` → `YO ARROZ COMER` |
| Temporal markers first | Time expressions precede the subject | `Mañana voy al médico` → `MAÑANA YO MEDICO IR` |
| Topic-comment | Topic/theme leads; marked with `[TOPIC]` | `El carro rojo, yo lo compré` → `CARRO ROJO [TOPIC] YO COMPRAR` |
| Negation at end | `NO` always follows the verb (and modal) | `No puedo ir` → `YO IR PODER NO` |
| WH-questions at end | Interrogative pronoun moves to final position | `¿Cómo te llamas?` → `TU LLAMAR COMO [EF:CEJAS_FRUNCIDAS]` |
| Copula deletion | Empty `ser`/`estar` is dropped | `Él es médico` → `EL MEDICO` |
| Modals after verb | Modal verbs follow the main verb | `Quiero salir` → `YO SALIR QUERER` |

A complete example of temporal marker placement:

Input (Spanish):  "Mañana voy al médico"
LSC tokens:       MAÑANA  YO    MEDICO  IR
Token types:      time    subj  obj     verb

The Groq response also includes sentence_type (e.g. declarative, question_wh), facial_expression metadata, and notes with linguistic observations. Non-lexical tokens — [EF:CEJAS_FRUNCIDAS], [ASP:COMPLETADO], [TOPIC] — are stripped before the database lookup.
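As an illustration, handling such a response might look like the sketch below. The field names sentence_type, facial_expression, and notes come from the description above; the "tokens" key, the sample payload, and the helper name filtrar_tokens_lexicos are assumptions made for this example and may not match the actual schema used by lsc_grammar.py.

```python
import json

# Hypothetical payload: only sentence_type, facial_expression and notes are
# named on this page; the "tokens" key is assumed for illustration.
RESPUESTA_EJEMPLO = """
{
  "tokens": ["TU", "LLAMAR", "COMO", "[EF:CEJAS_FRUNCIDAS]"],
  "sentence_type": "question_wh",
  "facial_expression": "CEJAS_FRUNCIDAS",
  "notes": "WH-word moved to final position"
}
"""


def filtrar_tokens_lexicos(tokens):
    """Drop bracketed markers such as [EF:...], [ASP:...] and [TOPIC]."""
    return [t for t in tokens if not (t.startswith("[") and t.endswith("]"))]


respuesta = json.loads(RESPUESTA_EJEMPLO)
tokens_busqueda = filtrar_tokens_lexicos(respuesta["tokens"])
print(tokens_busqueda)  # ['TU', 'LLAMAR', 'COMO']
```

Only the filtered list would be used for the database lookup; the metadata fields stay available for the template.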

Fallback Chain

lsc_grammar.py tries the Groq models in a fixed order. If the primary model is unavailable or rate-limited, it automatically moves on to the next of three fallbacks:

MODELOS_GROQ = [
    "llama-3.3-70b-versatile",   # Primary: best LSC quality
    "llama-3.1-8b-instant",      # Fallback 1: faster, independent quota
    "llama3-8b-8192",            # Fallback 2: stable Llama 3 base
    "llama3-70b-8192",           # Fallback 3: Llama 3 70B base
]

If all Groq models are exhausted, a local rule-based heuristic (_fallback_sin_ia) applies basic LSC ordering (time → subject → rest → negation → question). The view’s modelo_usado context variable is set to 'fallback' so the template can detect and surface this condition to the user.
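The chain described above can be sketched roughly as follows. This is a simplified illustration, not the code in lsc_grammar.py: the function names convertir_con_respaldo and fallback_sin_ia_demo, the llamar_modelo callable (a stand-in for the real Groq API call), and the small word lists are all assumptions made for the example.

```python
MODELOS_GROQ = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "llama3-8b-8192",
    "llama3-70b-8192",
]

# Hypothetical word lists; the real heuristic may use a different vocabulary.
TIEMPO = {"AYER", "HOY", "MAÑANA"}
SUJETOS = {"YO", "TU", "EL", "ELLA", "NOSOTROS", "USTEDES", "ELLOS"}
NEGACION = {"NO"}
PREGUNTAS = {"QUE", "COMO", "DONDE", "CUANDO", "QUIEN"}


def fallback_sin_ia_demo(texto):
    """Reorder words as time -> subject -> rest -> negation -> question."""
    palabras = [p.strip("¿?¡!.,").upper() for p in texto.split()]
    grupos = {"tiempo": [], "sujeto": [], "resto": [], "negacion": [], "pregunta": []}
    for palabra in filter(None, palabras):
        if palabra in TIEMPO:
            grupos["tiempo"].append(palabra)
        elif palabra in SUJETOS:
            grupos["sujeto"].append(palabra)
        elif palabra in NEGACION:
            grupos["negacion"].append(palabra)
        elif palabra in PREGUNTAS:
            grupos["pregunta"].append(palabra)
        else:
            grupos["resto"].append(palabra)
    # Dicts keep insertion order, so the groups come out in LSC order.
    return [p for grupo in grupos.values() for p in grupo]


def convertir_con_respaldo(texto, llamar_modelo, modelos=MODELOS_GROQ):
    """Try each model in order; return (tokens, modelo_usado)."""
    for modelo in modelos:
        try:
            return llamar_modelo(modelo, texto), modelo
        except Exception:  # a real client would catch only API/rate-limit errors
            continue
    return fallback_sin_ia_demo(texto), "fallback"
```

Returning the model name alongside the tokens mirrors how modelo_usado lets the template distinguish an AI result from the rule-based fallback.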

Audio Input

Audio is captured in the browser as .webm and uploaded via multipart/form-data. The file is saved to a temporary directory (temp/) with a UUID filename, then processed:

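from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")
# transcribe() returns a lazy generator of segments plus detection info
segments, info = model.transcribe(ruta, language='es', beam_size=5)
text = " ".join(segment.text for segment in segments)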

The faster-whisper model is loaded lazily on first use (guarded by threading.Lock()), so it does not block Gunicorn startup. Files smaller than 1,000 bytes are silently skipped to avoid processing empty recordings.
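A minimal sketch of this lazy-loading pattern is shown below, assuming a module-level singleton protected by a lock. The names obtener_modelo, es_grabacion_valida, and TAMANO_MINIMO_BYTES are hypothetical, and the loader is passed in as a callable so the sketch runs without faster-whisper installed.

```python
import os
import threading

_modelo = None
_modelo_lock = threading.Lock()
TAMANO_MINIMO_BYTES = 1000  # recordings below this size are treated as empty


def obtener_modelo(cargar):
    """Return the shared model, calling `cargar` only once on first use."""
    global _modelo
    if _modelo is None:              # fast path: no locking once loaded
        with _modelo_lock:
            if _modelo is None:      # re-check: another thread may have loaded it
                _modelo = cargar()
    return _modelo


def es_grabacion_valida(ruta):
    """True when the uploaded file is large enough to be worth transcribing."""
    return os.path.getsize(ruta) >= TAMANO_MINIMO_BYTES
```

In the real view, the loader would presumably be something like `lambda: WhisperModel("base", device="cpu", compute_type="int8")`, so the model is built on the first audio request rather than at worker startup.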

The base model offers a good balance between transcription speed and accuracy for conversational Spanish on CPU. Larger models such as small or medium can be swapped in _get_whisper_model() at the cost of higher memory usage and inference time.

Vocabulary System

Sign videos are stored using the video model in the traduccion app:

from django.db import models

class video(models.Model):
    nombre = models.CharField(max_length=100)    # lookup key for LSC tokens
    video  = models.FileField(upload_to='videos/')

The nombre field is the lookup key. Tokens from the LSC grammar layer are matched against it with nombre__iexact, so MEDICO, medico, and Medico all resolve to the same record. Note that iexact ignores case, not accents: whether an accented form such as Médico also matches depends on the database backend and its collation. Multi-word expressions use spaces: the token CON_GUSTO is looked up as "con gusto" after replacing underscores. Admins upload, rename, and delete sign videos through the /admin-videos/ panel, which provides a full CRUD interface (/reconocimientos/traductor/crear/, .../editar/<id>/, .../eliminar/<id>/).
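The lookup rules can be illustrated with a small in-memory sketch. Here a dictionary stands in for the video table, and buscar_video mimics what a query such as `video.objects.filter(nombre__iexact=frase).first()` would do; the sample data, file URLs, and function names are assumptions for the example.

```python
# In-memory stand-in for the video table: nombre -> file URL (hypothetical data).
VIDEOS = {
    "con gusto": "/media/videos/con_gusto.mp4",
    "medico": "/media/videos/medico.mp4",
    "yo": "/media/videos/yo.mp4",
}


def buscar_video(nombre):
    """Case-insensitive exact match, mimicking nombre__iexact."""
    return VIDEOS.get(nombre.casefold())


def buscar_token(token):
    """Try the token as a compound phrase first, then word by word."""
    frase = token.replace("_", " ")
    url = buscar_video(frase)
    if url is not None:
        return [url]
    encontrados = [buscar_video(palabra) for palabra in frase.split()]
    return [u for u in encontrados if u is not None]
```

With this data, `buscar_token("CON_GUSTO")` resolves to the single compound video, while a token with no compound entry falls back to whichever of its individual words exist.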

The full vocabulary list is cached in Django’s cache backend under the key vocabulario_lsc for 10 minutes (600 seconds). This avoids a database round-trip on every translation request. The trade-off is that after a video is added or removed, translations can keep using the old vocabulary for up to 10 minutes. Either wait for the entry to expire or clear it manually with cache.delete('vocabulario_lsc').
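The caching behaviour amounts to a read-through pattern, sketched below. The function obtener_vocabulario and the CacheEnMemoria class are hypothetical; the class only imitates the get/set/delete methods of Django's cache object so the example runs outside a Django project, and it ignores the timeout for brevity.

```python
CLAVE_VOCABULARIO = "vocabulario_lsc"
TTL_SEGUNDOS = 600  # 10 minutes


class CacheEnMemoria:
    """Tiny stand-in for django.core.cache.cache; it never expires entries."""

    def __init__(self):
        self._datos = {}

    def get(self, clave, default=None):
        return self._datos.get(clave, default)

    def set(self, clave, valor, timeout=None):
        self._datos[clave] = valor

    def delete(self, clave):
        self._datos.pop(clave, None)


def obtener_vocabulario(cache, cargar_desde_bd):
    """Return the cached vocabulary, loading it from the database on a miss."""
    vocabulario = cache.get(CLAVE_VOCABULARIO)
    if vocabulario is None:
        vocabulario = cargar_desde_bd()
        cache.set(CLAVE_VOCABULARIO, vocabulario, TTL_SEGUNDOS)
    return vocabulario
```

Calling `cache.delete(CLAVE_VOCABULARIO)` after an admin change forces the next request to reload the list from the database.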

Fallback for Missing Tokens

When the Groq model identifies a token that has no matching video in the database, it sets a per-token strategy in the estrategia_faltantes dict of the response. The view applies this strategy via _buscar_token_con_fallbacks():

| Strategy | Behaviour |
| --- | --- |
| `synonym:ALTERNATIVA` | Looks up the alternative token in the database instead |
| `spell` | Marks the token as missing; frontend can render it as fingerspelling |
| `record` | Marks the token as a candidate for a new sign recording |
| `fingerspell` | Short acronym to be fingerspelled character by character |

If neither the original token nor its synonym exists in the database, the token is added to the faltantes list, which is passed to the template for display.
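The resolution logic can be sketched as follows. This is an illustrative approximation rather than the actual _buscar_token_con_fallbacks(): the function names, the dictionary standing in for the video table, and the shape of the estrategias mapping (token to strategy string) are assumptions.

```python
def buscar_con_estrategia(token, videos, estrategias):
    """Return a video URL for `token`, trying its synonym if one is given.

    `videos` maps lower-case nombre -> URL (stand-in for the database);
    `estrategias` maps token -> strategy string such as "synonym:DOCTOR".
    """
    url = videos.get(token.replace("_", " ").lower())
    if url is not None:
        return url
    estrategia = estrategias.get(token, "")
    if estrategia.startswith("synonym:"):
        alternativa = estrategia.split(":", 1)[1]
        return videos.get(alternativa.replace("_", " ").lower())
    return None  # spell / record / fingerspell: no video to play


def resolver_tokens(tokens, videos, estrategias):
    """Split tokens into playable URLs (in LSC order) and missing tokens."""
    urls, faltantes = [], []
    for token in tokens:
        url = buscar_con_estrategia(token, videos, estrategias)
        if url is None:
            faltantes.append(token)
        else:
            urls.append(url)
    return urls, faltantes
```

For example, with a "doctor" video available and the strategy `synonym:DOCTOR` set for MEDICO, the synonym's video is played in place of the missing sign, while a token marked `spell` ends up in faltantes.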

Translation History

Every successful translation (at least one video found) is recorded for authenticated users:

EntradaHistorial.objects.create(
    usuario=request.user,
    tipo='traduccion',
    contenido=palabras_texto.strip(),
)

History entries are viewable and filterable at /historial/ and are paginated at 15 items per page.

A video record with nombre="base" (case-insensitive) must exist in the database for the translator template to render the idle baseline animation. If this record is missing, video_base will be None and the frontend will have no starting frame to display. Create this record through the /admin-videos/ panel before the translator is usable.
