Signia’s text-to-signs pipeline bridges spoken Spanish and Colombian Sign Language (LSC) by combining speech transcription, AI-powered grammar reordering, and a database of recorded sign videos. When a user submits a sentence, whether typed or spoken, the system restructures it according to LSC’s Subject–Object–Verb grammar and plays back the matching sign videos in sequence.

How It Works

1. Submit text or audio at /traductor/

Users can either type a Spanish sentence into the text field or record audio directly in the browser. Both inputs converge into the same processing pipeline.

2. Audio transcription with faster-whisper

If the user submits audio, the uploaded .webm file is passed to a lazily-loaded faster-whisper model (base, device="cpu", compute_type="int8"). Whisper transcribes the audio with language='es' and beam_size=5, producing a plain-text Spanish sentence. The temporary file is deleted from disk after transcription.

3. LSC grammar conversion via Groq

The transcribed or typed text is sent to lsc_grammar.convertir_a_lsc(), which calls the Groq API with a detailed LSC linguistic system prompt. The model reorders the input into LSC’s SOV structure and returns a structured JSON list of gloss tokens.

4. Video lookup for each LSC token

tokens_para_busqueda() extracts the word tokens from the Groq response, filtering out non-lexical markers like facial expressions [EF:...] and aspect markers [ASP:...]. Each token is looked up in the video model using a case-insensitive exact match on the nombre field. Multi-word expressions (e.g. CON_GUSTO) are tried first as compound phrases before falling back to individual tokens.

5. Videos played in LSC sequence

Matched video objects are collected in LSC order and their URLs passed to the template as urls_videos. The frontend plays them sequentially, beginning with the base idle animation and then each sign in order.
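
The compound-first lookup in step 4 can be sketched as follows. This is a minimal illustration, not the project's actual view code: the VIDEOS dict is a hypothetical in-memory stand-in for the video table, and buscar_videos is an invented name.

```python
# Hypothetical stand-in for the video table: lowercase nombre -> video URL.
VIDEOS = {
    "con gusto": "/media/videos/con_gusto.mp4",
    "yo": "/media/videos/yo.mp4",
    "medico": "/media/videos/medico.mp4",
    "ir": "/media/videos/ir.mp4",
}


def buscar_videos(tokens, videos=VIDEOS):
    """Return (urls, faltantes) for a list of LSC gloss tokens, keeping order."""
    urls, faltantes = [], []
    for token in tokens:
        frase = token.replace("_", " ").lower()
        # 1) Try the whole token as a compound phrase ("CON_GUSTO" -> "con gusto").
        if frase in videos:
            urls.append(videos[frase])
            continue
        # 2) Otherwise fall back to the individual words of the token.
        for palabra in frase.split():
            if palabra in videos:
                urls.append(videos[palabra])
            else:
                faltantes.append(palabra.upper())
    return urls, faltantes
```

With this sample data, CON_GUSTO resolves as one compound video, while a token such as IR_MEDICO, which has no compound entry, resolves to the two separate videos for IR and MEDICO.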

LSC Grammar Conversion

LSC uses SOV (Subject–Object–Verb) word order, which differs fundamentally from Spanish’s SVO structure. The lsc_grammar.py module handles this reordering through a 15-module linguistic system prompt grounded in Alejandro Oviedo’s (2001) LSC research and the INSOR/Caro y Cuervo dictionary. Key grammatical transformations applied:
| Rule | Description | Example |
| --- | --- | --- |
| SOV order | Subject first, then object, then verb | Yo como arroz → YO ARROZ COMER |
| Temporal markers first | Time expressions precede the subject | Mañana voy al médico → MAÑANA YO MEDICO IR |
| Tópico-comentario | Topic/theme leads; marked with [TOPIC] | El carro rojo, yo lo compré → CARRO ROJO [TOPIC] YO COMPRAR |
| Negation at end | NO always follows the verb (and modal) | No puedo ir → YO IR PODER NO |
| WH-questions at end | Interrogative pronoun moves to final position | ¿Cómo te llamas? → TU LLAMAR COMO [EF:CEJAS_FRUNCIDAS] |
| Copula deletion | Empty ser/estar is dropped | Él es médico → EL MEDICO |
| Modals after verb | Modal verbs follow the main verb | Quiero salir → YO SALIR QUERER |
A complete example of temporal marker placement:
Input (Spanish):  "Mañana voy al médico"
LSC tokens:       MAÑANA  YO    MEDICO  IR
Token types:      time    subj  obj     verb
The Groq response also includes sentence_type (e.g. declarative, question_wh), facial_expression metadata, and notes with linguistic observations. Non-lexical tokens — [EF:CEJAS_FRUNCIDAS], [ASP:COMPLETADO], [TOPIC] — are stripped before the database lookup.
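
Stripping the bracketed markers before the lookup could look roughly like this. The function name solo_lexicos and the "tokens" key of the sample response are assumptions made for illustration; only sentence_type and the marker formats come from the description above.

```python
import re

# Bracketed markers such as [EF:...], [ASP:...] and [TOPIC] have no sign video.
MARCADOR = re.compile(r"^\[[^\]]*\]$")


def solo_lexicos(tokens):
    """Keep only the gloss tokens that should be looked up in the video table."""
    return [t for t in tokens if not MARCADOR.match(t.strip())]


# Hypothetical response shape, for illustration only.
respuesta = {
    "sentence_type": "question_wh",
    "tokens": ["TU", "LLAMAR", "COMO", "[EF:CEJAS_FRUNCIDAS]"],
}
print(solo_lexicos(respuesta["tokens"]))  # ['TU', 'LLAMAR', 'COMO']
```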

Fallback Chain

lsc_grammar.py tries up to four Groq models in order, moving on to the next one whenever the current model is unavailable or rate-limited:
MODELOS_GROQ = [
    "llama-3.3-70b-versatile",   # Primary: best LSC quality
    "llama-3.1-8b-instant",      # Fallback 1: faster, independent quota
    "llama3-8b-8192",            # Fallback 2: stable Llama 3 base
    "llama3-70b-8192",           # Fallback 3: Llama 3 70B base
]
If all Groq models are exhausted, a local rule-based heuristic (_fallback_sin_ia) applies basic LSC ordering (time → subject → rest → negation → question). The view’s modelo_usado context variable is set to 'fallback' so the template can detect and surface this condition to the user.

Audio Input

Audio is captured in the browser as .webm and uploaded via multipart/form-data. The file is saved to a temporary directory (temp/) with a UUID filename, then processed:
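from faster_whisper import WhisperModel
model = WhisperModel("base", device="cpu", compute_type="int8")
# ruta: path to the temporary .webm upload
segments, info = model.transcribe(ruta, language='es', beam_size=5)
# segments is lazy; iterating it here performs the transcription
text = " ".join(segment.text for segment in segments)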
The faster-whisper model is loaded lazily on first use (guarded by threading.Lock()), so it does not block Gunicorn startup. Files smaller than 1,000 bytes are silently skipped to avoid processing empty recordings.
The base model offers a good balance between transcription speed and accuracy for conversational Spanish on CPU. Larger models such as small or medium can be swapped in _get_whisper_model() at the cost of higher memory usage and inference time.
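
The lazy-loading and size-check behaviour described above can be sketched with a lock-guarded singleton. This is an illustrative pattern, not the body of _get_whisper_model(): obtener_modelo, the injected cargar factory, and vale_la_pena_transcribir are hypothetical names, and the factory stands in for the WhisperModel constructor so the sketch runs without faster-whisper installed.

```python
import os
import threading

_modelo = None
_modelo_lock = threading.Lock()
TAMANO_MINIMO = 1000  # bytes; smaller uploads are treated as empty recordings


def obtener_modelo(cargar):
    """Create the model on the first call only; `cargar` is a hypothetical factory."""
    global _modelo
    if _modelo is None:              # fast path once the model is loaded
        with _modelo_lock:
            if _modelo is None:      # re-check: another thread may have loaded it
                _modelo = cargar()
    return _modelo


def vale_la_pena_transcribir(ruta):
    """Return False for files below the size threshold."""
    return os.path.getsize(ruta) >= TAMANO_MINIMO
```

Checking the variable again inside the lock means that two requests arriving at the same time cannot both load the model, while later requests skip the lock entirely.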

Vocabulary System

Sign videos are stored using the video model in the traduccion app:
from django.db import models

class video(models.Model):
    nombre = models.CharField(max_length=100)   # lookup key for gloss tokens
    video  = models.FileField(upload_to='videos/')
The nombre field is the lookup key. Tokens from the LSC grammar layer are matched against it with nombre__iexact, so MEDICO, medico, and Medico all resolve to the same record. Because iexact only folds case, whether an accented form such as Médico also matches depends on the database collation. Multi-word expressions use spaces: the token CON_GUSTO is looked up as "con gusto" after replacing underscores. Admins upload, rename, and delete sign videos through the /admin-videos/ panel, which provides a full CRUD interface (/reconocimientos/traductor/crear/, .../editar/<id>/, .../eliminar/<id>/).
The full vocabulary list is cached in Django’s cache backend under the key vocabulario_lsc for 10 minutes (600 seconds), which avoids a database round-trip on every translation request. After a video is added or removed, the cached list stays stale until it expires on its own; to pick up the change immediately, clear it manually with cache.delete('vocabulario_lsc').
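
The read-through pattern behind this cache can be sketched as below. To keep the example runnable without a configured Django project, CacheEnMemoria is a tiny hypothetical stand-in that only mimics the get/set/delete calls of Django's cache object; obtener_vocabulario and the consultar_bd callable are likewise assumptions.

```python
import time


class CacheEnMemoria:
    """Minimal stand-in with the get/set/delete shape of Django's cache."""

    def __init__(self):
        self._datos = {}

    def get(self, clave, default=None):
        valor, expira = self._datos.get(clave, (default, None))
        if expira is not None and time.monotonic() >= expira:
            self._datos.pop(clave, None)   # entry expired: drop it
            return default
        return valor

    def set(self, clave, valor, timeout):
        self._datos[clave] = (valor, time.monotonic() + timeout)

    def delete(self, clave):
        self._datos.pop(clave, None)


cache = CacheEnMemoria()
CLAVE = "vocabulario_lsc"
TTL = 600  # seconds (10 minutes)


def obtener_vocabulario(consultar_bd):
    """Return the cached vocabulary, querying the database only on a miss."""
    vocabulario = cache.get(CLAVE)
    if vocabulario is None:
        vocabulario = consultar_bd()       # hypothetical database query
        cache.set(CLAVE, vocabulario, TTL)
    return vocabulario
```

In a real project the same three calls would go to django.core.cache.cache, and the admin views that create, rename, or delete a video would be a natural place to call cache.delete so new signs become available right away.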

Fallback for Missing Tokens

When the Groq model identifies a token that has no matching video in the database, it sets a per-token strategy in the estrategia_faltantes dict of the response. The view applies this strategy via _buscar_token_con_fallbacks():
| Strategy | Behaviour |
| --- | --- |
| synonym:ALTERNATIVA | Looks up the alternative token in the database instead |
| spell | Marks the token as missing; frontend can render it as fingerspelling |
| record | Marks the token as a candidate for a new sign recording |
| fingerspell | Short acronym to be fingerspelled character by character |
If neither the original token nor its synonym exists in the database, the token is added to the faltantes list, which is passed to the template for display.
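
One way to picture how these strategies are applied is the sketch below. It is not the actual _buscar_token_con_fallbacks() implementation: resolver_tokens, the videos dict standing in for the database, and the shape of the faltantes entries are all assumptions.

```python
def _normalizar(token):
    """Turn a gloss token into the lookup form ("CON_GUSTO" -> "con gusto")."""
    return token.replace("_", " ").lower()


def resolver_tokens(tokens, videos, estrategia_faltantes):
    """Return (urls, faltantes) after applying the per-token strategies.

    `videos` is a hypothetical dict (lowercase nombre -> URL) standing in for
    the database; `estrategia_faltantes` maps a token to its strategy string.
    """
    urls, faltantes = [], []
    for token in tokens:
        url = videos.get(_normalizar(token))
        estrategia = estrategia_faltantes.get(token, "")
        if url is None and estrategia.startswith("synonym:"):
            # Try the alternative sign named after the "synonym:" prefix.
            alternativa = estrategia.split(":", 1)[1]
            url = videos.get(_normalizar(alternativa))
        if url is None:
            # Neither the token nor its synonym has a video.
            faltantes.append({"token": token, "estrategia": estrategia or None})
        else:
            urls.append(url)
    return urls, faltantes
```

Keeping the strategy next to each missing token leaves the template free to decide how to present it, for example fingerspelling an acronym versus flagging a sign that still needs to be recorded.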

Translation History

Every successful translation (at least one video found) is recorded for authenticated users:
EntradaHistorial.objects.create(
    usuario=request.user,
    tipo='traduccion',
    contenido=palabras_texto.strip(),
)
History entries are viewable and filterable at /historial/ and are paginated at 15 items per page.
A video record with nombre="base" (case-insensitive) must exist in the database for the translator template to render the idle baseline animation. If this record is missing, video_base will be None and the frontend will have no starting frame to display. Create this record through the /admin-videos/ panel before the translator is usable.
