LSC Grammar Module: Spanish to Sign Language Conversion

`lsc_grammar.py` is a standalone Python module at the project root that converts Spanish text into the grammatical token order used by Colombian Sign Language (LSC). It does not depend on Django and can be imported from any Python context, but within Signia it is called by `traduccion/views.py` for every translation request. The module uses the Groq LLM API with a four-model fallback chain and a rule-based safety net, so a grammar result is always returned even when all remote models are unavailable.

Module Location

lsc_grammar.py   ← project root (same level as manage.py)

Import

from lsc_grammar import convertir_a_lsc, tokens_para_busqueda

Public Functions

`convertir_a_lsc`

def convertir_a_lsc(
    texto_espanol: str,
    vocabulario_disponible: list[str] | None = None,
) -> dict:
    ...

Converts a Spanish sentence to the LSC gloss order via the Groq API.

| Parameter | Type | Description |
| --- | --- | --- |
| `texto_espanol` | `str` | The Spanish sentence to convert. |
| `vocabulario_disponible` | `list[str] \| None` | Names of sign videos available in the database. When provided, the AI compares against this list and populates `estrategia_faltantes` for tokens it cannot find. |

Return value — a dict with the following keys:

| Key | Type | Description |
| --- | --- | --- |
| `tokens` | `list[dict]` | Ordered LSC gloss tokens. Each dict has `{"word": str, "type": str}`. |
| `sentence_type` | `str` | Sentence category: `declarative`, `question_yn`, `question_wh`, `negative`, `conditional`, `exclamative`, or `greeting`. |
| `facial_expression` | `str` | Dominant non-manual marker: `neutral`, `cejas_arriba`, `cejas_fruncidas`, `intensidad`, `negacion`, `afirmacion`, or `condicional`. |
| `faltantes` | `list[str]` | Tokens not found in the provided vocabulary, that is, signs that have no matching video in the database. |
| `notes` | `str` | Optional linguistic observation from the model (verb directionality, regional variants, etc.). |
| `estrategia_faltantes` | `dict` | Maps each missing token to a handling strategy: `synonym:ALTERNATIVE`, `spell`, `fingerspell`, `record`, or `context`. |
| `modelo_usado` | `str` | Name of the Groq model that produced the result, or `"fallback"` if the rule-based path was used. |
| `error` | `str \| None` | Present (non-`None`) only when all Groq models failed and the rule-based fallback was activated. |
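The return shape described above can also be written down as type hints. This is a minimal sketch rather than code from the module: the class names `TokenLSC` and `ResultadoLSC` are hypothetical, and every key is declared optional (`total=False`) because this document does not state which keys are always present.

```python
from typing import Optional, TypedDict


class TokenLSC(TypedDict):
    """One gloss token (hypothetical name, not defined by lsc_grammar)."""
    word: str
    type: str


class ResultadoLSC(TypedDict, total=False):
    """Documented keys of the convertir_a_lsc result (hypothetical name)."""
    tokens: list[TokenLSC]
    sentence_type: str
    facial_expression: str
    faltantes: list[str]
    notes: str
    estrategia_faltantes: dict
    modelo_usado: str
    error: Optional[str]


# Illustrative instance built from the values shown in the usage example.
ejemplo: ResultadoLSC = {
    "tokens": [
        {"word": "SENA", "type": "other"},
        {"word": "BIENVENIDOS", "type": "other"},
        {"word": "BUENOS_DIAS", "type": "greeting"},
    ],
    "sentence_type": "greeting",
    "facial_expression": "neutral",
    "modelo_usado": "llama-3.3-70b-versatile",
}
```

A `TypedDict` is a plain `dict` at runtime, so hints like these only help editors and type checkers; they do not validate what the model actually returned.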

`tokens_para_busqueda`

def tokens_para_busqueda(resultado_lsc: dict) -> list[str]:
    ...

Extracts the plain searchable token strings from the result of `convertir_a_lsc`, stripping non-searchable token types (facial, aspect, topic).

| Parameter | Type | Description |
| --- | --- | --- |
| `resultado_lsc` | `dict` | The dict returned by `convertir_a_lsc`. |

Returns: `list[str]` of uppercase gloss tokens in LSC order, ready for database lookup.
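The filtering step can be pictured with a short stand-in. This is a sketch under assumptions, not the module's implementation: the function name `extraer_tokens_buscables` is hypothetical, the type strings `facial`, `aspect`, and `topic` are assumed to be the literal values stored under `"type"`, and the facial token in the sample input is made up for illustration.

```python
# Token types the text above lists as non-searchable (assumed literal values).
TIPOS_NO_BUSCABLES = {"facial", "aspect", "topic"}


def extraer_tokens_buscables(resultado_lsc: dict) -> list[str]:
    """Hypothetical stand-in for tokens_para_busqueda, for illustration only."""
    return [
        token["word"].upper()
        for token in resultado_lsc.get("tokens", [])
        if token.get("type") not in TIPOS_NO_BUSCABLES
    ]


resultado = {
    "tokens": [
        {"word": "[EF:NEUTRAL]", "type": "facial"},  # made-up marker token
        {"word": "SENA", "type": "other"},
        {"word": "BIENVENIDOS", "type": "other"},
        {"word": "BUENOS_DIAS", "type": "greeting"},
    ]
}

buscables = extraer_tokens_buscables(resultado)
# buscables == ["SENA", "BIENVENIDOS", "BUENOS_DIAS"]
```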

LSC Grammar Rules

The module encodes a detailed LSC grammar in its system prompt, based on the work of Alejandro Oviedo (2001) and the INSOR/Caro y Cuervo Basic LSC Dictionary. Key rules applied during conversion:

SOV Word Order

The canonical LSC sentence order is Subject → Object → Verb. Example: "Yo como arroz" → YO ARROZ COMER.

Temporal Markers First

Time markers always open the sentence before the subject. Example: "Mañana voy al médico" → MAÑANA YO MEDICO IR.

Topic-Comment (Tópico-Comentario)

The main topic is placed first and marked with [TOPIC] when there is explicit contrast or emphasis. Copula verbs are dropped when they carry no lexical meaning.

Spatial Context First

In greetings that include an explicit place or institution, the spatial context precedes the idea and the greeting. Example: "Hola, bienvenidos al SENA" → SENA BIENVENIDOS HOLA.

Additional rules cover negation (always final), Wh-questions (pronoun moved to end + [EF:CEJAS_FRUNCIDAS]), modal verb placement (after main verb), non-manual facial expression tokens ([EF:...]), and multi-word expressions joined with underscores (POR_FAVOR, BUENOS_DIAS).
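Code that consumes a gloss sequence usually needs to tell bracketed markers apart from manual signs. The helper below is a hypothetical sketch: `describir_token` is not part of `lsc_grammar`, and the regular expression only assumes the two marker shapes mentioned above (`[TOPIC]` and `[EF:...]`).

```python
import re

# Bracketed markers such as [TOPIC] or [EF:CEJAS_FRUNCIDAS].
MARCADOR = re.compile(r"^\[([A-Z_]+)(?::([A-Z_]+))?\]$")


def describir_token(token: str) -> dict:
    """Classify a gloss token as a marker or a (possibly multi-word) sign."""
    coincidencia = MARCADOR.match(token)
    if coincidencia:
        return {
            "kind": "marker",
            "name": coincidencia.group(1),
            "value": coincidencia.group(2),  # None for markers without ":"
        }
    # Multi-word expressions are joined with underscores, e.g. POR_FAVOR.
    return {"kind": "sign", "words": token.split("_")}


marcador = describir_token("[EF:CEJAS_FRUNCIDAS]")
topico = describir_token("[TOPIC]")
expresion = describir_token("POR_FAVOR")
```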

Usage Example

from lsc_grammar import convertir_a_lsc, tokens_para_busqueda

vocabulary = ["SENA", "BIENVENIDOS", "BUENOS_DIAS", "HOLA", "GRACIAS"]

result = convertir_a_lsc("Buenos días, bienvenidos al SENA", vocabulary)
# result["tokens"] →
# [
#   {"word": "SENA",        "type": "other"},
#   {"word": "BIENVENIDOS", "type": "other"},
#   {"word": "BUENOS_DIAS", "type": "greeting"},
# ]
# result["sentence_type"]    → "greeting"
# result["facial_expression"] → "neutral"
# result["modelo_usado"]      → "llama-3.3-70b-versatile"

searchable = tokens_para_busqueda(result)
# → ["SENA", "BIENVENIDOS", "BUENOS_DIAS"]

Four-Model Groq Fallback Chain

The module attempts four Groq models in order. On a 429 (rate limit), 503 (overload), or model-decommissioned error it moves immediately to the next model. Any other error (authentication failure, network error) causes an immediate fall-through to the rule-based fallback without trying remaining models.

| Priority | Model ID | Role |
| --- | --- | --- |
| 1 | `llama-3.3-70b-versatile` | Primary: highest LSC grammar quality |
| 2 | `llama-3.1-8b-instant` | Backup 1: fastest, independent rate-limit quota |
| 3 | `llama3-8b-8192` | Backup 2: Llama 3 base, highly stable |
| 4 | `llama3-70b-8192` | Backup 3: Llama 3 70B base |
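The retry policy described above can be sketched as a simple loop. The names here are assumptions made for illustration: `ErrorReintentable` stands in for whatever exception the Groq client raises on a 429, 503, or decommissioned model, and `llamar_modelo` is an injected callable so the sketch runs without network access.

```python
MODELOS = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "llama3-8b-8192",
    "llama3-70b-8192",
]


class ErrorReintentable(Exception):
    """Hypothetical: a 429, 503, or model-decommissioned error."""


def probar_modelos(llamar_modelo, texto: str):
    """Return (modelo, respuesta), or (None, None) when the fallback should run."""
    for modelo in MODELOS:
        try:
            return modelo, llamar_modelo(modelo, texto)
        except ErrorReintentable:
            continue  # rate limit or overload: move on to the next model
        except Exception:
            break  # auth or network failure: skip the remaining models
    return None, None


def simulacion(modelo: str, texto: str) -> str:
    # Pretend the primary model is rate limited and the second one answers.
    if modelo == "llama-3.3-70b-versatile":
        raise ErrorReintentable("429")
    return "ok"


elegido = probar_modelos(simulacion, "Yo como arroz")
# elegido == ("llama-3.1-8b-instant", "ok")
```

Catching the retryable error first and everything else second is what separates "try the next model" from "give up on the chain".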

If no model produces a result, `_fallback_sin_ia()` applies a heuristic rule set: it strips articles, prepositions, and copula verbs; sorts tokens into time → subject → rest → negation → Wh-question order; and sets `result["error"]` to a descriptive message so callers can surface a warning to the user.
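A heuristic of this kind can be approximated in a few lines. The sketch below is not `_fallback_sin_ia()` itself: the function name `ordenar_sin_ia` is hypothetical and the word lists are small illustrative samples, since the module's real lists are not shown here. It only drops function words and reorders by category; it does not lemmatize verbs or move objects before verbs.

```python
import re

# Illustrative word lists (assumptions, not the module's actual lists).
ARTICULOS = {"el", "la", "los", "las", "un", "una", "unos", "unas"}
PREPOSICIONES = {"a", "al", "de", "del", "en", "con", "para", "por"}
COPULAS = {"es", "son", "soy", "eres", "está", "están", "estoy"}
TIEMPO = {"ayer", "hoy", "mañana", "ahora"}
SUJETOS = {"yo", "tú", "él", "ella", "nosotros", "ellos", "ellas", "usted"}
NEGACION = {"no", "nunca"}
INTERROGATIVOS = {"qué", "quién", "dónde", "cuándo", "cómo"}


def _rango(palabra: str) -> int:
    """Sort key: time, subject, rest, negation, Wh-word."""
    if palabra in TIEMPO:
        return 0
    if palabra in SUJETOS:
        return 1
    if palabra in NEGACION:
        return 3
    if palabra in INTERROGATIVOS:
        return 4
    return 2


def ordenar_sin_ia(texto: str) -> list[str]:
    """Hypothetical rule-based reordering in the spirit of the fallback."""
    palabras = re.findall(r"\w+", texto.lower())
    descartar = ARTICULOS | PREPOSICIONES | COPULAS
    palabras = [p for p in palabras if p not in descartar]
    # sorted() is stable, so words in the same category keep their order.
    return [p.upper() for p in sorted(palabras, key=_rango)]


orden = ordenar_sin_ia("Yo no como el arroz mañana")
# orden == ["MAÑANA", "YO", "COMO", "ARROZ", "NO"]
```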

The `modelo_usado` field in the return dict always tells you which path was taken. A value of `"fallback"` means the rule-based path ran. Any Groq model name means the LLM produced the result.
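A caller can use these two fields to decide whether to warn the user. The helper below is a hypothetical sketch (the name `aviso_para_usuario`, the message wording, and the sample error text are not from the module); it reads `error` with `.get()` in case the key is missing on the LLM path.

```python
from typing import Optional


def aviso_para_usuario(resultado: dict) -> Optional[str]:
    """Return a warning string when the rule-based path ran, else None."""
    if resultado.get("modelo_usado") == "fallback":
        detalle = resultado.get("error") or "no detail provided"
        return f"Grammar produced by rule-based fallback: {detalle}"
    return None


# Sample dicts with made-up values, shaped like the documented return value.
con_ia = {"modelo_usado": "llama-3.3-70b-versatile", "error": None}
sin_ia = {"modelo_usado": "fallback", "error": "all models unavailable"}

aviso_ia = aviso_para_usuario(con_ia)
aviso_fallback = aviso_para_usuario(sin_ia)
```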

Environment Variable

The module reads the Groq API key from the environment at first use:

GROQ_API_KEY=gsk_...

In Django, `settings.py` sets `os.environ['GROQ_API_KEY']` from the `GROQ_API_KEY` python-decouple config variable at startup. If the key is absent, `_get_client()` raises `EnvironmentError` (an alias of `OSError` in Python 3) and the module falls back to rule-based processing.
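The lazy key lookup can be illustrated as follows. This is a sketch of the documented behaviour only: `leer_clave_groq` is a hypothetical helper, and the real `_get_client()` presumably also builds the Groq client, which is omitted here.

```python
import os


def leer_clave_groq() -> str:
    """Hypothetical helper: read the key when first needed, not at import time."""
    clave = os.environ.get("GROQ_API_KEY")
    if not clave:
        raise EnvironmentError("GROQ_API_KEY is not set")
    return clave


def hay_clave() -> bool:
    """Report whether a remote call could be attempted at all."""
    try:
        leer_clave_groq()
    except EnvironmentError:
        return False  # a caller would switch to the rule-based path here
    return True
```

Reading the variable inside the function rather than at module import is what lets the module be imported in contexts where the key is not configured.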

Vocabulary Caching

`traduccion/views.py` wraps the vocabulary DB query in a Django cache call before passing it to `convertir_a_lsc`:

def _obtener_vocabulario_bd() -> list[str]:
    from django.core.cache import cache
    vocab = cache.get('vocabulario_lsc')
    if vocab is not None:
        return vocab
    vocab = list(video.objects.values_list('nombre', flat=True))
    cache.set('vocabulario_lsc', vocab, 600)  # 10 minutes
    return vocab

The cache key is `'vocabulario_lsc'` with a TTL of 600 seconds (10 minutes). After sign videos are added or removed, the cached vocabulary stays stale until the entry expires or is manually invalidated.

To force an immediate vocabulary refresh without restarting the server, call `cache.delete('vocabulario_lsc')` from a Django shell or management command.
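The staleness and the effect of deleting the key can be reproduced without Django. The sketch below uses a hypothetical in-memory stand-in (`CacheEnMemoria`) that exposes the same `get`, `set`, and `delete` calls used above but ignores the timeout, and a hypothetical `obtener_vocabulario` that takes the cache and the loader as arguments.

```python
class CacheEnMemoria:
    """Tiny stand-in for the cache calls used above (no expiry logic)."""

    def __init__(self):
        self._datos = {}

    def get(self, clave, default=None):
        return self._datos.get(clave, default)

    def set(self, clave, valor, timeout=None):
        self._datos[clave] = valor  # timeout is accepted but ignored here

    def delete(self, clave):
        self._datos.pop(clave, None)


def obtener_vocabulario(cache, cargar_desde_bd) -> list[str]:
    """Same get-or-load pattern as the view helper, with injected parts."""
    vocab = cache.get("vocabulario_lsc")
    if vocab is not None:
        return vocab
    vocab = list(cargar_desde_bd())
    cache.set("vocabulario_lsc", vocab, 600)
    return vocab


videos = ["HOLA"]  # stands in for the video table
cache = CacheEnMemoria()

primero = obtener_vocabulario(cache, lambda: videos)
videos.append("GRACIAS")  # a new sign video is added
obsoleto = obtener_vocabulario(cache, lambda: videos)  # still the cached copy
cache.delete("vocabulario_lsc")  # manual invalidation
actualizado = obtener_vocabulario(cache, lambda: videos)
```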
