lsc_grammar.py is a standalone Python module at the project root that converts Spanish text into the grammatical token order used by Colombian Sign Language (LSC). It does not depend on Django — it can be imported from any Python context — but within Signia it is called by traduccion/views.py for every translation request. The module uses the Groq LLM API with a four-model fallback chain and a rule-based safety net so that a grammar result is always returned even when all remote models are unavailable.

Module Location

lsc_grammar.py   ← project root (same level as manage.py)

Import

from lsc_grammar import convertir_a_lsc, tokens_para_busqueda

Public Functions

convertir_a_lsc

def convertir_a_lsc(
    texto_espanol: str,
    vocabulario_disponible: list[str] | None = None,
) -> dict:
    ...
Converts a Spanish sentence to the LSC gloss order via the Groq API.
| Parameter | Type | Description |
| --- | --- | --- |
| texto_espanol | str | The Spanish sentence to convert. |
| vocabulario_disponible | list[str] \| None | Names of sign videos available in the database. When provided, the AI compares against this list and populates estrategia_faltantes for tokens it cannot find. |
Return value — a dict with the following keys:
| Key | Type | Description |
| --- | --- | --- |
| tokens | list[dict] | Ordered LSC gloss tokens. Each dict has {"word": str, "type": str}. |
| sentence_type | str | Sentence category: declarative, question_yn, question_wh, negative, conditional, exclamative, or greeting. |
| facial_expression | str | Dominant non-manual marker: neutral, cejas_arriba, cejas_fruncidas, intensidad, negacion, afirmacion, or condicional. |
| faltantes | list[str] | Tokens not found in the provided vocabulary, i.e. signs that have no matching video in the database. |
| notes | str | Optional linguistic observation from the model (verb directionality, regional variants, etc.). |
| estrategia_faltantes | dict | Maps each missing token to a handling strategy: synonym:ALTERNATIVE, spell, fingerspell, record, or context. |
| modelo_usado | str | Name of the Groq model that produced the result, or "fallback" if the rule-based path was used. |
| error | str \| None | Present (non-None) only when all Groq models failed and the rule-based fallback was activated. |
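A caller will usually want to act on faltantes and estrategia_faltantes together. The following is a minimal sketch, assuming strategy values have exactly the shapes listed in the table (a bare name such as spell, or synonym:ALTERNATIVE with a colon before the alternative). The helper plan_missing_signs and the default of "spell" for tokens without a strategy are hypothetical and not part of lsc_grammar.py.

```python
from __future__ import annotations


def plan_missing_signs(result: dict) -> dict[str, tuple[str, str | None]]:
    """Map each missing token to (strategy, alternative_or_None).

    Hypothetical helper: assumes strategies look like "spell" or
    "synonym:ALTERNATIVE", as listed in the return-value table.
    """
    strategies = result.get("estrategia_faltantes") or {}
    plan = {}
    for token in result.get("faltantes", []):
        # Assumed default when the model supplied no strategy for a token.
        raw = strategies.get(token, "spell")
        # "synonym:MEDICO" -> ("synonym", "MEDICO"); "spell" -> ("spell", "")
        name, _, alternative = raw.partition(":")
        plan[token] = (name, alternative or None)
    return plan


# Hand-written result dict for illustration; not real model output.
sample = {
    "faltantes": ["DOCTOR", "JUAN"],
    "estrategia_faltantes": {"DOCTOR": "synonym:MEDICO", "JUAN": "fingerspell"},
    "error": None,
}
plan = plan_missing_signs(sample)
```

With a plan in this shape, a view could look up the alternative sign for synonym entries and route the remaining tokens to whatever fingerspelling path the application provides.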

tokens_para_busqueda

def tokens_para_busqueda(resultado_lsc: dict) -> list[str]:
    ...
Extracts the plain searchable token strings from the result of convertir_a_lsc, stripping non-searchable token types (facial, aspect, topic).
| Parameter | Type | Description |
| --- | --- | --- |
| resultado_lsc | dict | The dict returned by convertir_a_lsc. |

Returns: list[str] of uppercase gloss tokens in LSC order, ready for database lookup.
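The filtering described above can be pictured with a short sketch. It is not the module's actual implementation: the function name searchable_tokens is hypothetical, and it assumes the non-searchable type strings are literally "facial", "aspect", and "topic".

```python
# Type names taken from the description above; assumed to be the literal strings.
NON_SEARCHABLE_TYPES = {"facial", "aspect", "topic"}


def searchable_tokens(resultado_lsc: dict) -> list[str]:
    """Return uppercase gloss words, skipping non-searchable token types."""
    return [
        token["word"].upper()
        for token in resultado_lsc.get("tokens", [])
        if token.get("type") not in NON_SEARCHABLE_TYPES
    ]


# Hand-written input for illustration; not real model output.
sample = {
    "tokens": [
        {"word": "[EF:CEJAS_ARRIBA]", "type": "facial"},
        {"word": "SENA", "type": "other"},
        {"word": "BIENVENIDOS", "type": "other"},
        {"word": "buenos_dias", "type": "greeting"},
    ]
}
searchable = searchable_tokens(sample)
```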

LSC Grammar Rules

The module encodes a detailed LSC grammar in its system prompt, based on the work of Alejandro Oviedo (2001) and the INSOR/Caro y Cuervo Basic LSC Dictionary. Key rules applied during conversion:

SOV Word Order

The canonical LSC sentence order is Subject → Object → Verb. Example: "Yo como arroz" → YO ARROZ COMER.

Temporal Markers First

Time markers always open the sentence before the subject. Example: "Mañana voy al médico" → MAÑANA YO MEDICO IR.

Tópico-Comentario

The main topic is placed first and marked with [TOPIC] when there is explicit contrast or emphasis. Copula verbs are dropped when they carry no lexical meaning.

Spatial Context First

In greetings that include an explicit place or institution, the spatial context precedes the idea and the greeting. Example: "Hola, bienvenidos al SENA" → SENA BIENVENIDOS HOLA.
Additional rules cover negation (always final), Wh-questions (pronoun moved to end + [EF:CEJAS_FRUNCIDAS]), modal verb placement (after main verb), non-manual facial expression tokens ([EF:...]), and multi-word expressions joined with underscores (POR_FAVOR, BUENOS_DIAS).

Usage Example

from lsc_grammar import convertir_a_lsc, tokens_para_busqueda

vocabulary = ["SENA", "BIENVENIDOS", "BUENOS_DIAS", "HOLA", "GRACIAS"]

result = convertir_a_lsc("Buenos días, bienvenidos al SENA", vocabulary)
# result["tokens"] →
# [
#   {"word": "SENA",        "type": "other"},
#   {"word": "BIENVENIDOS", "type": "other"},
#   {"word": "BUENOS_DIAS", "type": "greeting"},
# ]
# result["sentence_type"]    → "greeting"
# result["facial_expression"] → "neutral"
# result["modelo_usado"]      → "llama-3.3-70b-versatile"

searchable = tokens_para_busqueda(result)
# → ["SENA", "BIENVENIDOS", "BUENOS_DIAS"]

Four-Model Groq Fallback Chain

The module attempts four Groq models in order. On a 429 (rate limit), 503 (overload), or model-decommissioned error it moves immediately to the next model. Any other error (authentication failure, network error) causes an immediate fall-through to the rule-based fallback without trying remaining models.
| Priority | Model ID | Role |
| --- | --- | --- |
| 1 | llama-3.3-70b-versatile | Primary: highest LSC grammar quality |
| 2 | llama-3.1-8b-instant | Backup 1: fastest, independent rate-limit quota |
| 3 | llama3-8b-8192 | Backup 2: Llama 3 base, highly stable |
| 4 | llama3-70b-8192 | Backup 3: Llama 3 70B base |
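The retry policy above can be sketched without the Groq SDK by injecting the model call as a function. Everything here other than the model IDs and the modelo_usado key is an assumption made for illustration: RetryableModelError stands in for a 429, 503, or model-decommissioned error, and run_chain, call_model, and fallback are hypothetical names.

```python
from typing import Callable

MODELS = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "llama3-8b-8192",
    "llama3-70b-8192",
]


class RetryableModelError(Exception):
    """Stand-in for a 429, 503, or model-decommissioned error (assumed name)."""


def run_chain(call_model: Callable[[str], dict], fallback: Callable[[], dict]) -> dict:
    """Try each model in order; use the fallback when the chain is abandoned."""
    for model in MODELS:
        try:
            result = call_model(model)
        except RetryableModelError:
            continue  # rate limit / overload / decommissioned: try the next model
        except Exception:
            break  # any other error: stop trying remote models
        result["modelo_usado"] = model
        return result
    result = fallback()
    result["modelo_usado"] = "fallback"
    return result


def fake_call(model: str) -> dict:
    # Simulates the primary model being rate limited.
    if model == "llama-3.3-70b-versatile":
        raise RetryableModelError("429 rate limit")
    return {"tokens": []}


def fake_fallback() -> dict:
    return {"tokens": [], "error": "all models failed"}


outcome = run_chain(fake_call, fake_fallback)
```

In this run the primary model raises the retryable error, so outcome["modelo_usado"] is the second model in the list.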
If all four models fail, _fallback_sin_ia() applies a heuristic rule set: it strips articles, prepositions, and copula verbs; sorts tokens into time → subject → rest → negation → Wh-question order; and sets result["error"] to a descriptive message so callers can surface a warning to the user.
The modelo_usado field in the return dict always tells you which path was taken. A value of "fallback" means the rule-based path ran. Any Groq model name means the LLM produced the result.
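The heuristic ordering can be approximated as follows. This is a toy sketch and not the body of _fallback_sin_ia(): the word lists and the function name rule_based_order are assumptions, and no lemmatization or accent stripping is attempted, so "voy" stays VOY rather than becoming IR.

```python
import re

# Small illustrative word lists; the real module's lists are not shown in this page.
STOPWORDS = {
    "el", "la", "los", "las", "un", "una",          # articles
    "a", "al", "de", "del", "en", "con",            # prepositions
    "es", "son", "soy", "está", "están", "estoy",   # copula verbs
}
TIME = {"hoy", "mañana", "ayer", "ahora"}
SUBJECTS = {"yo", "tú", "él", "ella", "nosotros", "ustedes", "ellos", "ellas"}
NEGATION = {"no", "nunca"}
WH = {"qué", "quién", "dónde", "cuándo", "cómo"}


def rule_based_order(texto: str) -> list[str]:
    """Strip function words, then order: time, subject, rest, negation, Wh."""
    words = [w for w in re.findall(r"\w+", texto.lower()) if w not in STOPWORDS]
    buckets = {"time": [], "subject": [], "rest": [], "negation": [], "wh": []}
    for word in words:
        if word in TIME:
            buckets["time"].append(word)
        elif word in SUBJECTS:
            buckets["subject"].append(word)
        elif word in NEGATION:
            buckets["negation"].append(word)
        elif word in WH:
            buckets["wh"].append(word)
        else:
            buckets["rest"].append(word)
    ordered = (
        buckets["time"] + buckets["subject"] + buckets["rest"]
        + buckets["negation"] + buckets["wh"]
    )
    return [word.upper() for word in ordered]


negative = rule_based_order("Mañana yo no voy al médico")
question = rule_based_order("¿Dónde está el baño?")
```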

Environment Variable

The module reads the Groq API key from the environment at first use:
GROQ_API_KEY=gsk_...
In Django, settings.py sets os.environ['GROQ_API_KEY'] from the GROQ_API_KEY python-decouple config variable at startup. If the key is absent, the _get_client() function raises EnvironmentError and the module falls back to rule-based processing.

Vocabulary Caching

traduccion/views.py wraps the vocabulary DB query in a Django cache call before passing it to convertir_a_lsc:
def _obtener_vocabulario_bd() -> list[str]:
    from django.core.cache import cache
    vocab = cache.get('vocabulario_lsc')
    if vocab is not None:
        return vocab
    vocab = list(video.objects.values_list('nombre', flat=True))
    cache.set('vocabulario_lsc', vocab, 600)  # 10 minutes
    return vocab
The cache key is 'vocabulario_lsc' with a TTL of 600 seconds (10 minutes). After adding or removing sign videos, the stale vocabulary will persist until the cache expires or is manually invalidated.
To force an immediate vocabulary refresh without restarting the server, call cache.delete('vocabulario_lsc') from a Django shell or management command.
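The staleness window and the effect of deleting the key can be reproduced without Django. The sketch below uses a small in-memory stand-in for the cache with the same get/set/delete shape; TTLCache, get_vocabulary, and the injected clock are hypothetical names used only for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache with per-key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)

    def delete(self, key):
        self._store.pop(key, None)


def get_vocabulary(cache, load):
    """Cache-aside lookup mirroring the view helper's get / load / set flow."""
    vocab = cache.get("vocabulario_lsc")
    if vocab is None:
        vocab = load()
        cache.set("vocabulario_lsc", vocab, 600)  # 10 minutes
    return vocab


now = [0.0]                      # fake clock so the example is deterministic
cache = TTLCache(clock=lambda: now[0])
database = ["HOLA"]              # stands in for the video table

first = get_vocabulary(cache, lambda: list(database))
database.append("GRACIAS")       # a new sign video is added
stale = get_vocabulary(cache, lambda: list(database))
cache.delete("vocabulario_lsc")  # manual invalidation
fresh = get_vocabulary(cache, lambda: list(database))
```

Here stale still lacks the new sign because the cached list is served, while fresh includes it after the key is deleted.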
