Spanish morphological inflection is a direct challenge for proposition labeling:
llueve, llueva, and llovió are all conjugations of the same verb llover, yet a naïve pipeline would assign each form its own distinct proposition symbol. NormalizadorMorfologico (package tautoteacher2.nlp.lexicon), working together with BaseConocimiento, handles this by mapping every conjugated form back to its infinitive before a symbol is assigned, so that all three forms produce the same label llover and therefore the same AtomExpr node in the IR. The irregular stems (llueve, llovió) are resolved through explicit lemma entries in core.lgs; regular conjugations (e.g., estudio, estudian) are handled by NormalizadorMorfologico’s suffix rules.
Three-Layer Approach
Canonicalization of a single word goes through three ordered steps inside BaseConocimiento.canonicalizarPalabra().
Layer 1 — Lemma table (highest priority). The BaseConocimiento class loads explicit lemma entries from core.lgs at startup. These cover irregular verbs and special cases where heuristic suffix rules would produce the wrong result: for example llueve → llover (the stem changes from lluev- to llov-), apruebo → aprobar, and solea → hacer_sol (a locution). Lemma lookup is an exact HashMap get — O(1) and unambiguous.
Layer 2 — Suffix rules (lexrule / reglasPredeterminadas()). If no lemma matches, NormalizadorMorfologico.normalizar() walks an ordered list of ReglaMorfologicaLgs rule objects and applies the first one whose suffix matches the end of the word. A rule strips the matched suffix, optionally checks a stem condition, and appends the infinitive vowel + "r" to produce a candidate infinitive. Rules are checked longest-suffix-first so that longer endings like -amos are not shadowed by -s.
Layer 3 — Identity fallback. If neither a lemma nor any suffix rule produces a result, the word is returned unchanged. The semantic pipeline still proceeds normally; the proposition label simply uses the conjugated form as written.
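The layering described above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class name LayeredCanonicalizerSketch, the canonicalize method, and the demo() factory are hypothetical, and the suffix layer is reduced to a single -amos rule so that the fallback order stays visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of the lemma -> suffix rule -> identity ordering. */
final class LayeredCanonicalizerSketch {
    // Layer 1: exact lemma entries (the real ones are loaded from core.lgs).
    private final Map<String, String> lemmas;
    // Layer 2: returns a candidate infinitive, or empty when no rule matched.
    private final Function<String, Optional<String>> suffixRules;

    LayeredCanonicalizerSketch(Map<String, String> lemmas,
                               Function<String, Optional<String>> suffixRules) {
        this.lemmas = lemmas;
        this.suffixRules = suffixRules;
    }

    String canonicalize(String word) {
        // Layer 1: an exact map lookup wins over everything else.
        String lemma = lemmas.get(word);
        if (lemma != null) {
            return lemma;
        }
        // Layer 2, then Layer 3: use the rule result, else return the word unchanged.
        return suffixRules.apply(word).orElse(word);
    }

    /** Builds a tiny instance using the lemma examples quoted in the text. */
    static LayeredCanonicalizerSketch demo() {
        Map<String, String> lemmas = Map.of(
                "llueve", "llover",
                "apruebo", "aprobar",
                "solea", "hacer_sol");
        // Assumption: one illustrative rule only (-amos -> -ar).
        Function<String, Optional<String>> rules = w ->
                w.endsWith("amos")
                        ? Optional.of(w.substring(0, w.length() - 4) + "ar")
                        : Optional.empty();
        return new LayeredCanonicalizerSketch(lemmas, rules);
    }
}
```

With this sketch, demo().canonicalize("llueve") is resolved by the lemma map, "estudiamos" by the suffix layer, and a word such as "nube" falls through to the identity step.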
Priority Order in BaseConocimiento.canonicalizarPalabra()
Words that already end in -ar, -er, or -ir (minimum length 4) are treated as infinitives and returned immediately without further processing.
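The infinitive short-circuit can be expressed as a purely orthographic test. The sketch below is an assumption about how such a check might look; the class and method names are hypothetical and are not taken from BaseConocimiento.

```java
/** Hypothetical sketch of the "already an infinitive" check. */
final class InfinitiveCheckSketch {
    // Minimum word length stated in the text.
    private static final int MIN_LENGTH = 4;

    static boolean looksLikeInfinitive(String word) {
        return word != null
                && word.length() >= MIN_LENGTH
                && (word.endsWith("ar") || word.endsWith("er") || word.endsWith("ir"));
    }
}
```

Because the test looks only at the ending and the length, short words such as "ir" are not caught by it, and any noun of four or more letters that happens to end in -ar, -er, or -ir would pass.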
Suffix Rules Table
The default rules embedded in reglasPredeterminadas() are listed below. Rules are applied in the order shown; the first match wins.
| Suffix(es) | Infinitive vowel | Example |
|---|---|---|
| -iaria | a | estudiaría → estudiar |
| -aria | a | hablaría → hablar |
| -iaba | a | estudiaba (alt.) → estudiar |
| -abas | a | estudiabas → estudiar |
| -aban | a | estudiaban → estudiar |
| -amos | a | estudiamos → estudiar |
| -ais | a | estudiáis → estudiar |
| -aron | a | estudiaron → estudiar |
| -aran | a | estudiaran → estudiar |
| -are | a | estudiare → estudiar |
| -aba | a | estudiaba → estudiar |
| -ado | a | estudiado → estudiar |
| -ada | a | estudiada → estudiar |
| -emos | e | corremos → correr |
| -eis | e | corréis → correr |
| -imos | i | vivimos → vivir |
| -an | a | estudian → estudiar |
| -as | a | estudias → estudiar |
| -a | a | estudia → estudiar |
| -o | heuristic | estudio → estudiar |
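
A first-match-wins walk over an ordered rule list might look like the following sketch. It uses a subset of the table above in the same relative order; the Rule class is a hypothetical stand-in for ReglaMorfologicaLgs, whose real fields and any stem conditions are not shown in this page, and the -o heuristic is left out.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of ordered, first-match-wins suffix rules. */
final class SuffixRuleSketch {
    /** Stand-in for a rule object: a suffix plus the infinitive vowel. */
    static final class Rule {
        final String suffix;
        final String vowel;

        Rule(String suffix, String vowel) {
            this.suffix = suffix;
            this.vowel = vowel;
        }
    }

    // Longer endings come before the shorter endings they contain.
    static final List<Rule> RULES = List.of(
            new Rule("amos", "a"),
            new Rule("aron", "a"),
            new Rule("aba", "a"),
            new Rule("emos", "e"),
            new Rule("imos", "i"),
            new Rule("an", "a"),
            new Rule("as", "a"),
            new Rule("a", "a"));

    /** Expects lower-case, accent-free input; returns empty when no rule matches. */
    static Optional<String> normalize(String word) {
        for (Rule rule : RULES) {
            // Require a non-empty stem so a bare suffix is never rewritten.
            if (word.length() > rule.suffix.length() && word.endsWith(rule.suffix)) {
                String stem = word.substring(0, word.length() - rule.suffix.length());
                return Optional.of(stem + rule.vowel + "r");
            }
        }
        return Optional.empty();
    }
}
```

Keeping -aba ahead of -a matters here: if the single-letter rule were checked first, "estudiaba" would lose only its final vowel and come out as "estudiabar".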
First-Person Heuristic (-o Suffix)
The last rule in the default list — suffix -o — uses a different strategy because the first-person singular present can belong to -ar, -er, or -ir verbs and there is no morphological signal in the ending alone. The infinitivoDesdePrimeraPersona() helper inspects the stem:
- Stem ends in a vowel → append ar (e.g., estudio → estudiar, stem estudi-).
- Stem ends in rm, mm, rc, or rt → append ir (e.g., duermo → dormir, stem duerm-).
- All other consonant-final stems → append ar as the most common case (e.g., llego → llegar, descanso → descansar).
The heuristic handles regular -ar verbs correctly and catches several common -ir verbs via the special-stem list. Irregular verbs with vowel-changing stems (such as duerme → dormir, where the stem is duerm- rather than dorm-) still require an explicit lemma entry in core.lgs.
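The stem inspection can be sketched as follows. This is a literal reading of the three cases listed above, not the actual body of infinitivoDesdePrimeraPersona(); the class and method names are hypothetical. Note that under this reading "duermo" yields "duermir", so reaching "dormir" would depend on a lemma entry, in line with the remark about vowel-changing stems.

```java
import java.util.List;

/** Hypothetical sketch of the first-person (-o) heuristic. */
final class FirstPersonSketch {
    // Stem endings that the text associates with -ir verbs.
    private static final List<String> IR_STEM_ENDINGS = List.of("rm", "mm", "rc", "rt");

    /** Expects a lower-case, accent-free word; other input is returned unchanged. */
    static String infinitiveFromFirstPerson(String word) {
        if (word == null || word.length() < 2 || !word.endsWith("o")) {
            return word;
        }
        String stem = word.substring(0, word.length() - 1); // drop the final -o
        char last = stem.charAt(stem.length() - 1);
        if ("aeiou".indexOf(last) >= 0) {
            return stem + "ar"; // vowel-final stem
        }
        for (String ending : IR_STEM_ENDINGS) {
            if (stem.endsWith(ending)) {
                return stem + "ir"; // special consonant cluster
            }
        }
        return stem + "ar"; // default for other consonant-final stems
    }
}
```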
Exclusion List
Certain domain nouns end in vowels or common suffixes that would otherwise be misidentified as verb forms. The default exclusionesPredeterminadas() set protects the following words from suffix processing:
gorra, sombrero, paraguas, calor, frio, sol, cielo, nube, nubes, lluvia, examen, clase
Without this list, gorra would be incorrectly normalized to gorrar by the -a suffix rule.
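A guard of this kind can be sketched as a set lookup placed in front of the suffix step. The class below is hypothetical: the word list is copied from the text, while the suffix step is passed in as a parameter so the sketch makes no claim about how NormalizadorMorfologico wires the two together.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an exclusion check ahead of suffix processing. */
final class ExclusionSketch {
    // Words listed in the text as protected from suffix rules.
    static final Set<String> EXCLUDED = Set.of(
            "gorra", "sombrero", "paraguas", "calor", "frio", "sol",
            "cielo", "nube", "nubes", "lluvia", "examen", "clase");

    static String normalize(String word, UnaryOperator<String> suffixStep) {
        if (word == null || word.trim().isEmpty()) {
            return ""; // null or blank input yields the empty string
        }
        if (EXCLUDED.contains(word)) {
            return word; // protected noun: skip suffix processing entirely
        }
        return suffixStep.apply(word);
    }
}
```

Passing a step that rewrites a trailing -a to -ar shows the effect: "estudia" becomes "estudiar", while "gorra" is returned as written instead of turning into "gorrar".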
Public Method Signatures
NormalizadorTexto directly: lower-case, accent-free strings. A null or blank input returns "".