TautoTeacher 2.0 can take a raw Spanish sentence such as “si llueve entonces llevo paraguas” and automatically produce the propositional formulaDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/AngelMoralesChazari/TautoTeacher-2.0/llms.txt
Use this file to discover all available pages before exploring further.
(llover → llevar paraguas). The transformation runs entirely through deterministic, rule-based code — no machine learning, no external services. A raw sentence enters the pipeline, passes through four sequential stages, and exits as a structured LogicExpr IR node ready for formula emission or tautology checking.
Text Normalization — NormalizadorTexto
NormalizadorTexto (package tautoteacher2.nlp.normalizacion) prepares the raw input for downstream stages. It lowercases the text, strips combining diacritical marks (accents) using Unicode NFD decomposition, normalizes semicolons and colons to commas, and collapses all runs of whitespace to a single space. The result is a clean, accent-free, lower-case string that every later stage can rely on.Lexical Tokenization — NaturalLexer
NaturalLexer (package tautoteacher2.nlp.lexer) scans the normalized text left-to-right. It checks for keyword tokens first — multi-word connectives like "si y solo si " and "en caso de que " are matched before shorter ones like "si " — ensuring no keyword is accidentally absorbed into an adjacent literal. Text that is not a keyword becomes a LITERAL token representing a proposition fragment (e.g., "llueve" or "llevo paraguas").Morphological Normalization — NormalizadorMorfologico + BaseConocimiento
NormalizadorMorfologico (package tautoteacher2.nlp.lexicon), coordinated by BaseConocimiento, reduces conjugated verb forms to their canonical infinitives so that llueve, llueva, and llovió all map to the same proposition label llover. It works in two layers: an explicit lemma table loaded from core.lgs handles irregular verbs and exceptions, while suffix rules handle regular conjugations. The identity fallback is used when neither layer matches.Semantic Mapping — SemanticMapper
SemanticMapper (package tautoteacher2.nlp.semantica) receives the token list and matches it against an ordered catalogue of PatronSemanticoLgs patterns. Each pattern describes an exact token-type sequence (e.g., SI LITERAL ENTONCES LITERAL) and maps it to an IR node type (IMP, AND, OR, EQUIV, etc.). The matched tokens’ lexemas are canonicalized by BaseConocimiento and wrapped in AtomExpr leaves. The final result is a LogicExpr IR tree that EmitidorFormula can render as a Unicode formula string.End-to-End Example
The following table traces the sentence"si llueve entonces llevo paraguas" through every pipeline stage.
| Stage | Output |
|---|---|
| Raw input | si llueve entonces llevo paraguas |
| After NormalizadorTexto | si llueve entonces llevo paraguas |
| After NaturalLexer | [SI("si"), LITERAL("llueve"), ENTONCES("entonces"), LITERAL("llevo paraguas")] |
| Pattern matched | SI_ENTONCES → IR type IMP |
| After canonicalization | antecedent llueve → llover, consequent llevo paraguas → llevar paraguas |
| IR tree | ImpExpr(AtomExpr("llover"), AtomExpr("llevar paraguas")) |
| Emitted formula | (llover → llevar paraguas) |
llueve and llevo, the proposition labels would be the raw conjugated forms, making formula comparisons fragile across different phrasings of the same idea.
Explore the Pipeline Stages
Natural Lexer
Token types, keyword priority, and the
NaturalLexer scanning algorithm.Morphological Normalizer
Suffix rules, lemma tables, and the two-layer canonicalization strategy.
Semantic Mapper
Named patterns, IR node types, and fallback behavior.
LogicScript Core
The
core.lgs file format: lemma, lexrule, and pattern directives.