NLP Pipeline: Spanish Text to Propositional Logic IR

TautoTeacher 2.0 can take a raw Spanish sentence such as “si llueve entonces llevo paraguas” and automatically produce the propositional formula (llover → llevar paraguas). The transformation runs entirely through deterministic, rule-based code — no machine learning, no external services. A raw sentence enters the pipeline, passes through four sequential stages, and exits as a structured LogicExpr IR node ready for formula emission or tautology checking.

Text Normalization — NormalizadorTexto

NormalizadorTexto (package tautoteacher2.nlp.normalizacion) prepares the raw input for downstream stages. It lowercases the text, strips combining diacritical marks (accents) using Unicode NFD decomposition, normalizes semicolons and colons to commas, and collapses all runs of whitespace to a single space. The result is a clean, accent-free, lower-case string that every later stage can rely on.

Lexical Tokenization — NaturalLexer

NaturalLexer (package tautoteacher2.nlp.lexer) scans the normalized text left-to-right. It checks for keyword tokens first — multi-word connectives like "si y solo si " and "en caso de que " are matched before shorter ones like "si " — ensuring no keyword is accidentally absorbed into an adjacent literal. Text that is not a keyword becomes a LITERAL token representing a proposition fragment (e.g., "llueve" or "llevo paraguas").

Morphological Normalization — NormalizadorMorfologico + BaseConocimiento

NormalizadorMorfologico (package tautoteacher2.nlp.lexicon), coordinated by BaseConocimiento, reduces conjugated verb forms to their canonical infinitives so that llueve, llueva, and llovió all map to the same proposition label llover. It works in two layers: an explicit lemma table loaded from core.lgs handles irregular verbs and exceptions, while suffix rules handle regular conjugations. The identity fallback is used when neither layer matches.

Semantic Mapping — SemanticMapper

SemanticMapper (package tautoteacher2.nlp.semantica) receives the token list and matches it against an ordered catalogue of PatronSemanticoLgs patterns. Each pattern describes an exact token-type sequence (e.g., SI LITERAL ENTONCES LITERAL) and maps it to an IR node type (IMP, AND, OR, EQUIV, etc.). The matched tokens’ lexemas are canonicalized by BaseConocimiento and wrapped in AtomExpr leaves. The final result is a LogicExpr IR tree that EmitidorFormula can render as a Unicode formula string.

End-to-End Example

The following table traces the sentence "si llueve entonces llevo paraguas" through every pipeline stage.

Stage	Output
Raw input	`si llueve entonces llevo paraguas`
After NormalizadorTexto	`si llueve entonces llevo paraguas`
After NaturalLexer	`[SI("si"), LITERAL("llueve"), ENTONCES("entonces"), LITERAL("llevo paraguas")]`
Pattern matched	`SI_ENTONCES` → IR type `IMP`
After canonicalization	antecedent `llueve → llover`, consequent `llevo paraguas → llevar paraguas`
IR tree	`ImpExpr(AtomExpr("llover"), AtomExpr("llevar paraguas"))`
Emitted formula	`(llover → llevar paraguas)`

The example shows why morphological normalization matters: without the lemma entries for llueve and llevo, the proposition labels would be the raw conjugated forms, making formula comparisons fragile across different phrasings of the same idea.

Explore the Pipeline Stages

Natural Lexer

Token types, keyword priority, and the NaturalLexer scanning algorithm.

Morphological Normalizer

Suffix rules, lemma tables, and the two-layer canonicalization strategy.

Semantic Mapper

Named patterns, IR node types, and fallback behavior.

LogicScript Core

The core.lgs file format: lemma, lexrule, and pattern directives.

Get Started

Core Logic Engine

Natural Language Processing

LogicScript

User Interface

NLP Pipeline: Spanish Text to Propositional Logic IR

End-to-End Example

Explore the Pipeline Stages

Natural Lexer

Morphological Normalizer

Semantic Mapper

LogicScript Core

Build docs developers (and LLMs) love

Get Started

Core Logic Engine

Natural Language Processing

LogicScript

User Interface

Documentation Index

​End-to-End Example

​Explore the Pipeline Stages

Natural Lexer

Morphological Normalizer

Semantic Mapper

LogicScript Core

Build docs developers (and LLMs) love

End-to-End Example

Explore the Pipeline Stages