Origin Parser: Recursive-Descent Grammar Reference

The Origin parser (parser.py) transforms the flat list[Token] produced by lex() into a structured Abstract Syntax Tree (AST). It is a hand-written, single-pass recursive-descent parser with no external grammar tool dependency. Every syntactic form in Origin — from simple arithmetic expressions to class definitions and parallel {} blocks — is handled by a dedicated method on the Parser class. The resulting AST is a tree of node instances from classes.py, rooted at a ProgramNode, which the Interpreter then walks to emit Python source.

`Parser(tokens)`

Instantiate the parser by passing the token list returned by lex():

from lexer import lex
from parser import Parser

with open("main.or") as f:
    lines = [l.rstrip("\n") for l in f]

tokens = lex(lines)
parser = Parser(tokens)
ast    = parser.program()   # -> ProgramNode

The constructor stores tokens and sets self.pos = 0. The parser advances pos by consuming tokens through eat().

Top-Level Entry Points

`parser.program() -> ProgramNode`

Parses a complete Origin program. Calls statement() in a loop until the current token is EOF, then returns a ProgramNode whose .statements list contains every top-level ASTNode. This is the method you call to parse a full .or file.
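The loop described above can be sketched as follows. This is a minimal illustration, not the code in parser.py: `MiniParser`, the `current()` helper, the reduced `Token`, and the stand-in `statement()` are all hypothetical, and the extra `skip_newlines()` calls inside `program()` are an assumption about how trailing blank lines are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    type: str
    value: str = ""


@dataclass
class ProgramNode:
    statements: list = field(default_factory=list)


class MiniParser:
    """Hypothetical stand-in for Parser; only the program() loop is modeled."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def skip_newlines(self):
        while self.current().type == "NEWLINE":
            self.pos += 1

    def statement(self):
        # Stand-in: a real statement() would dispatch on the token;
        # here one non-newline token counts as one statement.
        self.skip_newlines()
        tok = self.current()
        self.pos += 1
        return tok.value

    def program(self):
        statements = []
        self.skip_newlines()
        while self.current().type != "EOF":
            statements.append(self.statement())
            # Assumption: trailing blank lines are dropped before the EOF check.
            self.skip_newlines()
        return ProgramNode(statements)
```

The sketch relies on the token list ending with an EOF token, which is what stops both loops.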

`parser.statement() -> ASTNode`

Parses a single statement. It first calls skip_newlines() to discard leading blank lines, records the current token’s line number for error reporting, then delegates to the internal _statement() dispatcher. The line number is attached to the returned node via _set_line(node, line).

`parser.block() -> BlockNode`

Parses a braced block: { statement* }. Calls skip_newlines(), consumes the opening { bracket, then repeatedly calls statement() until the closing } is reached. Returns a BlockNode containing the collected statements.

# Example: parsing a block manually
block_node = parser.block()   # expects current token to be '{'
# block_node.statements -> list[ASTNode]

Internal Token Helpers

`eat(type_) -> Token`

Consumes and returns the current token when its type matches type_. Advances self.pos by one. Raises SyntaxError if the types don’t match:

SyntaxError(f"Expected {type_}, got {tok.type} ({tok.value}) at {tok.line}:{tok.col}")
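As a rough sketch of that behavior, the class below checks the current token's type, raises the error message quoted above on a mismatch, and otherwise advances `pos`. The `Token` dataclass and `MiniParser` name are assumptions made for illustration; only the `type`, `value`, `line`, and `col` fields are implied by the message format.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    line: int
    col: int


class MiniParser:
    """Hypothetical reduced parser showing only eat()."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def eat(self, type_):
        tok = self.tokens[self.pos]
        if tok.type != type_:
            # On a mismatch nothing is consumed; pos stays where it was.
            raise SyntaxError(
                f"Expected {type_}, got {tok.type} ({tok.value}) at {tok.line}:{tok.col}"
            )
        self.pos += 1
        return tok
```

Because a failed `eat()` leaves `pos` untouched in this sketch, a caller can catch the SyntaxError and retry from a known position.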

`skip_newlines()`

Skips zero or more consecutive NEWLINE tokens. Origin is not whitespace-sensitive within blocks — newlines are optional statement separators and are stripped before each statement parse.

Expression Precedence

Expressions are parsed through a layered call chain. Each level can call the next-higher-precedence level, so the chain enforces standard operator precedence without an explicit precedence table. From lowest to highest:

Level	Method	Operators handled
1 (lowest)	`special_expr()`	`??`, `->`, `=>`, `<=>`, `::`
2	`logic()`	`&&`, `||`, `and`, `or`
3	`comparison()`	`===`, `!==`, `==`, `!=`, `<`, `>`, `<=`, `>=`, `<>`
4	`expr()`	`+`, `-`
5	`term()`	`*`, `/`, `//`, `%`
6	`unary()`	`-` (negation), `not`, `!`, `++`, `--`
7 (highest)	`factor()`	literals, identifiers, calls, indexing, built-ins

Each binary-operator level follows the same pattern. Its method parses one operand at the next-higher level, then loops while the current token is an operator belonging to its own level: it consumes the operator, parses another higher-level operand, and wraps both in a BinOpNode or LogicOpNode. Because each new node takes the previous result as its left child, the resulting tree is left-associative.
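To make the pattern concrete, here is a small sketch covering only the `expr()` and `term()` levels over integer operands. It is an illustration under assumptions: the `OP` token type, the `_binary_level()` helper, and the `left`/`op`/`right` field names on `BinOpNode` are hypothetical and may differ from parser.py and classes.py.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class NumberNode:
    value: int


@dataclass
class BinOpNode:
    left: object
    op: str
    right: object


class MiniExprParser:
    """Hypothetical two-level expression parser (expr -> term -> factor)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def factor(self):
        tok = self.current()
        if tok.type != "INT":
            raise SyntaxError(f"Unexpected token {tok}")
        self.pos += 1
        return NumberNode(int(tok.value))

    def _binary_level(self, operand, operators):
        # Parse one higher-level operand, then fold further operands to the left.
        node = operand()
        while self.current().type == "OP" and self.current().value in operators:
            op = self.current().value
            self.pos += 1
            node = BinOpNode(node, op, operand())
        return node

    def term(self):
        return self._binary_level(self.factor, {"*", "/", "//", "%"})

    def expr(self):
        return self._binary_level(self.term, {"+", "-"})
```

Parsing `1 - 2 * 3 - 4` with this sketch groups the multiplication first and then nests the two subtractions to the left, which is the precedence and associativity the call chain is meant to produce.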

`parser.factor()` — Atomic Expressions

factor() handles the highest-precedence syntactic atoms. It recognizes:

Integer literals (INT) → NumberNode(int(value), "int")
Hexadecimal literals (HEX) → NumberNode(int(value, 16), "int")
Float literals (FLOAT) → NumberNode(float(value), "float")
String literals (STRING) → StringNode(value[1:-1], "str") (strips quotes)
Boolean keywords true / false → BoolNode(True) / BoolNode(False)
Built-in functions range(s, e), sqrt(v), rand_num(s, e), len(v), call[list, pos], int(v), str(v), float(v), bool(v) → the corresponding AST nodes
input (with optional string prompt) → InputNode
Identifiers — followed by optional chains of:
- [index] → IndexNode
- (args...) → CallNode
- .attr → AttributeNode
Hardware primitives — identifiers i2c, spi, or uart followed by .method(args) → HardwarePrimitiveNode
Parenthesized expressions (expr) — or a comma-separated list → TupleNode
List literals [...] → ListNode (via list_literal())
Dict literals {key: val, ...} → DictNode (via dict_literal())

If no pattern matches, factor() raises SyntaxError(f"Unexpected token {tok}").
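The literal cases in the list above can be sketched as a single function. This is a partial illustration only: `literal_node()` is a hypothetical helper, the node dataclasses are simplified stand-ins for those in classes.py, and treating `true`/`false` as KEYWORD tokens is an assumption based on the wording "Boolean keywords".

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class NumberNode:
    value: object
    type: str


@dataclass
class StringNode:
    value: str
    type: str


@dataclass
class BoolNode:
    value: bool


def literal_node(tok):
    """Hypothetical helper covering only the literal cases of factor()."""
    if tok.type == "INT":
        return NumberNode(int(tok.value), "int")
    if tok.type == "HEX":
        # int(..., 16) accepts an optional "0x" prefix.
        return NumberNode(int(tok.value, 16), "int")
    if tok.type == "FLOAT":
        return NumberNode(float(tok.value), "float")
    if tok.type == "STRING":
        # Drop the surrounding quote characters.
        return StringNode(tok.value[1:-1], "str")
    if tok.type == "KEYWORD" and tok.value in ("true", "false"):
        return BoolNode(tok.value == "true")
    raise SyntaxError(f"Unexpected token {tok}")
```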

Statement Dispatch

_statement() is the internal method that dispatches on the current token to produce an ASTNode. It handles two broad categories: identifier-led statements and keyword-led statements.

For identifier-led statements, where the current token is IDENT, _statement() optimistically parses a special_expr() and then checks whether the next token is ASSIGN (=) or ASSIGN_OP (+=, -=, …). If it is, it produces an AssignNode, IndexAssignNode, AttributeAssignNode, or CompoundAssignNode. If the expression parse or assignment check fails with a SyntaxError, self.pos is reset to its saved value and the fallback path re-parses the tokens as a bare expression (call statement, etc.).

This limited backtrack — resetting self.pos on SyntaxError — is the only place the parser is not fully deterministic. It exists solely to distinguish target = value assignments from bare expression statements when the left-hand side begins with an identifier.
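A minimal sketch of this save-and-reset technique is shown below. It is not the actual `_statement()` code: `call_or_name()` is a hypothetical stand-in for `special_expr()`, the `LPAREN`/`RPAREN` token names are assumed, the right-hand side is limited to an INT, and plain tuples replace the real node classes.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


class MiniParser:
    """Hypothetical parser showing only the identifier-led backtrack."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def eat(self, type_):
        tok = self.tokens[self.pos]
        if tok.type != type_:
            raise SyntaxError(f"Expected {type_}, got {tok.type}")
        self.pos += 1
        return tok

    def call_or_name(self):
        # Stand-in for special_expr(): IDENT optionally followed by "()".
        name = self.eat("IDENT").value
        if self.tokens[self.pos].type == "LPAREN":
            self.eat("LPAREN")
            self.eat("RPAREN")
            return ("call", name)
        return ("name", name)

    def ident_statement(self):
        saved = self.pos
        try:
            # Optimistic path: target followed by "=" and a value.
            target = self.call_or_name()
            self.eat("ASSIGN")
            value = self.eat("INT").value
            return ("assign", target, int(value))
        except SyntaxError:
            # Rewind and re-parse the same tokens as a bare expression.
            self.pos = saved
            return ("expr", self.call_or_name())
```

The cost of this approach is that the left-hand side is parsed twice whenever the statement turns out not to be an assignment.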

Keyword-led statements — when the current token is KEYWORD, _statement() matches on tok.value and calls the appropriate handler:

Keyword	Returns
`let`	`AssignNode(name, value, optional_type)`
`const`	`ConstAssignNode(name, value)`
`set`	`SetNode(name, num, type_, params)`
`print`	`PrintNode(expr)`
`if`	`IfNode(...)` via `if_stmt()`
`elif`	parsed inside `if_stmt()` → `ElifNode`
`else`	parsed inside `if_stmt()` → attached to `IfNode.else_body`
`while`	`WhileNode(condition, body)`
`for`	`ForNode(var_name, iterable, body)`
`def`	`FuncNode(name, params, body)`
`class`	`ClassNode(name, fields, body)`
`try`	`TryNode(try_body, except_nodes, else_body)`
`except`	parsed inside `try` handler → `BlockNode` list
`parallel`	`ParallelNode(body, threads)`
`import`	`ImportNode(name)` or `ImportAsNode(name, alias)`
`from`	`ImportFromNode(name, library)`
`return`	`ReturnNode(value)`
`break`	`BreakNode()`
`continue`	`ContinueNode()`
`pass`	`PassNode()`
`exec`	`ExecNode(string_literal)`
`py`	`PyNode(raw_python)` — consumes tokens until matching `}`

Any token that is neither IDENT nor one of the above keywords falls through to special_expr(), which allows bare expression statements (such as a standalone function call) to appear at the statement level.
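One way to picture the keyword dispatch is a lookup from `tok.value` to a handler. The sketch below covers only the three argument-free keywords and is purely illustrative: whether parser.py uses a dictionary, an if/elif chain, or a match statement is not stated here, `MiniParser` and `_simple()` are hypothetical, and the fall-through to `special_expr()` is replaced by an error to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class BreakNode:
    pass


@dataclass
class ContinueNode:
    pass


@dataclass
class PassNode:
    pass


class MiniParser:
    """Hypothetical dispatcher for break / continue / pass only."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def _simple(self, node_cls):
        self.pos += 1  # consume the keyword itself
        return node_cls()

    def _statement(self):
        tok = self.tokens[self.pos]
        handlers = {
            "break": lambda: self._simple(BreakNode),
            "continue": lambda: self._simple(ContinueNode),
            "pass": lambda: self._simple(PassNode),
        }
        if tok.type == "KEYWORD" and tok.value in handlers:
            return handlers[tok.value]()
        # Simplification: the real parser falls through to special_expr() here.
        raise SyntaxError(f"Unexpected token {tok}")
```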

Type Annotations

The parser recognizes optional type annotations on let and const declarations using a colon:

let count: int = 0
let name: str = "alice"

The type string is stored in AssignNode.type and later used by Interpreter.get_type() for type-mismatch detection at code-generation time. The parser itself does not enforce types — it only records them.
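The record-only behavior can be sketched as follows. This is an assumption-laden illustration: `parse_let()` is a hypothetical function, the `COLON` token type is a guess, the value is limited to an INT literal, and `AssignNode` is a simplified stand-in that keeps only the `name`, `value`, and `type` fields mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    type: str
    value: str


@dataclass
class AssignNode:
    name: str
    value: object
    type: Optional[str] = None


def parse_let(tokens):
    """Hypothetical parser for `let name[: type] = INT`; returns an AssignNode."""
    pos = 0

    def eat(type_):
        nonlocal pos
        tok = tokens[pos]
        if tok.type != type_:
            raise SyntaxError(f"Expected {type_}, got {tok.type} ({tok.value})")
        pos += 1
        return tok

    eat("KEYWORD")  # the `let` keyword
    name = eat("IDENT").value
    declared = None
    if tokens[pos].type == "COLON":
        eat("COLON")
        declared = tokens[pos].value  # recorded verbatim, not checked
        pos += 1
    eat("ASSIGN")
    value = int(eat("INT").value)
    return AssignNode(name, value, declared)
```

In this sketch a declaration such as `let s: str = 5` parses without complaint and yields a node whose `type` is `"str"`, leaving any mismatch for a later stage to report.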
