The Origin parser (parser.py) transforms the flat list[Token] produced by lex() into a structured Abstract Syntax Tree (AST). It is a hand-written, single-pass recursive-descent parser with no external grammar tool dependency. Every syntactic form in Origin — from simple arithmetic expressions to class definitions and parallel {} blocks — is handled by a dedicated method on the Parser class. The resulting AST is a tree of node instances from classes.py, rooted at a ProgramNode, which the Interpreter then walks to emit Python source.

Parser(tokens)

Instantiate the parser by passing the token list returned by lex():
```python
from lexer import lex
from parser import Parser

# Read the source file as a list of lines without trailing newlines.
with open("main.or") as f:
    lines = [line.rstrip("\n") for line in f]

tokens = lex(lines)       # flat list[Token]
parser = Parser(tokens)
ast = parser.program()    # -> ProgramNode
```

The constructor stores tokens and sets self.pos = 0. The parser advances pos by consuming tokens through eat().

Top-Level Entry Points

parser.program() -> ProgramNode

Parses a complete Origin program. Calls statement() in a loop until the current token is EOF, then returns a ProgramNode whose .statements list contains every top-level ASTNode. This is the method you call to parse a full .or file.

parser.statement() -> ASTNode

Parses a single statement. It first calls skip_newlines() to discard leading blank lines, records the current token’s line number for error reporting, then delegates to the internal _statement() dispatcher. The line number is attached to the returned node via _set_line(node, line).
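
The two entry points above can be sketched roughly as follows. This is a minimal illustration, not the actual parser.py source: MiniParser, NameNode, current(), and the one-token _statement() placeholder are hypothetical, and the extra skip_newlines() call inside the loop is an assumption made so that trailing blank lines never reach statement().

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    type: str
    value: str
    line: int = 1
    col: int = 1


@dataclass
class NameNode:  # hypothetical stand-in for a real AST node
    name: str
    line: int = 0


@dataclass
class ProgramNode:
    statements: list = field(default_factory=list)


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def skip_newlines(self):
        while self.current().type == "NEWLINE":
            self.pos += 1

    def _set_line(self, node, line):
        node.line = line
        return node

    def _statement(self):
        # Placeholder dispatcher: a single token counts as one statement.
        tok = self.current()
        self.pos += 1
        return NameNode(tok.value)

    def statement(self):
        self.skip_newlines()
        line = self.current().line  # remembered for error reporting
        return self._set_line(self._statement(), line)

    def program(self):
        statements = []
        self.skip_newlines()
        while self.current().type != "EOF":
            statements.append(self.statement())
            self.skip_newlines()  # assumption: swallow trailing blank lines
        return ProgramNode(statements)


tokens = [
    Token("NEWLINE", "\n", 1), Token("IDENT", "a", 2), Token("NEWLINE", "\n", 2),
    Token("IDENT", "b", 3), Token("NEWLINE", "\n", 3), Token("EOF", "", 4),
]
tree = MiniParser(tokens).program()
print([(n.name, n.line) for n in tree.statements])
```

Each node carries the line of the first token of its statement, which is the line recorded before _statement() runs.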

parser.block() -> BlockNode

Parses a braced block: { statement* }. Calls skip_newlines(), consumes the opening { bracket, then repeatedly calls statement() until the closing } is reached. Returns a BlockNode containing the collected statements.
```python
# Example: parsing a block manually
block_node = parser.block()   # expects the next non-NEWLINE token to be '{'
# block_node.statements -> list[ASTNode]
```
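
As a rough sketch of the loop that block() describes, the following stand-in parses a braced list of identifiers. The token type names LBRACE and RBRACE, the MiniParser class, and the one-identifier statement() placeholder are assumptions for illustration only; the real lexer may name these tokens differently.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    type: str
    value: str
    line: int = 1
    col: int = 1


@dataclass
class BlockNode:
    statements: list = field(default_factory=list)


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def eat(self, type_):
        tok = self.current()
        if tok.type != type_:
            raise SyntaxError(
                f"Expected {type_}, got {tok.type} ({tok.value}) at {tok.line}:{tok.col}"
            )
        self.pos += 1
        return tok

    def skip_newlines(self):
        while self.current().type == "NEWLINE":
            self.pos += 1

    def statement(self):
        # Placeholder: a statement is a single identifier.
        self.skip_newlines()
        return self.eat("IDENT").value

    def block(self):
        self.skip_newlines()
        self.eat("LBRACE")  # assumed token type for '{'
        statements = []
        self.skip_newlines()
        while self.current().type != "RBRACE":
            statements.append(self.statement())
            self.skip_newlines()
        self.eat("RBRACE")  # assumed token type for '}'
        return BlockNode(statements)


block_parser = MiniParser([
    Token("LBRACE", "{"), Token("NEWLINE", "\n"),
    Token("IDENT", "x"), Token("NEWLINE", "\n"),
    Token("IDENT", "y"), Token("NEWLINE", "\n"),
    Token("RBRACE", "}"), Token("EOF", ""),
])
block_node = block_parser.block()
print(block_node.statements)
```

In this sketch an unterminated block surfaces as a SyntaxError from eat() once the loop reaches EOF.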

Internal Token Helpers

eat(type_) -> Token

Consumes and returns the current token when its type matches type_. Advances self.pos by one. Raises SyntaxError if the types don’t match:
SyntaxError(f"Expected {type_}, got {tok.type} ({tok.value}) at {tok.line}:{tok.col}")
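
A minimal sketch of this behaviour, assuming a Token with type, value, line, and col fields (the fields the error message reads). MiniParser and current() are hypothetical names, not the real class.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    line: int = 1
    col: int = 1


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def eat(self, type_):
        tok = self.current()
        if tok.type != type_:
            raise SyntaxError(
                f"Expected {type_}, got {tok.type} ({tok.value}) at {tok.line}:{tok.col}"
            )
        self.pos += 1  # advance only on a successful match
        return tok


p = MiniParser([Token("INT", "1", 1, 1), Token("EOF", "", 1, 2)])
first = p.eat("INT")

try:
    p.eat("INT")  # current token is EOF, so this fails
except SyntaxError as exc:
    message = str(exc)

print(first.value, "|", message)
```

Because pos is left unchanged on failure, a caller that catches the error can still inspect or rewind from the offending token.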

skip_newlines()

Skips zero or more consecutive NEWLINE tokens. Origin is not whitespace-sensitive within blocks — newlines are optional statement separators and are stripped before each statement parse.

Expression Precedence

Expressions are parsed through a layered call chain. Each level can call the next-higher-precedence level, so the chain enforces standard operator precedence without an explicit precedence table. From lowest to highest:
Level        Method          Operators handled
1 (lowest)   special_expr()  ??, ->, =>, <=>, ::
2            logic()         &&, ||, and, or
3            comparison()    ===, !==, ==, !=, <, >, <=, >=, <>
4            expr()          +, -
5            term()          *, /, //, %, **
6            unary()         - (negation), not, !, ++, --
7 (highest)  factor()        literals, identifiers, calls, indexing, built-ins
Each binary level's method first parses one operand at the next-higher level. It then loops while the current token is an operator belonging to its own level: it consumes the operator, parses another higher-level operand, and wraps both in a BinOpNode or LogicOpNode, which yields a left-associative tree.
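
To make the layering concrete, here is a small sketch covering only the expr(), term(), and factor() levels on integer literals. The Num and BinOp classes, the generic OP token type, and the shared _binary_level() helper are hypothetical simplifications; the real parser has more levels and its own node classes.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class Num:  # hypothetical literal node
    value: int


@dataclass
class BinOp:  # hypothetical binary-operator node
    left: object
    op: str
    right: object


class MiniExprParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.current()
        self.pos += 1
        return tok

    def _binary_level(self, operand, operators):
        # One higher-precedence operand, then zero or more (operator, operand) pairs.
        node = operand()
        while self.current().type == "OP" and self.current().value in operators:
            op = self.advance().value
            node = BinOp(node, op, operand())  # left operand nests deeper
        return node

    def expr(self):
        return self._binary_level(self.term, {"+", "-"})

    def term(self):
        return self._binary_level(self.factor, {"*", "/", "//", "%", "**"})

    def factor(self):
        tok = self.advance()
        if tok.type != "INT":
            raise SyntaxError(f"Unexpected token {tok}")
        return Num(int(tok.value))


def parse(values):
    tokens = [Token("INT" if v.isdigit() else "OP", v) for v in values]
    tokens.append(Token("EOF", ""))
    return MiniExprParser(tokens).expr()


tree = parse(["1", "+", "2", "*", "3", "-", "4"])
print(tree)
```

The multiplication binds tighter because term() is entered from inside expr(), and the subtraction ends up at the root because the loop in expr() keeps folding the accumulated node into the left operand.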

parser.factor() — Atomic Expressions

factor() handles the highest-precedence syntactic atoms. It recognizes:
  • Integer literals (INT) → NumberNode(int(value), "int")
  • Hexadecimal literals (HEX) → NumberNode(int(value, 16), "int")
  • Float literals (FLOAT) → NumberNode(float(value), "float")
  • String literals (STRING) → StringNode(value[1:-1], "str") (strips quotes)
  • Boolean keywords true / false → BoolNode(True) / BoolNode(False)
  • Built-in functions range(s, e), sqrt(v), rand_num(s, e), len(v), call[list, pos], int(v), str(v), float(v), bool(v) → the corresponding AST nodes
  • input (with optional string prompt) → InputNode
  • Identifiers — followed by optional chains of:
    • [index] → IndexNode
    • (args...) → CallNode
    • .attr → AttributeNode
  • Hardware primitives: identifiers i2c, spi, or uart followed by .method(args) → HardwarePrimitiveNode
  • Parenthesized expressions (expr), or a comma-separated list → TupleNode
  • List literals [...] → ListNode (via list_literal())
  • Dict literals {key: val, ...} → DictNode (via dict_literal())
If no pattern matches, factor() raises SyntaxError(f"Unexpected token {tok}").
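
The literal conversions listed above can be sketched as a standalone function. The node classes here are hypothetical dataclasses whose field names are assumed, literal_factor() is an illustrative name, and treating true / false as KEYWORD tokens is an assumption about the lexer.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class NumberNode:  # field names are assumptions
    value: object
    type: str


@dataclass
class StringNode:
    value: str
    type: str


@dataclass
class BoolNode:
    value: bool


def literal_factor(tok):
    """Convert one literal token into its node, mirroring the list above."""
    if tok.type == "INT":
        return NumberNode(int(tok.value), "int")
    if tok.type == "HEX":
        return NumberNode(int(tok.value, 16), "int")  # accepts an optional 0x prefix
    if tok.type == "FLOAT":
        return NumberNode(float(tok.value), "float")
    if tok.type == "STRING":
        return StringNode(tok.value[1:-1], "str")  # drop the surrounding quotes
    if tok.type == "KEYWORD" and tok.value in ("true", "false"):
        return BoolNode(tok.value == "true")
    raise SyntaxError(f"Unexpected token {tok}")


print(literal_factor(Token("HEX", "0x1F")))
print(literal_factor(Token("STRING", '"hi"')))
```

Note that a hexadecimal literal becomes an ordinary "int" NumberNode, so later stages cannot tell which spelling the source used.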

Statement Dispatch

_statement() is the internal method that dispatches on the current token to produce an ASTNode. It handles two broad categories.
Identifier-led statements: when the current token is IDENT, _statement() optimistically parses a special_expr() and then checks whether the next token is ASSIGN (=) or ASSIGN_OP (+=, -=, …). If it is, it produces an AssignNode, IndexAssignNode, AttributeAssignNode, or CompoundAssignNode. If the expression parse or assignment check fails with a SyntaxError, self.pos is reset to its saved value and the fallback path re-parses the tokens as a bare expression (a call statement, for example).
This limited backtrack, resetting self.pos on SyntaxError, is the only place the parser speculates and rewinds instead of committing on the current token. It exists solely to distinguish target = value assignments from bare expression statements when the left-hand side begins with an identifier.
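
The save-and-rewind pattern might look roughly like this. It is a simplified sketch: MiniParser, _ident_statement(), the tuple-shaped results, the LPAREN / RPAREN token names, and the tiny special_expr() placeholder are all hypothetical, and here the failed eat("ASSIGN") is what triggers the fallback.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    line: int = 1
    col: int = 1


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def eat(self, type_):
        tok = self.current()
        if tok.type != type_:
            raise SyntaxError(f"Expected {type_}, got {tok.type} ({tok.value})")
        self.pos += 1
        return tok

    def special_expr(self):
        # Placeholder: a name, optionally followed by an empty call "()".
        name = self.eat("IDENT").value
        if self.current().type == "LPAREN":
            self.eat("LPAREN")
            self.eat("RPAREN")
            return ("call", name)
        return ("name", name)

    def _ident_statement(self):
        saved = self.pos  # remember where the statement started
        try:
            target = self.special_expr()
            self.eat("ASSIGN")  # raises if this is not an assignment
            return ("assign", target, self.special_expr())
        except SyntaxError:
            self.pos = saved  # rewind and re-parse as a bare expression
            return ("expr", self.special_expr())


assign = MiniParser([
    Token("IDENT", "x"), Token("ASSIGN", "="), Token("IDENT", "y"), Token("EOF", ""),
])._ident_statement()

call_parser = MiniParser([
    Token("IDENT", "f"), Token("LPAREN", "("), Token("RPAREN", ")"), Token("EOF", ""),
])
call = call_parser._ident_statement()
print(assign, call)
```

The cost of the rewind is that the left-hand expression is parsed twice for bare expression statements, which is acceptable because the speculation never spans more than one statement.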
Keyword-led statements: when the current token is KEYWORD, _statement() matches on tok.value and calls the appropriate handler:
Keyword    Returns
let        AssignNode(name, value, optional_type)
const      ConstAssignNode(name, value)
set        SetNode(name, num, type_, params)
print      PrintNode(expr)
if         IfNode(...) via if_stmt()
elif       parsed inside if_stmt() → ElifNode
else       parsed inside if_stmt() → attached to IfNode.else_body
while      WhileNode(condition, body)
for        ForNode(var_name, iterable, body)
def        FuncNode(name, params, body)
class      ClassNode(name, fields, body)
try        TryNode(try_body, except_nodes, else_body)
except     parsed inside try handler → BlockNode list
parallel   ParallelNode(body, threads)
import     ImportNode(name) or ImportAsNode(name, alias)
from       ImportFromNode(name, library)
return     ReturnNode(value)
break      BreakNode()
continue   ContinueNode()
pass       PassNode()
exec       ExecNode(string_literal)
py         PyNode(raw_python); consumes tokens until the matching }
Any token that is neither IDENT nor one of the above keywords falls through to special_expr(), which allows bare expression statements (such as a standalone function call) to appear at the statement level.
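
One way to picture the keyword dispatch and its fall-through is a lookup table from keyword to handler. This is only a sketch: the source does not say whether the real _statement() uses a dictionary, a match statement, or an if-chain, and MiniParser, _bare(), and the integer-only special_expr() placeholder are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str


@dataclass
class BreakNode:
    pass


@dataclass
class ContinueNode:
    pass


@dataclass
class PassNode:
    pass


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        # keyword -> handler; only three argument-free keywords are modelled
        self.handlers = {
            "break": lambda: self._bare(BreakNode),
            "continue": lambda: self._bare(ContinueNode),
            "pass": lambda: self._bare(PassNode),
        }

    def current(self):
        return self.tokens[self.pos]

    def _bare(self, node_cls):
        self.pos += 1  # consume the keyword token itself
        return node_cls()

    def special_expr(self):
        # Placeholder expression parser: a single integer literal.
        tok = self.current()
        if tok.type != "INT":
            raise SyntaxError(f"Unexpected token {tok}")
        self.pos += 1
        return ("int", int(tok.value))

    def _statement(self):
        tok = self.current()
        if tok.type == "KEYWORD" and tok.value in self.handlers:
            return self.handlers[tok.value]()
        return self.special_expr()  # fall-through: bare expression statement


p = MiniParser([
    Token("KEYWORD", "break"), Token("KEYWORD", "pass"),
    Token("INT", "7"), Token("EOF", ""),
])
nodes = [p._statement() for _ in range(3)]
print(nodes)
```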

Type Annotations

The parser recognizes optional type annotations on let and const declarations using a colon:
```
let count: int = 0
let name: str = "alice"
```

The type string is stored in AssignNode.type and later used by Interpreter.get_type() for type-mismatch detection at code-generation time. The parser itself does not enforce types; it only records them.
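
A hedged sketch of how a let handler could record the annotation without checking it. The COLON token type, the choice to lex type names as IDENT, the let_stmt() name, and the integer-only right-hand side are assumptions; only the AssignNode(name, value, optional_type) shape and the .type attribute come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    type: str
    value: str


@dataclass
class AssignNode:
    name: str
    value: object
    type: Optional[str] = None  # annotation text, or None when omitted


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def eat(self, type_):
        tok = self.current()
        if tok.type != type_:
            raise SyntaxError(f"Expected {type_}, got {tok.type} ({tok.value})")
        self.pos += 1
        return tok

    def let_stmt(self):
        self.eat("KEYWORD")  # the 'let' keyword
        name = self.eat("IDENT").value
        declared = None
        if self.current().type == "COLON":  # optional ': type'
            self.eat("COLON")
            declared = self.eat("IDENT").value
        self.eat("ASSIGN")
        value = int(self.eat("INT").value)  # placeholder right-hand side
        return AssignNode(name, value, declared)  # recorded, never checked here


typed = MiniParser([
    Token("KEYWORD", "let"), Token("IDENT", "count"), Token("COLON", ":"),
    Token("IDENT", "int"), Token("ASSIGN", "="), Token("INT", "0"), Token("EOF", ""),
]).let_stmt()

untyped = MiniParser([
    Token("KEYWORD", "let"), Token("IDENT", "n"),
    Token("ASSIGN", "="), Token("INT", "5"), Token("EOF", ""),
]).let_stmt()
print(typed, untyped)
```

Keeping the annotation as a plain string defers every judgement about it to the later stage that reads AssignNode.type.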
