Origin is a compiled-and-executed language implemented entirely in Python. A .or source file passes through four sequential stages: the lexer tokenizes raw text into a flat Token list, the parser consumes that list and builds an Abstract Syntax Tree (AST), the interpreter walks the AST and emits a Python source string, and finally Python's built-in exec() runs that string inside a controlled globals dictionary. No user code runs until the fourth stage. Each stage is a self-contained Python module with a clear public interface, which makes it straightforward to extend or embed any individual part of the pipeline.

The Four Stages

Stage 1: Lexical analysis (lexer.py). The lex() function reads an iterable of source lines and returns a flat list[Token]. Each Token carries its type, original text value, 1-based line number, and 0-based column. Patterns are declared in TOKEN_REGEX as an ordered list of (regex, token_type) pairs and pre-compiled once at module load time. Whitespace and comments are consumed silently (they produce no tokens). The sequence always ends with an EOF token.

Stage 2: Recursive-descent parsing (parser.py). Parser(tokens).program() iterates over the token list, dispatches on token type and keyword value, and returns a ProgramNode whose .statements list contains every top-level AST node. Each syntactic form, from let assignments to parallel {} blocks, is handled by a dedicated parsing method. Expression precedence is enforced through a hand-written call chain: special_expr → logic → comparison → expr → term → unary → factor.

Stage 3: Code generation (interpreter.py). Interpreter().generate(ast) walks the AST and returns a single Python source string. Every node type maps to a specific Python code pattern: AssignNode becomes a simple assignment, FuncNode becomes a def, ClassNode becomes a class with an __init__, and so on. The interpreter also injects globals()['_origin_runtime_line'] = N markers into the emitted code so that runtime errors can be mapped back to their Origin source line.

Stage 4: Execution (runner.py). run_origin(file_path) orchestrates the full pipeline. After code generation, it builds a runtime_globals dictionary containing random, math, the hardware helpers (_execute_set_pin, _execute_i2c_read, _execute_i2c_write), and the _origin_runtime_line tracker, then calls exec(generated_python, runtime_globals). Any exception is caught, translated into a friendly message by errors.py, and displayed with the file path and source line.
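The ordered-pattern approach described for Stage 1 can be illustrated with a small stand-alone lexer. This is a sketch, not Origin's lexer.py: the TOKEN_SPEC table, the toy_lex() name, the # comment syntax, and the position given to the EOF token are all assumptions made for the example. Only the Token fields and the "ordered, pre-compiled (regex, token_type) pairs" idea come from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: str
    line: int  # 1-based line number
    col: int   # 0-based column


# Hypothetical pattern table (the real TOKEN_REGEX differs).
# Order matters: the first pattern that matches wins, so "==" is
# listed before the single-character operators.
TOKEN_SPEC = [
    (r"#[^\n]*", None),             # comment (assumed syntax), no token
    (r"\s+", None),                 # whitespace, no token
    (r"\d+(?:\.\d+)?", "NUMBER"),
    (r"[A-Za-z_]\w*", "IDENT"),
    (r"==|!=|<=|>=", "OP"),
    (r"[+\-*/=<>(){}]", "OP"),
]

# Compile once at import time rather than on every call.
COMPILED = [(re.compile(pattern), tok_type) for pattern, tok_type in TOKEN_SPEC]


def toy_lex(lines):
    """Turn an iterable of source lines into a flat list of Tokens."""
    tokens = []
    line_no = 0
    for line_no, text in enumerate(lines, start=1):
        col = 0
        while col < len(text):
            for pattern, tok_type in COMPILED:
                match = pattern.match(text, col)
                if match:
                    if tok_type is not None:
                        tokens.append(Token(tok_type, match.group(), line_no, col))
                    col = match.end()
                    break
            else:
                raise SyntaxError(
                    f"unexpected character {text[col]!r} "
                    f"at line {line_no}, column {col}"
                )
    # The EOF position is an arbitrary choice for this sketch.
    tokens.append(Token("EOF", "", max(line_no, 1), 0))
    return tokens


demo_tokens = toy_lex(["let x = 42  # answer", "x == 1"])
for tok in demo_tokens:
    print(tok)
```

Because the table is scanned top to bottom, moving the two-character operators below the single-character ones would split == into two = tokens, which is why the ordering of the pattern list is part of the lexer's contract.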

Pipeline Architecture

  ┌─────────────────────────────────────────────────────────────┐
  │                    .or source file                          │
  └────────────────────────────┬────────────────────────────────┘
                               │  list[str]  (lines)

  ┌─────────────────────────────────────────────────────────────┐
  │  lexer.lex()          TOKEN_REGEX (18 patterns, compiled)   │
  │                       → list[Token]  …  EOF                 │
  └────────────────────────────┬────────────────────────────────┘
                               │  list[Token]

  ┌─────────────────────────────────────────────────────────────┐
  │  parser.Parser(tokens).program()                            │
  │  Recursive descent: factor → unary → term → expr →         │
  │  comparison → logic → special_expr → statement → block      │
  │                       → ProgramNode (AST)                   │
  └────────────────────────────┬────────────────────────────────┘
                               │  ProgramNode

  ┌─────────────────────────────────────────────────────────────┐
  │  interpreter.Interpreter().generate(ast)                    │
  │  AST-node dispatch → Python source string                   │
  │  Injects _origin_runtime_line markers                       │
  └────────────────────────────┬────────────────────────────────┘
                               │  str  (Python source)

  ┌─────────────────────────────────────────────────────────────┐
  │  exec(generated_python, runtime_globals)                    │
  │  globals: random, math, _execute_set_pin,                   │
  │           _execute_i2c_read, _execute_i2c_write             │
  └─────────────────────────────────────────────────────────────┘
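The last two boxes rely on the _origin_runtime_line markers to connect a Python exception back to an Origin line. The sketch below shows how that mechanism could work in isolation. The generated_python string is hand-written stand-in code, not real Interpreter output, and run_generated() and its message format are hypothetical; the actual translation is done by errors.py.

```python
import math
import random

# Hand-written stand-in for generated code. Each Origin statement is
# preceded by a marker that records its source line in the globals dict.
generated_python = "\n".join([
    "globals()['_origin_runtime_line'] = 1",
    "x = math.sqrt(16)",
    "globals()['_origin_runtime_line'] = 2",
    "y = x / 0",
])


def run_generated(source):
    """Execute generated code; return an error message or None on success."""
    runtime_globals = {
        "random": random,
        "math": math,
        "_origin_runtime_line": 0,
    }
    try:
        exec(source, runtime_globals)
    except Exception as exc:
        # The most recent marker tells us which Origin line was running.
        line = runtime_globals["_origin_runtime_line"]
        return f"Runtime error on Origin line {line}: {type(exc).__name__}: {exc}"
    return None


message = run_generated(generated_python)
print(message)
```

Inside exec(), globals() refers to the dictionary passed as the second argument, so the markers update runtime_globals directly and the caller can read the last recorded line after a failure.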

Optional Fifth Stage: Parallel Scheduler (parallelInt.py)

When a program contains parallel {} blocks — or when the alternative runner invokes parallelInt directly — a fifth stage sits between parsing and code generation. parallelInt.gen(ast) performs dependency analysis on the top-level ProgramNode.statements, classifying each statement’s read-set and write-set. It then groups statements into wavefront stages where every statement in a stage has no RAW (read-after-write), WAW (write-after-write), or WAR (write-after-read) dependency on any other statement in the same stage. Each stage is executed with a threading.Thread per independent statement; all threads in a stage must complete (.join()) before the next stage begins.
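The wavefront grouping can be sketched as follows. This is an illustration of the general technique under simplifying assumptions, not the code in parallelInt.py: the Stmt class, the hand-supplied read and write sets, and the function names are all hypothetical. Each statement is placed one stage after the latest earlier statement it conflicts with, and each stage runs one thread per statement.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stmt:
    text: str
    reads: set
    writes: set
    action: Callable[[dict], None]  # mutates the shared environment


def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR hazard)."""
    raw = earlier.writes & later.reads
    waw = earlier.writes & later.writes
    war = earlier.reads & later.writes
    return bool(raw or waw or war)


def wavefronts(stmts):
    """Group statements into stages with no intra-stage dependencies."""
    levels = []
    for i, stmt in enumerate(stmts):
        level = 0
        for j in range(i):
            if conflicts(stmts[j], stmt):
                level = max(level, levels[j] + 1)
        levels.append(level)
    stages = [[] for _ in range(max(levels, default=-1) + 1)]
    for stmt, level in zip(stmts, levels):
        stages[level].append(stmt)
    return stages


def run_stages(stages, env):
    """Run each stage with one thread per statement, joining between stages."""
    for stage in stages:
        threads = [threading.Thread(target=s.action, args=(env,)) for s in stage]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


program = [
    Stmt("a = 1", set(), {"a"}, lambda env: env.update(a=1)),
    Stmt("b = 2", set(), {"b"}, lambda env: env.update(b=2)),
    Stmt("c = a + b", {"a", "b"}, {"c"}, lambda env: env.update(c=env["a"] + env["b"])),
    Stmt("d = 10", set(), {"d"}, lambda env: env.update(d=10)),
    Stmt("a = c * 2", {"c"}, {"a"}, lambda env: env.update(a=env["c"] * 2)),
]

stages = wavefronts(program)
stage_texts = [[s.text for s in stage] for stage in stages]
env = {}
run_stages(stages, env)
print(stage_texts)
print(env["a"], env["b"], env["c"], env["d"])  # 6 2 3 10
```

In this example d = 10 moves up into the first stage because it touches nothing the others use, while a = c * 2 is pushed to a third stage by both a RAW hazard on c and a WAR hazard on a.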

Experimental Bytecode Path (bComp.py)

An alternative compilation target exists in bComp.py but is not used by the default runner.py. The Compiler class walks the same AST produced by the parser and emits a flat list of integer opcodes into self.bytecode, with literal values stored separately in self.constants. The OpCode class enumerates 44 numeric opcodes (PUSH_CONST = 0x01 through FOR_ITER = 0x2C). A companion VM class executes the bytecode using a value stack and a call stack, implementing jump patching for if/while/for control flow and supporting break/continue via LOOP_START/LOOP_END markers.
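Jump patching is easiest to see in a very small compiler and VM. The sketch below is not bComp.py: apart from PUSH_CONST = 0x01, every opcode value, the tuple-based AST, and the ToyCompiler and run_bytecode names are assumptions for illustration, and the call stack and loop markers mentioned above are left out. It compiles a conditional expression by emitting a jump with a placeholder target and filling in the real address once it is known.

```python
PUSH_CONST = 0x01     # value taken from the text above
ADD = 0x02            # the remaining opcode numbers are made up
JUMP = 0x03
JUMP_IF_FALSE = 0x04


class ToyCompiler:
    def __init__(self):
        self.bytecode = []   # flat list of ints: opcodes and operands
        self.constants = []  # literal values, referenced by index

    def emit(self, *ints):
        """Append ints and return the index of the last one emitted."""
        self.bytecode.extend(ints)
        return len(self.bytecode) - 1

    def compile(self, node):
        if not isinstance(node, tuple):          # literal constant
            self.constants.append(node)
            self.emit(PUSH_CONST, len(self.constants) - 1)
        elif node[0] == "add":
            self.compile(node[1])
            self.compile(node[2])
            self.emit(ADD)
        elif node[0] == "if":
            _, cond, then_branch, else_branch = node
            self.compile(cond)
            else_slot = self.emit(JUMP_IF_FALSE, 0)   # placeholder target
            self.compile(then_branch)
            end_slot = self.emit(JUMP, 0)             # placeholder target
            self.bytecode[else_slot] = len(self.bytecode)  # patch: else starts here
            self.compile(else_branch)
            self.bytecode[end_slot] = len(self.bytecode)   # patch: end of the if
        else:
            raise ValueError(f"unknown node: {node!r}")


def run_bytecode(bytecode, constants):
    """Execute the bytecode on a value stack and return the top of stack."""
    stack = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == PUSH_CONST:
            stack.append(constants[bytecode[pc + 1]])
            pc += 2
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == JUMP:
            pc = bytecode[pc + 1]
        elif op == JUMP_IF_FALSE:
            target = bytecode[pc + 1]
            pc = pc + 2 if stack.pop() else target
        else:
            raise ValueError(f"unknown opcode: {op:#04x}")
    return stack.pop()


def evaluate(node):
    compiler = ToyCompiler()
    compiler.compile(node)
    return run_bytecode(compiler.bytecode, compiler.constants)


print(evaluate(("if", False, 1, ("add", 2, 3))))   # 5
print(evaluate(("if", True, ("add", 10, 1), 0)))   # 11
```

The placeholder is needed because the compiler does not know how long the then-branch will be when it emits JUMP_IF_FALSE; recording the operand's index and overwriting it later avoids a second pass over the bytecode.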

Programmatic API

You can drive the full pipeline from Python without going through runner.py:
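import math
import random

from lexer import lex
from parser import Parser
from interpreter import Interpreter

# lex() expects an iterable of source lines without trailing newlines.
with open("main.or") as f:
    lines = [line.rstrip("\n") for line in f]

tokens = lex(lines)                      # list[Token], ending with EOF
ast = Parser(tokens).program()           # ProgramNode
py_source = Interpreter().generate(ast)  # Python source string

# Run in a dedicated globals dictionary instead of this module's namespace.
# Programs that use the hardware helpers also need those entries added.
runtime_globals = {"random": random, "math": math}
exec(py_source, runtime_globals)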
To inspect intermediate representations, examine tokens (a list[Token]) or ast (a ProgramNode) before passing them to the next stage. To add custom built-ins, populate the globals dictionary passed to exec().
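As a sketch of the custom built-ins idea, the example below adds one extra function to the globals dictionary before calling exec(). The clamp() helper and build_globals() are hypothetical names, and py_source is a hand-written stand-in for the string Interpreter().generate(ast) would return. Whether an Origin program can call such a name directly depends on how the interpreter emits function calls, which is an assumption here.

```python
import math
import random


def clamp(value, low, high):
    """Hypothetical custom built-in: restrict value to [low, high]."""
    return max(low, min(high, value))


def build_globals(extra=None):
    """Start from the standard modules and layer custom names on top."""
    runtime_globals = {"random": random, "math": math}
    runtime_globals.update(extra or {})
    return runtime_globals


# Stand-in for generated code; real output comes from Interpreter().generate(ast).
py_source = "result = clamp(math.floor(12.7), 0, 10)"

runtime_globals = build_globals({"clamp": clamp})
exec(py_source, runtime_globals)
print(runtime_globals["result"])  # 10
```

Keeping the dictionary separate from the host module's namespace also means names defined by the Origin program, such as result here, can be read back afterwards without leaking into the embedding application.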

Lexer

Token types, TOKEN_REGEX, lex(), and Token fields

Parser

Recursive-descent grammar, expression precedence, and all statement forms

Interpreter

AST-to-Python code generation, generate(), and runtime helpers

AST Nodes

Complete reference for every node class in classes.py
