Parallel Execution in Origin Using parallel Blocks

Origin’s parallel block lets you express concurrent computation without writing a single line of thread management code. Wrap a group of statements in parallel { } and Origin’s interpreter generates per-statement threads automatically — each statement becomes its own threading.Thread, all threads are started together, and execution resumes only after every thread has finished.

Syntax

parallel {
    let a: int = heavyComputation1()
    let b: int = heavyComputation2()
}

Every statement inside the parallel { } block runs in its own thread. All threads are started before any of them are joined, so independent work overlaps. Execution of the code that follows the block does not begin until every thread inside the block has completed.

Optional Thread Count

You can pass an explicit thread count with parallel(N):

parallel(4) {
    doWork()
}

When a thread count is provided, the block body is treated as a single unit and launched across N identical threads — each thread calls the same block body. When no count is given (the common case), each top-level statement in the block gets its own thread.

How It Works: Code Generation

When the Origin interpreter encounters a ParallelNode, it generates Python threading code directly during the transpilation step. No external engine is invoked at this stage — the threading setup is emitted as ordinary Python source. For a parallel { } block with no explicit count, the interpreter generates one named function and one thread per statement, then joins all threads:

import threading
_threads = []
def _parallel_stmt_0():
    a = heavyComputation1()
_t0 = threading.Thread(target=_parallel_stmt_0)
_t0.start(); _threads.append(_t0)
def _parallel_stmt_1():
    b = heavyComputation2()
_t1 = threading.Thread(target=_parallel_stmt_1)
_t1.start(); _threads.append(_t1)
for t in _threads: t.join()

For a parallel(N) block, a single _parallel_block() function is defined and started N times:

import threading
_threads = []
def _parallel_block():
    doWork()
for _ in range(4):
    t = threading.Thread(target=_parallel_block)
    t.start(); _threads.append(t)
for t in _threads: t.join()

The for t in _threads: t.join() line at the end is the thread barrier — it ensures the parallel block is fully complete before any following statement runs.

Complete Example

# Run four independent sensor reads concurrently,
# then aggregate once all reads are done.

def read_sensor(id) {
    return id * 42
}

let s1: int = 0
let s2: int = 0
let s3: int = 0
let s4: int = 0

parallel {
    s1 = read_sensor(1)
    s2 = read_sensor(2)
    s3 = read_sensor(3)
    s4 = read_sensor(4)
}

let total: int = s1 + s2 + s3 + s4
print total

All four read_sensor calls run in parallel threads. After the barrier, total is computed using all four results.
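
For comparison, the same flow can be written by hand in Python. This is a sketch under assumptions rather than the code Origin emits: the readings dict and the make_reader helper are hypothetical, and each thread writes a different key so that no two threads touch the same slot.

```python
import threading

def read_sensor(sensor_id):
    return sensor_id * 42

readings = {}  # one key per thread, so no two threads write the same slot

def make_reader(name, sensor_id):
    # Hypothetical helper: builds the function a single thread will run.
    def run():
        readings[name] = read_sensor(sensor_id)
    return run

threads = [threading.Thread(target=make_reader(f"s{i}", i)) for i in range(1, 5)]
for t in threads:
    t.start()
# Barrier: wait for all four reads before aggregating.
for t in threads:
    t.join()

total = sum(readings.values())
print(total)  # 420
```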

The `parallelInt` Engine

Origin ships a second, more advanced execution engine called parallelInt that performs wavefront dependency analysis on the full program AST before running anything. This engine is separate from the standard origin run pipeline; it is an alternative runner used for research and experimentation. The parallelInt engine analyses every statement in the whole program (not just a parallel block) and automatically groups them into numbered stages. Two statements count as dependent, and are therefore placed in different stages, when any of the following hazards holds between them:

Hazard	Condition
RAW (read-after-write)	an earlier statement writes a variable that a later one reads
WAW (write-after-write)	two statements write the same variable
WAR (write-after-read)	an earlier statement reads a variable that a later one writes
SFIO (side-effect ordering)	both statements carry observable side effects

Statements with no dependency are placed in the same stage and run as concurrent threads. Stages execute sequentially — every thread in a stage joins before the next stage begins. The engine’s verbose output identifies each stage and its mode:

[parallelInt] Stage 0 [PARALLEL x4]
[parallelInt] Stage 1 [sequential]
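
The staging rule can be sketched in a few lines of Python. This is not parallelInt's implementation: the Stmt class, its hand-written read and write sets, and the assign_stages function are hypothetical, and the sketch assumes a statement goes into the earliest stage that comes after every earlier statement it conflicts with, using only the four hazards from the table above.

```python
from dataclasses import dataclass

@dataclass
class Stmt:
    # Hypothetical record: the variables a statement reads and writes.
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    side_effect: bool = False

def conflicts(earlier, later):
    """True if any of the four hazards holds between two statements."""
    raw = earlier.writes & later.reads    # read-after-write
    waw = earlier.writes & later.writes   # write-after-write
    war = earlier.reads & later.writes    # write-after-read
    sfio = earlier.side_effect and later.side_effect
    return bool(raw or waw or war or sfio)

def assign_stages(stmts):
    """Place each statement one stage after its latest conflicting predecessor."""
    stage_of = []
    for i, later in enumerate(stmts):
        stage = 0
        for j in range(i):
            if conflicts(stmts[j], later):
                stage = max(stage, stage_of[j] + 1)
        stage_of.append(stage)
    stages = [[] for _ in range(max(stage_of, default=-1) + 1)]
    for stmt, s in zip(stmts, stage_of):
        stages[s].append(stmt.name)
    return stages

program = [
    Stmt("s1 = read_sensor(1)", writes=frozenset({"s1"})),
    Stmt("s2 = read_sensor(2)", writes=frozenset({"s2"})),
    Stmt("total = s1 + s2", reads=frozenset({"s1", "s2"}), writes=frozenset({"total"})),
    Stmt("print total", reads=frozenset({"total"}), side_effect=True),
]
stages = assign_stages(program)
for number, names in enumerate(stages):
    print(f"Stage {number}: {names}")
```

Here the two sensor reads share stage 0, the sum waits in stage 1 because of a RAW hazard on s1 and s2, and the print lands in stage 2 because it reads total.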

The parallelInt API call signature is:

from parallelInt import parallelInt
result = parallelInt.gen(ast, shared_globals, verbose)

It accepts the full ProgramNode AST returned by parser.Parser, not a sub-tree. Both shared_globals and verbose are optional.

Thread Safety and Shared State

All threads within a parallel block share the same Python dict of globals, passed directly to Python’s exec(). This means:

Variables written inside the block are visible to statements that follow the block, because the join barrier guarantees all threads have finished before execution continues.
Two statements in the same parallel { } block should write to different variables. If two threads write the same variable concurrently, the result is a data race.
Statements with side effects — print, input, GPIO set, import, and nested exec — run in separate threads and their order relative to each other is not guaranteed.
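
The first two points can be illustrated with Python's built-in exec() and one shared dict. This is a sketch only: whether Origin calls exec() once per statement in exactly this way is an assumption, and the shared dict, the statement strings, and the heavy1 and heavy2 stand-ins are hypothetical.

```python
import threading

# One dict plays the role of the shared globals for every thread.
shared = {"heavy1": lambda: 10, "heavy2": lambda: 20}

# Each statement assigns a different variable, so the writes do not collide.
statements = ["a = heavy1()", "b = heavy2()"]

threads = []
for src in statements:
    # exec(src, shared) runs the statement with `shared` as its globals,
    # so the assignment is stored in the dict itself.
    t = threading.Thread(target=exec, args=(src, shared))
    t.start()
    threads.append(t)

# Join barrier: after this loop both assignments are visible in `shared`.
for t in threads:
    t.join()

print(shared["a"] + shared["b"])  # 30
```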

The parallel {} block in the standard interpreter generates one thread per top-level statement. For automatic dependency-aware scheduling across an entire program, see the parallelInt engine described above, which is a separate execution path from the normal origin run command.

Mutating a shared Python object, such as a list or dict, from multiple threads within the same parallel block can cause race conditions. Compound updates such as read-modify-write (for example counter = counter + 1, or checking a dict for a key and then inserting it) are not atomic, even where a single list or dict operation is. Use separate variables per thread and merge the results after the block completes.
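
The separate-then-merge advice might look like this in plain Python. It is a sketch with hypothetical names (per_thread, worker, merged): each thread appends only to its own list, and the lists are combined once the join barrier has passed.

```python
import threading

NUM_WORKERS = 3
# One private list per thread; the outer list is only read, never resized.
per_thread = [[] for _ in range(NUM_WORKERS)]

def worker(i):
    own = per_thread[i]  # only thread i ever appends to this list
    for x in range(i * 3, i * 3 + 3):
        own.append(x * x)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge after the barrier, in worker order, so the result is deterministic.
merged = [value for part in per_thread for value in part]
print(merged)  # [0, 1, 4, 9, 16, 25, 36, 49, 64]
```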
