Graph is the central data structure in Meganeura. You describe your model by creating nodes and edges in a Graph, then pass the graph through the optimization and compilation pipeline. No computation happens while you build the graph — it is a pure description of operations and data dependencies.
## Nodes, edges, and `NodeId`
Every operation in the graph produces a single output node. A `NodeId` is a `u32` that uniquely identifies a node within a graph. When you call a builder method such as `g.matmul(a, b)`, the graph appends a new `Node` and returns its `NodeId`. You use that id as an input to subsequent operations; each node's `inputs` field references the `NodeId`s of the nodes it consumes.
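The data model described above can be sketched as follows. This is a minimal mock, not Meganeura's actual definitions — the field names (`op`, `inputs`, `nodes`) and the `push` helper are assumptions for illustration:

```rust
// Minimal sketch of the node/graph data model (names are assumptions).
type NodeId = u32;

#[derive(Debug)]
enum Op {
    Input { name: String },
    MatMul,
}

#[derive(Debug)]
struct Node {
    op: Op,
    inputs: Vec<NodeId>, // ids of the nodes this node consumes
}

#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn push(&mut self, op: Op, inputs: Vec<NodeId>) -> NodeId {
        let id = self.nodes.len() as u32;
        self.nodes.push(Node { op, inputs });
        id
    }
    fn input(&mut self, name: &str) -> NodeId {
        self.push(Op::Input { name: name.into() }, vec![])
    }
    fn matmul(&mut self, a: NodeId, b: NodeId) -> NodeId {
        self.push(Op::MatMul, vec![a, b])
    }
}

fn main() {
    let mut g = Graph::default();
    let a = g.input("a");
    let b = g.input("b");
    // Appends a new Node and returns its NodeId.
    let c = g.matmul(a, b);
    assert_eq!((a, b, c), (0, 1, 2));
    // The matmul node's `inputs` field references its operands.
    assert_eq!(g.nodes[c as usize].inputs, vec![a, b]);
}
```

No computation happens here — building the graph only records operations and their data dependencies.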
## Leaf nodes: `Input`, `Parameter`, and `Constant`
Three node types carry data into the graph without consuming other nodes.

| Kind | `Op` variant | Purpose |
|---|---|---|
| Input | `Op::Input { name }` | Runtime data uploaded each step (e.g. a batch of images) |
| Parameter | `Op::Parameter { name }` | Trainable weights — persisted across steps |
| Constant | `Op::Constant { data }` | Fixed values baked into the graph (e.g. scale factors) |
`scalar` is a convenience wrapper for a single-element constant.
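A sketch of the three leaf constructors plus `scalar`. Method names other than `scalar` are assumptions — the point is that leaves take no `NodeId` inputs, and `scalar` simply wraps a one-element constant:

```rust
// Minimal mock of the leaf constructors (names are assumptions).
type NodeId = u32;

#[derive(Debug, PartialEq)]
enum Op {
    Input { name: String },
    Parameter { name: String },
    Constant { data: Vec<f32> },
}

#[derive(Default)]
struct Graph {
    nodes: Vec<Op>,
}

impl Graph {
    fn push(&mut self, op: Op) -> NodeId {
        self.nodes.push(op);
        (self.nodes.len() - 1) as u32
    }
    fn input(&mut self, name: &str) -> NodeId {
        self.push(Op::Input { name: name.into() })
    }
    fn parameter(&mut self, name: &str) -> NodeId {
        self.push(Op::Parameter { name: name.into() })
    }
    fn constant(&mut self, data: Vec<f32>) -> NodeId {
        self.push(Op::Constant { data })
    }
    /// A single-element constant, handy for scale factors.
    fn scalar(&mut self, v: f32) -> NodeId {
        self.constant(vec![v])
    }
}

fn main() {
    let mut g = Graph::default();
    let _x = g.input("images");  // uploaded each step
    let _w = g.parameter("w1");  // persisted across steps
    let s = g.scalar(0.5);       // baked into the graph
    assert_eq!(g.nodes[s as usize], Op::Constant { data: vec![0.5] });
}
```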
## `TensorType` and `DType`
Every node carries a `TensorType` that describes the shape and element type of its output tensor. You rarely construct a `TensorType` directly — builder methods infer the output type from their inputs. Utilities like `num_elements()` and `size_bytes()` let you inspect memory usage.

Most tensors use `DType::F32`. `DType::U32` is used for integer inputs such as token IDs passed to `embedding`.
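As a rough sketch of what such a descriptor might look like — the field names and exact representation are assumptions, but the utilities behave as described:

```rust
// Sketch of a shape-plus-dtype descriptor (representation assumed).
#[derive(Clone, Copy, PartialEq, Debug)]
enum DType {
    F32,
    U32,
}

impl DType {
    fn size_bytes(self) -> usize {
        match self {
            DType::F32 | DType::U32 => 4, // both are 4-byte types
        }
    }
}

#[derive(Clone, Debug)]
struct TensorType {
    shape: Vec<usize>,
    dtype: DType,
}

impl TensorType {
    /// Product of all dimensions.
    fn num_elements(&self) -> usize {
        self.shape.iter().product()
    }
    /// Total buffer size of the output tensor.
    fn size_bytes(&self) -> usize {
        self.num_elements() * self.dtype.size_bytes()
    }
}

fn main() {
    let t = TensorType { shape: vec![64, 128], dtype: DType::F32 };
    assert_eq!(t.num_elements(), 8192);
    assert_eq!(t.size_bytes(), 32768); // 8192 elements x 4 bytes
}
```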
## `Op` variants
Op is an exhaustive enum that covers every operation the compiler knows how to lower to a GPU shader. The variants fall into several families:
### Matrix multiplication

`MatMul`, `MatMulAT` (A transposed), `MatMulBT` (B transposed), and their fused counterparts `FusedMatMulAdd`, `FusedMatMulATAdd`, `FusedMatMulBTAdd`. The fused variants are produced by the optimizer — you never create them by hand.

### Element-wise binary

`Add`, `Mul`, `BiasAdd` (broadcasts a 1-D bias over a 2-D tensor), `Greater`.

### Element-wise unary

`Relu`, `Sigmoid`, `Neg`, `Abs`, `Log`, `Recip`, `Silu`, `Gelu`.

### Reductions

`SumAll` and `MeanAll` reduce to a scalar `[1]` tensor. `SumRows` reduces `[M, N]` to `[N]`. `Softmax` and `LogSoftmax` operate row-wise.

### Loss functions

`CrossEntropyLoss`, `BceLoss`.

### Transformer ops

`RmsNorm`, `LayerNorm`, `SwiGLU`, `SwiGLUConcat`, `CausalAttention`, `FullAttention`, `CrossAttention`, `MultiHeadAttn`, `RoPE`, `Embedding`.

### Backward / gradient ops

`RmsNormGradW`, `RmsNormGradX`, `SwiGLUGradGate`, `SwiGLUGradUp`, `SiluGrad`, `MultiHeadAttnGradQ`/`K`/`V`. These are inserted automatically by `differentiate()` — you do not create them yourself.

### Vision / CNN ops

`Conv2d`, `GroupNorm`, `GroupNormSilu`, `Concat`, `SplitA`, `SplitB`, `Upsample2x`.

### KV cache ops

`CacheWrite`, `CachedAttention`. Inference-only; do not appear in training graphs.

## Building a 2-layer MLP graph
The following example constructs a complete 2-layer MLP forward graph with cross-entropy loss. The pattern — input, parameter, matmul, bias, activation, repeat — is the same for any depth.

Shape assertions fire immediately during graph construction. If you pass incompatible shapes to `matmul`, `add`, or any other op, the process panics with a descriptive message before any GPU work is attempted.
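The forward graph can be sketched as below. This is a self-contained mock that only tracks output shapes — the builder method names (`input`, `parameter`, `matmul`, `bias_add`, `relu`, `cross_entropy_loss`) mirror the text but are assumptions about Meganeura's actual API. It also shows the immediate shape assertions:

```rust
// Mock graph builder tracking only output shapes (API names assumed).
type NodeId = usize;

#[derive(Default)]
struct Graph {
    shapes: Vec<Vec<usize>>, // output shape of each node
    outputs: Vec<NodeId>,
}

impl Graph {
    fn push(&mut self, shape: Vec<usize>) -> NodeId {
        self.shapes.push(shape);
        self.shapes.len() - 1
    }
    fn input(&mut self, _name: &str, shape: &[usize]) -> NodeId {
        self.push(shape.to_vec())
    }
    fn parameter(&mut self, _name: &str, shape: &[usize]) -> NodeId {
        self.push(shape.to_vec())
    }
    fn matmul(&mut self, a: NodeId, b: NodeId) -> NodeId {
        let (sa, sb) = (&self.shapes[a], &self.shapes[b]);
        // Shape assertion fires immediately, before any GPU work.
        assert_eq!(sa[1], sb[0], "matmul: inner dims {} vs {}", sa[1], sb[0]);
        let out = vec![sa[0], sb[1]];
        self.push(out)
    }
    fn bias_add(&mut self, x: NodeId, b: NodeId) -> NodeId {
        // Broadcasts a 1-D bias over a 2-D tensor.
        assert_eq!(self.shapes[x][1], self.shapes[b][0], "bias_add: dim mismatch");
        let s = self.shapes[x].clone();
        self.push(s)
    }
    fn relu(&mut self, x: NodeId) -> NodeId {
        let s = self.shapes[x].clone();
        self.push(s)
    }
    fn cross_entropy_loss(&mut self, logits: NodeId, targets: NodeId) -> NodeId {
        assert_eq!(self.shapes[logits][0], self.shapes[targets][0]);
        self.push(vec![1]) // scalar loss
    }
    fn set_outputs(&mut self, ids: &[NodeId]) {
        self.outputs = ids.to_vec();
    }
}

fn main() {
    let mut g = Graph::default();
    let x = g.input("x", &[32, 784]);       // batch of flattened images
    let y = g.input("y", &[32, 10]);        // targets
    let w1 = g.parameter("w1", &[784, 128]);
    let b1 = g.parameter("b1", &[128]);
    let w2 = g.parameter("w2", &[128, 10]);
    let b2 = g.parameter("b2", &[10]);

    let z1 = g.matmul(x, w1);               // [32, 128]
    let z1 = g.bias_add(z1, b1);
    let h1 = g.relu(z1);
    let z2 = g.matmul(h1, w2);              // [32, 10]
    let logits = g.bias_add(z2, b2);
    let loss = g.cross_entropy_loss(logits, y);

    g.set_outputs(&[loss]);                 // loss must be first
    assert_eq!(g.shapes[loss], vec![1]);
}
```

A second hidden layer, or a deeper stack, just repeats the matmul / bias / activation triple with new parameters.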
## `set_outputs()`

You must call `set_outputs()` before passing the graph to any pipeline function. The output list tells the compiler which nodes’ buffers must be readable on the host (e.g. the loss scalar) and drives the backward pass: `differentiate()` starts from `outputs()[0]` as the loss node.
## `toposort()`
The optimizer may append new nodes at the end of the node list to satisfy fusions (for example, a merged weight parameter for `SwiGLUConcat`). The resulting graph is no longer in strict dependency order. Call `toposort()` before running autodiff to ensure that every node appears after all of its inputs.
`toposort()` uses Kahn’s algorithm to produce a new graph with consecutive, dependency-ordered IDs, and it removes any `Nop` nodes left behind by fusion.
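As a sketch of the idea, here is a Kahn-style topological sort over a node list where each node is represented only by its list of input ids (Meganeura's real implementation also remaps IDs and drops `Nop` nodes):

```rust
// Kahn's algorithm over a node list; nodes[i] is node i's input ids.
fn toposort(inputs: &[Vec<usize>]) -> Vec<usize> {
    let n = inputs.len();
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (id, ins) in inputs.iter().enumerate() {
        indegree[id] = ins.len();
        for &src in ins {
            dependents[src].push(id);
        }
    }
    // Seed with nodes that have no inputs (leaves).
    let mut ready: Vec<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(id) = ready.pop() {
        order.push(id);
        for &d in &dependents[id] {
            indegree[d] -= 1;
            if indegree[d] == 0 {
                ready.push(d);
            }
        }
    }
    assert_eq!(order.len(), n, "graph contains a cycle");
    order
}

fn main() {
    // Node 0 depends on node 2: the list is out of dependency order,
    // as can happen after the optimizer appends fused nodes.
    let inputs = vec![vec![2], vec![], vec![1]];
    let order = toposort(&inputs);
    // Every node now appears after all of its inputs.
    for (pos, &id) in order.iter().enumerate() {
        for &src in &inputs[id] {
            assert!(order.iter().position(|&x| x == src).unwrap() < pos);
        }
    }
}
```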
## Next steps
- **E-graph optimization**: how Meganeura fuses ops using equality saturation before compiling.
- **Automatic differentiation**: how `differentiate()` appends a backward pass to your forward graph.