NodeIds and provides a forward() method that appends operations to the Graph. These are thin wrappers over the low-level graph API — no trait hierarchy, no dynamic dispatch.
nn::Linear
Fully connected linear layer: y = x @ weight + bias.
Fields
- Weight parameter of shape [in_features, out_features].
- Optional bias parameter of shape [out_features]. None when constructed with no_bias.

Linear::new
Creates a linear layer with a bias term.
- The computation graph to register parameters into.
- Name prefix for the parameters. Registers {name}.weight and {name}.bias.
- Number of input features.
- Number of output features.
Linear::no_bias
Creates a linear layer without a bias term.
- The computation graph to register parameters into.
- Name prefix. Registers only {name}.weight.
- Number of input features.
- Number of output features.
forward
Appends a matrix multiply and optional bias add to the graph.
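The recorded computation is equivalent to the following plain-Rust reference, shown on Vecs for clarity. This is a sketch, not this crate's graph API; the real method only appends ops to the Graph.

```rust
// Reference semantics of Linear::forward, assuming a row-major weight
// laid out as [in_features][out_features]. The real layer records a
// matmul (plus an optional bias add) instead of computing eagerly.
fn linear_forward(x: &[f32], weight: &[Vec<f32>], bias: Option<&[f32]>) -> Vec<f32> {
    let out_features = weight[0].len();
    (0..out_features)
        .map(|j| {
            // dot product of x with column j of the weight matrix
            let dot: f32 = x.iter().zip(weight).map(|(xi, row)| xi * row[j]).sum();
            dot + bias.map_or(0.0, |b| b[j])
        })
        .collect()
}

fn main() {
    // identity weight, bias 0.5 -> y = x + 0.5
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let y = linear_forward(&[1.0, 2.0], &w, Some(&[0.5, 0.5]));
    assert_eq!(y, vec![1.5, 2.5]);
}
```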
- The computation graph to append ops to.
- Input tensor of shape [batch, in_features].
- Output tensor of shape [batch, out_features].

nn::Embedding
Token embedding lookup table. Maps integer token indices to dense vectors.
Fields
- Embedding table parameter of shape [vocab_size, embed_dim].

Embedding::new
- The computation graph to register parameters into.
- Name for the embedding weight parameter.
- Number of tokens in the vocabulary.
- Dimensionality of each token embedding.
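The lookup that forward records has these reference semantics, sketched here in plain Rust rather than via the graph API:

```rust
// Row indices[i] of the [vocab_size, embed_dim] table becomes
// row i of the output, giving a [seq_len, embed_dim] result.
fn embedding_forward(table: &[Vec<f32>], indices: &[u32]) -> Vec<Vec<f32>> {
    indices.iter().map(|&i| table[i as usize].clone()).collect()
}

fn main() {
    // vocab_size = 3, embed_dim = 2
    let table = vec![vec![0.0, 0.1], vec![1.0, 1.1], vec![2.0, 2.1]];
    let out = embedding_forward(&table, &[2, 0]);
    assert_eq!(out, vec![vec![2.0, 2.1], vec![0.0, 0.1]]);
}
```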
forward
- The computation graph to append ops to.
- 1D U32 tensor of shape [seq_len] containing token indices.
- Output tensor of shape [seq_len, embed_dim].

nn::SwiGluFfn
SwiGLU feed-forward network: silu(gate(x)) * up(x) then down-projected back to the hidden dimension.
Internally uses three bias-free linear projections registered as {name}.gate_proj, {name}.up_proj, and {name}.down_proj.
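The gating step can be sketched elementwise in plain Rust (reference only; the module records these ops on the graph through its three projections):

```rust
// silu(v) = v * sigmoid(v) = v / (1 + e^(-v))
fn silu(v: f32) -> f32 {
    v / (1.0 + (-v).exp())
}

// Elementwise SwiGLU gating on the intermediate activations:
// silu(gate(x)) * up(x). The module then applies down_proj to map
// the result back to the hidden dimension.
fn swiglu_gate(gate: &[f32], up: &[f32]) -> Vec<f32> {
    gate.iter().zip(up).map(|(g, u)| silu(*g) * u).collect()
}

fn main() {
    // silu(0) = 0, so a zero gate suppresses the up projection entirely
    assert_eq!(swiglu_gate(&[0.0], &[3.0]), vec![0.0]);
    // for large positive inputs silu(v) approaches v
    assert!((silu(20.0) - 20.0).abs() < 1e-3);
}
```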
Fields
- Gate projection: hidden → intermediate.
- Up projection: hidden → intermediate.
- Down projection: intermediate → hidden.

SwiGluFfn::new
- The computation graph to register parameters into.
- Name prefix. Registers {name}.gate_proj, {name}.up_proj, and {name}.down_proj.
- Hidden (input and output) dimension.
- Intermediate (expanded) dimension.
forward
- The computation graph to append ops to.
- Input tensor of shape [seq, hidden].
- Output tensor of shape [seq, hidden].

nn::Mlp
Standard two-layer MLP: fc2(activation(fc1(x))).
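The pipeline can be sketched with the projections passed as opaque closures, keeping the example self-contained (a plain-Rust reference, not the graph API):

```rust
// fc2(activation(fc1(x))): fc1 maps in_dim -> hidden_dim, the
// activation is applied elementwise, fc2 maps hidden_dim -> out_dim.
fn mlp_forward(
    x: &[f32],
    fc1: impl Fn(&[f32]) -> Vec<f32>,
    act: impl Fn(f32) -> f32,
    fc2: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    let hidden: Vec<f32> = fc1(x).into_iter().map(&act).collect();
    fc2(&hidden)
}

fn main() {
    // fc1 = identity, activation = ReLU, fc2 = sum into one output
    let y = mlp_forward(
        &[-1.0, 2.0],
        |v| v.to_vec(),
        |v| v.max(0.0),
        |v| vec![v.iter().sum()],
    );
    // ReLU zeroes the -1.0, leaving 0.0 + 2.0
    assert_eq!(y, vec![2.0]);
}
```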
Fields
- First linear layer: in_dim → hidden_dim (with bias).
- Second linear layer: hidden_dim → out_dim (with bias).
- Activation function (Activation) applied between the two layers.
Mlp::new
- The computation graph to register parameters into.
- Name prefix. Registers {name}.fc1.weight, {name}.fc1.bias, {name}.fc2.weight, and {name}.fc2.bias.
- Input feature dimension.
- Hidden layer dimension.
- Output feature dimension.
- Activation function to use between layers.
forward
- The computation graph to append ops to.
- Input tensor of shape [batch, in_dim].
- Output tensor of shape [batch, out_dim].

nn::Conv2d
2D convolution layer: y = conv2d(x, weight) + bias. Input and output tensors are flat 1D arrays in NCHW layout.
Fields
- Kernel parameter of shape [out_channels * in_channels * kernel_h * kernel_w] (flat 1D).
- Optional bias. Currently None after construction; set manually if needed.
- Number of input channels.
- Input spatial height.
- Input spatial width.
- Number of output channels.
- Kernel height (equal to kernel_size passed to new).
- Kernel width (equal to kernel_size passed to new).
- Convolution stride.
- Zero-padding added to each spatial edge.
Conv2d::new
- The computation graph to register parameters into.
- Name prefix. Registers {name}.weight as a flat 1D parameter.
- Number of input channels.
- Number of output channels.
- Square kernel side length (sets both kernel_h and kernel_w).
- Input spatial height.
- Input spatial width.
- Convolution stride.
- Zero-padding on each edge.
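The spatial bookkeeping the layer relies on can be checked in plain Rust. The output-size formula below matches the one documented for forward; `nchw_index` is a hypothetical helper added here only to illustrate the flat NCHW layout.

```rust
// Output spatial size used by the convolution:
// out = (in + 2*padding - kernel) / stride + 1 (integer division).
fn conv_out_dim(in_dim: usize, kernel: usize, stride: usize, padding: usize) -> usize {
    (in_dim + 2 * padding - kernel) / stride + 1
}

// Flat offset of element (n, c, h, w) in an NCHW [N*C*H*W] buffer.
fn nchw_index(n: usize, c: usize, h: usize, w: usize,
              channels: usize, height: usize, width: usize) -> usize {
    ((n * channels + c) * height + h) * width + w
}

fn main() {
    // "same" convolution: 3x3 kernel, stride 1, padding 1 keeps 32x32
    assert_eq!(conv_out_dim(32, 3, 1, 1), 32);
    // 5x5 kernel, stride 2, no padding on a 28-wide input
    assert_eq!(conv_out_dim(28, 5, 2, 0), 12);
    // element (1, 2, 3, 4) in a [N, 3, 5, 6] buffer
    assert_eq!(nchw_index(1, 2, 3, 4, 3, 5, 6), 172);
}
```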
forward
- The computation graph to append ops to.
- Flat input tensor of shape [N * in_channels * in_h * in_w] in NCHW order.
- Batch size N.
- Flat output tensor of shape [N * out_channels * out_h * out_w] in NCHW order, where out_h = (in_h + 2*padding - kernel_h) / stride + 1 (and out_w likewise).

All tensors for Conv2d are stored as flat 1D arrays. Spatial metadata (channels, height, width, kernel dimensions, stride, padding) is encoded in the op and used by the GPU kernel. No explicit reshape is needed.

nn::TransformerBlock
A single transformer decoder block combining pre-norm attention, a residual connection, and a SwiGLU feed-forward network with a second residual connection.
Forward pass: x = x + self_attn(input_layernorm(x)), then x = x + mlp(post_attention_layernorm(x)).

Fields

- RMS normalization applied before attention. Parameter name: {name}.input_layernorm.weight.
- Causal self-attention module. Parameter names prefixed with {name}.self_attn.
- RMS normalization applied before the feed-forward network. Parameter name: {name}.post_attention_layernorm.weight.
- SwiGLU feed-forward network. Parameter names prefixed with {name}.mlp.

TransformerBlockConfig
- Hidden dimension (model width).
- Intermediate dimension for the SwiGLU FFN.
- Key/value projection dimension (num_kv_heads * head_dim).
- Number of query attention heads.
- Number of key/value heads (for grouped-query attention).
- Dimension per attention head.
- Epsilon for RMS normalization numerical stability.
- Base frequency for rotary position embeddings.
TransformerBlock::new
- The computation graph to register parameters into.
- Name prefix, typically "model.layers.{i}".
- Block configuration.
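The residual structure of the block can be sketched with the submodules as opaque closures (a plain-Rust reference of the data flow, not the graph API):

```rust
// Two pre-norm sublayers, each wrapped in a residual add.
fn block_forward(
    x: Vec<f32>,
    input_norm: impl Fn(&[f32]) -> Vec<f32>,
    self_attn: impl Fn(&[f32]) -> Vec<f32>,
    post_norm: impl Fn(&[f32]) -> Vec<f32>,
    ffn: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    // x = x + self_attn(input_layernorm(x))
    let h: Vec<f32> = x.iter().zip(self_attn(&input_norm(&x))).map(|(a, b)| a + b).collect();
    // x = x + mlp(post_attention_layernorm(x))
    h.iter().zip(ffn(&post_norm(&h))).map(|(a, b)| a + b).collect()
}

fn main() {
    // identity norms, attention doubles its input, FFN returns zeros
    let y = block_forward(
        vec![1.0, 2.0],
        |v| v.to_vec(),
        |v| v.iter().map(|a| a * 2.0).collect(),
        |v| v.to_vec(),
        |v| vec![0.0; v.len()],
    );
    // first residual: [1+2, 2+4]; second residual adds zeros
    assert_eq!(y, vec![3.0, 6.0]);
}
```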
forward
- The computation graph to append ops to.
- Input tensor of shape [seq, hidden].
- Output tensor of shape [seq, hidden].