
Overview

Slung’s data model is designed for high-cardinality time series data with flexible tagging and multiple value types. The model is optimized for both write throughput and query performance.

Core Concepts

Series Keys

A series key uniquely identifies a time series and is constructed from a measurement name and sorted tags:
<measurement>,<tag1>,<tag2>,...,<tagN>
Example:
cpu.usage,env=prod,host=server-01,region=us-west
Series keys are interned by the server: each unique key is stored once and reused across all data points, which reduces memory overhead for high-cardinality datasets.
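Because the key is built from sorted tags, the same tag set always produces the same series key regardless of the order in which a client lists the tags. The following is a minimal client-side sketch, assuming tags are already "key=value" strings and that "sorted" means plain byte-wise lexicographic order (the exact comparison Slung applies is an assumption). buildSeriesKey is a hypothetical helper, not part of Slung's API.

```zig
const std = @import("std");

/// Hypothetical helper (not part of Slung): builds
/// `<measurement>,<tag1>,...,<tagN>` with the tags sorted byte-wise,
/// so the same tag set always yields the same key. Sorts `tags` in place.
/// The caller owns the returned key.
fn buildSeriesKey(
    allocator: std.mem.Allocator,
    measurement: []const u8,
    tags: [][]const u8,
) ![]u8 {
    std.mem.sort([]const u8, tags, {}, struct {
        fn lessThan(_: void, a: []const u8, b: []const u8) bool {
            return std.mem.lessThan(u8, a, b);
        }
    }.lessThan);

    // Temporary component list: measurement first, then the sorted tags.
    const parts = try allocator.alloc([]const u8, tags.len + 1);
    defer allocator.free(parts);
    parts[0] = measurement;
    @memcpy(parts[1..], tags);

    return std.mem.join(allocator, ",", parts);
}
```

Sorting in place avoids a second allocation; copy the tags first if the caller needs to keep their original order.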

Series Key Interning

The server maintains a series key cache (src/main.zig:107-115):
pub fn internSeriesKey(self: *Server, key_bytes: []const u8)
    !struct { key: []const u8, inserted: bool }
{
    const entry = try self.series_key_cache.getOrPut(key_bytes);
    if (entry.found_existing) return .{
        .key = entry.value_ptr.*,
        .inserted = false,
    };

    // If the copy fails, remove the new entry so the cache never keeps a
    // borrowed key paired with an undefined value.
    errdefer _ = self.series_key_cache.remove(key_bytes);
    const owned = try self.allocator.dupe(u8, key_bytes);
    entry.key_ptr.* = owned;
    entry.value_ptr.* = owned;
    return .{ .key = owned, .inserted = true };
}

Hash-Based Collision Detection

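To speed up lookups, Slung hashes series keys with Wyhash. Each component (the series name and every tag) is prefixed with its length before hashing, so different splits of the same bytes, such as "ab" + "c" versus "a" + "bc", produce different hashes. Genuine collisions remain possible and are detected explicitly: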
fn hashSeriesAndTags(series: []const u8, tags: []const []const u8) u64 {
    var hasher = std.hash.Wyhash.init(0);
    const len_buf: [8]u8 = std.mem.toBytes(@as(u64, @intCast(series.len)));
    hasher.update(&len_buf);
    hasher.update(series);
    for (tags) |tag| {
        const tag_len_buf: [8]u8 = std.mem.toBytes(@as(u64, @intCast(tag.len)));
        hasher.update(&tag_len_buf);
        hasher.update(tag);
    }
    return hasher.final();
}
When a hash collision is detected (src/main.zig:119-126), the server falls back to a full string comparison of the series keys.
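A hash-first lookup with a full-string fallback can be sketched as follows. This is an illustration under stated assumptions, not Slung's code: HashIndex, insert, and resolve are hypothetical names loosely modeled on series_key_by_hash and series_key_hash_collisions, and the caller supplies the hash (for example from a function like hashSeriesAndTags). The test forces a collision by registering two keys under the same fake hash.

```zig
const std = @import("std");

/// Hypothetical hash-first index; hashes are computed by the caller.
const HashIndex = struct {
    by_hash: std.AutoHashMap(u64, []const u8),
    collisions: std.AutoHashMap(u64, void),

    fn init(gpa: std.mem.Allocator) HashIndex {
        return .{
            .by_hash = std.AutoHashMap(u64, []const u8).init(gpa),
            .collisions = std.AutoHashMap(u64, void).init(gpa),
        };
    }

    fn deinit(self: *HashIndex) void {
        self.by_hash.deinit();
        self.collisions.deinit();
    }

    /// Registers `key` under `hash`. A second, different key with the same
    /// hash marks that hash as colliding; the first key stays in by_hash.
    fn insert(self: *HashIndex, hash: u64, key: []const u8) !void {
        const entry = try self.by_hash.getOrPut(hash);
        if (!entry.found_existing) {
            entry.value_ptr.* = key;
        } else if (!std.mem.eql(u8, entry.value_ptr.*, key)) {
            try self.collisions.put(hash, {});
        }
    }

    /// Fast path through by_hash for non-colliding hashes; otherwise falls
    /// back to a full-string lookup in a string-keyed cache.
    fn resolve(
        self: *const HashIndex,
        hash: u64,
        key: []const u8,
        cache: *const std.StringHashMap([]const u8),
    ) ?[]const u8 {
        if (!self.collisions.contains(hash)) {
            if (self.by_hash.get(hash)) |stored| {
                // Equal hashes do not prove equal keys, so confirm the bytes.
                if (std.mem.eql(u8, stored, key)) return stored;
            }
        }
        return cache.get(key);
    }
};
```

Keeping the collision set separate means the common case costs one integer-keyed lookup plus one byte comparison, while colliding hashes pay for a string-keyed lookup only when they actually occur.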

Data Types

DataPoint Structure

A data point consists of a timestamp and a value:
pub const DataPoint = struct {
    timestamp: i64,  // Microsecond timestamp
    value: Value,
};

Value Types

Slung supports four value types (src/tsm/cache.zig:27-41):
pub const Value = union(enum) {
    Bool: bool,
    Int: i64,
    Float: f64,
    Bytes: []const u8,
    
    pub fn compare(self: Value, b: Value) std.math.Order {
        return switch (self) {
            .Int => |val| std.math.order(val, b.Int),
            .Float => |val| std.math.order(val, b.Float),
            .Bytes => |val| std.mem.order(u8, val, b.Bytes),
            .Bool => |val| std.math.order(
                @intFromBool(val), 
                @intFromBool(b.Bool)
            ),
        };
    }
};
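All data points within a series must use the same value type. Mixing types results in undefined behavior during deserialization. Value.compare has the same requirement: it reads the same variant from both operands, so comparing an Int with a Float accesses an inactive union field, which panics in Debug and ReleaseSafe builds and is undefined behavior in unsafe release builds.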
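Value.compare switches on self and then reads b.Int, b.Float, and so on without checking which variant b actually holds. Where the caller cannot guarantee matching types, a guarded comparison can reject the mismatch instead. This is a minimal sketch: compareChecked and the TypeMismatch error are hypothetical, and Value is redeclared here only so the example compiles on its own.

```zig
const std = @import("std");

// Redeclared to mirror the four documented variants.
const Value = union(enum) {
    Bool: bool,
    Int: i64,
    Float: f64,
    Bytes: []const u8,
};

/// Hypothetical guarded comparison: verifies both operands hold the same
/// variant before reading any payload.
/// NaN payloads are not handled (std.math.order treats them as unreachable).
fn compareChecked(a: Value, b: Value) error{TypeMismatch}!std.math.Order {
    if (std.meta.activeTag(a) != std.meta.activeTag(b)) return error.TypeMismatch;
    return switch (a) {
        .Bool => |v| std.math.order(@intFromBool(v), @intFromBool(b.Bool)),
        .Int => |v| std.math.order(v, b.Int),
        .Float => |v| std.math.order(v, b.Float),
        .Bytes => |v| std.mem.order(u8, v, b.Bytes),
    };
}
```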

Binary Wire Format

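Data is ingested over WebSocket as little-endian binary messages. Each message starts with a fixed 20-byte header (8 + 8 + 2 + 2), followed by the series name and the length-prefixed tags: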
[timestamp:i64]         // 8 bytes: microsecond timestamp
[value:f64]             // 8 bytes: floating-point value
[series_len:u16]        // 2 bytes: length of series name
[tag_count:u16]         // 2 bytes: number of tags
[series:bytes]          // variable: series name
[tag_len:u16+bytes]...  // variable: tags (length-prefixed)
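A client can produce this layout directly. The following is a minimal sketch, assuming exactly the field order and little-endian encoding listed above; encodeMessage and the FieldTooLong error are hypothetical names, not part of Slung.

```zig
const std = @import("std");

/// Hypothetical client-side encoder for the documented layout.
/// The caller owns the returned buffer.
fn encodeMessage(
    allocator: std.mem.Allocator,
    timestamp_us: i64,
    value: f64,
    series: []const u8,
    tags: []const []const u8,
) ![]u8 {
    // Lengths and counts must fit in their u16 prefixes.
    const max_len = std.math.maxInt(u16);
    if (series.len > max_len or tags.len > max_len) return error.FieldTooLong;

    // Fixed 20-byte header plus the variable-length sections.
    var size: usize = 8 + 8 + 2 + 2 + series.len;
    for (tags) |tag| {
        if (tag.len > max_len) return error.FieldTooLong;
        size += 2 + tag.len;
    }

    const buf = try allocator.alloc(u8, size);
    var off: usize = 0;
    std.mem.writeInt(i64, buf[off..][0..8], timestamp_us, .little);
    off += 8;
    std.mem.writeInt(u64, buf[off..][0..8], @bitCast(value), .little);
    off += 8;
    std.mem.writeInt(u16, buf[off..][0..2], @intCast(series.len), .little);
    off += 2;
    std.mem.writeInt(u16, buf[off..][0..2], @intCast(tags.len), .little);
    off += 2;
    @memcpy(buf[off..][0..series.len], series);
    off += series.len;
    for (tags) |tag| {
        // Each tag is a u16 length followed by its bytes.
        std.mem.writeInt(u16, buf[off..][0..2], @intCast(tag.len), .little);
        off += 2;
        @memcpy(buf[off..][0..tag.len], tag);
        off += tag.len;
    }
    return buf;
}
```

For a series "cpu" with one tag "host=a", the message is 20 + 3 + (2 + 6) = 31 bytes.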
Decoding implementation (src/main.zig:424-469):
fn decodeBinaryMessage(
    allocator: std.mem.Allocator, 
    data: []const u8, 
    tag_scratch: *std.ArrayList([]const u8)
) !DecodedMessage {
    var offset: usize = 0;
    
    const timestamp_bits = std.mem.readInt(u64, data[offset..][0..8], .little);
    const timestamp: i64 = @bitCast(timestamp_bits);
    offset += 8;
    
    const value_bits = std.mem.readInt(u64, data[offset..][0..8], .little);
    const value: f64 = @bitCast(value_bits);
    offset += 8;
    
    const series_len = @as(usize, std.mem.readInt(u16, data[offset..][0..2], .little));
    offset += 2;
    const tag_count = @as(usize, std.mem.readInt(u16, data[offset..][0..2], .little));
    offset += 2;
    
    // ... series and tag parsing
}
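The excerpt above elides the series and tag parsing. The following standalone sketch decodes the same layout with explicit bounds checks; it is not Slung's implementation. Decoded, decode, readU16, and takeBytes are hypothetical names, rejecting trailing bytes is an assumption, and the code assumes Zig 0.15's unmanaged std.ArrayList (the .empty form used elsewhere on this page).

```zig
const std = @import("std");

/// Hypothetical result type; Slung's DecodedMessage may differ.
const Decoded = struct {
    timestamp: i64,
    value: f64,
    series: []const u8, // borrows `data`
    tags: []const []const u8, // borrows `data` and the scratch buffer
};

fn readU16(data: []const u8, offset: *usize) !u16 {
    if (data.len - offset.* < 2) return error.Truncated;
    const v = std.mem.readInt(u16, data[offset.*..][0..2], .little);
    offset.* += 2;
    return v;
}

fn takeBytes(data: []const u8, offset: *usize, n: usize) ![]const u8 {
    if (data.len - offset.* < n) return error.Truncated;
    const s = data[offset.*..][0..n];
    offset.* += n;
    return s;
}

/// Bounds-checked decode; `tag_scratch` is cleared and reused on each call.
fn decode(
    allocator: std.mem.Allocator,
    data: []const u8,
    tag_scratch: *std.ArrayList([]const u8),
) !Decoded {
    if (data.len < 20) return error.Truncated; // fixed header
    const timestamp: i64 = @bitCast(std.mem.readInt(u64, data[0..8], .little));
    const value: f64 = @bitCast(std.mem.readInt(u64, data[8..16], .little));
    var offset: usize = 16;
    const series_len = try readU16(data, &offset);
    const tag_count = try readU16(data, &offset);
    const series = try takeBytes(data, &offset, series_len);

    tag_scratch.clearRetainingCapacity();
    for (0..tag_count) |_| {
        const tag_len = try readU16(data, &offset);
        try tag_scratch.append(allocator, try takeBytes(data, &offset, tag_len));
    }
    // Assumption: anything left over indicates a malformed message.
    if (offset != data.len) return error.TrailingBytes;

    return .{ .timestamp = timestamp, .value = value, .series = series, .tags = tag_scratch.items };
}
```

The returned tags alias the scratch buffer, so they remain valid only until the next call that reuses it.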

Indexing Strategy

Three Complementary Indexes

Slung maintains three complementary indexes for efficient query routing:

1. Series by Measurement

Maps measurement names to all associated series:
series_by_measurement: std.StringArrayHashMap(std.ArrayList([]const u8))
Example:
"cpu.usage" -> [
    "cpu.usage,host=server-01,region=us-west",
    "cpu.usage,host=server-02,region=us-east"
]

2. Series by Measurement+Tag

Maps a composite key (the measurement name and one tag, joined by the byte 0x1f, the ASCII unit separator) to the series that carry that tag:
series_by_measurement_tag: std.StringArrayHashMap(std.ArrayList([]const u8))
Example:
"cpu.usage<0x1f>region=us-west" -> [
    "cpu.usage,host=server-01,region=us-west"
]

3. Series Key Hash

Fast lookup by hash (with collision tracking):
series_key_by_hash: std.AutoHashMap(u64, []const u8)
series_key_hash_collisions: std.AutoHashMap(u64, void)

Index Population

Indexes are populated lazily as series are written (src/main.zig:259-291):
fn indexSeriesKey(self: *Server, series_key: []const u8) !void {
    var parts = std.mem.splitScalar(u8, series_key, ',');
    const measurement = parts.next() orelse return error.InvalidSeriesKey;
    try self.indexAppendMeasurement(measurement, series_key);
    
    while (parts.next()) |tag| {
        const trimmed = std.mem.trim(u8, tag, " \t\r\n");
        if (trimmed.len == 0) continue;
        try self.indexAppendMeasurementTag(measurement, trimmed, series_key);
    }
}
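The same population logic can be exercised in isolation. This is a minimal sketch, not Slung's Server: SeriesIndex is a hypothetical struct holding stand-ins for series_by_measurement and series_by_measurement_tag. It assumes series keys are interned (so the index borrows them), owns only the composite measurement + 0x1f + tag keys, and uses Zig 0.15's unmanaged std.ArrayList.

```zig
const std = @import("std");

const List = std.ArrayList([]const u8); // Zig 0.15 unmanaged list

/// Hypothetical two-map index; series keys are borrowed, composite keys owned.
const SeriesIndex = struct {
    gpa: std.mem.Allocator,
    by_measurement: std.StringHashMap(List),
    by_measurement_tag: std.StringHashMap(List),

    fn init(gpa: std.mem.Allocator) SeriesIndex {
        return .{
            .gpa = gpa,
            .by_measurement = std.StringHashMap(List).init(gpa),
            .by_measurement_tag = std.StringHashMap(List).init(gpa),
        };
    }

    fn deinit(self: *SeriesIndex) void {
        var m_it = self.by_measurement.valueIterator();
        while (m_it.next()) |list| list.deinit(self.gpa);
        self.by_measurement.deinit();

        var t_it = self.by_measurement_tag.iterator();
        while (t_it.next()) |entry| {
            entry.value_ptr.deinit(self.gpa);
            self.gpa.free(entry.key_ptr.*); // composite keys are owned
        }
        self.by_measurement_tag.deinit();
    }

    fn indexSeriesKey(self: *SeriesIndex, series_key: []const u8) !void {
        var parts = std.mem.splitScalar(u8, series_key, ',');
        const measurement = parts.next() orelse return error.InvalidSeriesKey;

        const m_entry = try self.by_measurement.getOrPut(measurement);
        if (!m_entry.found_existing) m_entry.value_ptr.* = .empty;
        try m_entry.value_ptr.append(self.gpa, series_key);

        while (parts.next()) |raw| {
            const tag = std.mem.trim(u8, raw, " \t\r\n");
            if (tag.len == 0) continue;
            const composite = try std.mem.concat(self.gpa, u8, &[_][]const u8{ measurement, "\x1f", tag });
            const t_entry = self.by_measurement_tag.getOrPut(composite) catch |err| {
                self.gpa.free(composite);
                return err;
            };
            if (t_entry.found_existing) {
                self.gpa.free(composite); // the map already owns an equal key
            } else {
                t_entry.value_ptr.* = .empty;
            }
            try t_entry.value_ptr.append(self.gpa, series_key);
        }
    }
};
```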

Query Matching

Tag Filter Evaluation

The server evaluates tag filters using set operations (src/main.zig:140-209):
  1. Start with the measurement universe: all series for the measurement.
  2. Evaluate each tag operand: build the set of series matching that tag.
  3. Apply operators: combine sets using AND (intersection), OR (union), and NOT (complement against the universe).
  4. Return the result: the final set of matching series keys.
pub fn matchingSeriesKeysForQuery(
    self: *Server, 
    allocator: std.mem.Allocator, 
    q: *const Query
) ![]const []const u8 {
    const universe = self.series_by_measurement.get(q.series) 
        orelse return try allocator.alloc([]const u8, 0);
    
    if (q.tagsSlice().len == 0) {
        return try allocator.dupe([]const u8, universe.items);
    }
    
    // ... set operations on tags
}

Example Query Evaluation

Query: AVG:cpu.usage:[region=us-west AND NOT env=dev]
  1. Get universe: All cpu.usage series
  2. Get region=us-west set from series_by_measurement_tag
  3. Get the env=dev set and invert it against the universe (NOT)
  4. Intersect both sets (AND)
  5. Return matching series keys
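The evaluation of region=us-west AND NOT env=dev can be sketched with hash sets. This is a minimal illustration, not Slung's matchingSeriesKeysForQuery: andNot and toSet are hypothetical helpers, the input lists stand in for lookups in series_by_measurement and series_by_measurement_tag, and the code assumes Zig 0.15's unmanaged std.ArrayList.

```zig
const std = @import("std");

const Set = std.StringHashMap(void);

fn toSet(gpa: std.mem.Allocator, keys: []const []const u8) !Set {
    var set = Set.init(gpa);
    errdefer set.deinit();
    for (keys) |k| try set.put(k, {});
    return set;
}

/// Hypothetical `A AND NOT B` over a measurement universe: keeps series
/// that are in the universe, in `a_keys`, and absent from `b_keys`.
/// The caller owns the returned slice (but not the strings it points to).
fn andNot(
    gpa: std.mem.Allocator,
    universe: []const []const u8,
    a_keys: []const []const u8,
    b_keys: []const []const u8,
) ![]const []const u8 {
    var a = try toSet(gpa, a_keys);
    defer a.deinit();
    var b = try toSet(gpa, b_keys);
    defer b.deinit();

    var out: std.ArrayList([]const u8) = .empty;
    errdefer out.deinit(gpa);
    for (universe) |key| {
        // NOT b is the complement relative to the universe; AND intersects with a.
        if (a.contains(key) and !b.contains(key)) try out.append(gpa, key);
    }
    return try out.toOwnedSlice(gpa);
}
```

Iterating the universe rather than materializing the complement of B keeps NOT cheap: each candidate costs two set lookups.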

Timestamps

Timestamp Format

All timestamps are stored as microsecond-precision i64 values:
  • Range: ±292,277 years around the Unix epoch
  • Resolution: 1 microsecond
  • Format: signed 64-bit integer
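The range figure follows from dividing the largest i64 value by the number of microseconds in a mean Gregorian year (365.2425 days, or 31,556,952 seconds):

```latex
\frac{2^{63}-1\ \mu\mathrm{s}}{10^{6}\ \mu\mathrm{s/s} \times 31\,556\,952\ \mathrm{s/yr}}
  \approx \frac{9.223 \times 10^{18}}{3.156 \times 10^{13}}
  \approx 292\,277\ \mathrm{yr}
```

The negative bound, -2^63 microseconds, gives the same magnitude in the other direction.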

Timestamp Encoding

See TSM Tree for details on Gorilla delta-of-delta encoding.

Memory Optimization

String Interning

Series keys are interned once:
  • Reduces memory usage for high-cardinality data
  • Single allocation per unique series key
  • Pointer equality checks for fast comparison
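Because every interned key has exactly one owned copy, two interned keys are equal exactly when they point to the same memory. The sketch below shows this with a minimal standalone interner; Interner and sameKey are hypothetical names, and unlike Slung's series_key_cache it stores the owned copy only as the map key.

```zig
const std = @import("std");

/// Hypothetical minimal interner: one owned copy per distinct key.
const Interner = struct {
    gpa: std.mem.Allocator,
    map: std.StringHashMap(void),

    fn init(gpa: std.mem.Allocator) Interner {
        return .{ .gpa = gpa, .map = std.StringHashMap(void).init(gpa) };
    }

    fn deinit(self: *Interner) void {
        var it = self.map.keyIterator();
        while (it.next()) |key| self.gpa.free(key.*);
        self.map.deinit();
    }

    fn intern(self: *Interner, bytes: []const u8) ![]const u8 {
        const entry = try self.map.getOrPut(bytes);
        if (entry.found_existing) return entry.key_ptr.*;
        // Roll back the new entry if the copy cannot be allocated.
        errdefer _ = self.map.remove(bytes);
        const owned = try self.gpa.dupe(u8, bytes);
        entry.key_ptr.* = owned; // the map now owns the key
        return owned;
    }
};

/// Valid only for keys returned by the same Interner.
fn sameKey(a: []const u8, b: []const u8) bool {
    return a.ptr == b.ptr and a.len == b.len;
}
```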

Tag Scratch Buffer

Tag arrays are reused via a scratch buffer, so decoding a message does not allocate a fresh tag array each time:
var tag_scratch: std.ArrayList([]const u8) = .empty;
defer tag_scratch.deinit(allocator);
const message = try decodeBinaryMessage(allocator, data, &tag_scratch);

Best Practices

Keep tag cardinality manageable. Each unique tag combination creates a new series. For example:
  • Good: host (100 values), region (5 values) = at most 500 series
  • Bad: request_id (millions of values) = millions of series
Use consistent tag naming:
  • Use key=value format: env=prod, region=us-west
  • Sort tags alphabetically for consistent series keys
  • Avoid whitespace in tag names
Choose appropriate value types:
  • Use Float for measurements (temperature, CPU usage)
  • Use Int for counts (request count, byte count)
  • Use Bool for binary states (online/offline)
  • Use Bytes sparingly (no compression support)

Next Steps

TSM Tree Storage

Learn how data is stored and compressed on disk
