Data Model

Overview

Slung’s data model is designed for high-cardinality time series data with flexible tagging and multiple value types. The model is optimized for both write throughput and query performance.

Core Concepts

Series Keys

A series key uniquely identifies a time series and is constructed from a measurement name and sorted tags:

<measurement>,<tag1>,<tag2>,...,<tagN>

Example:

cpu.usage,env=prod,host=server-01,region=us-west

Series keys are interned by the server: each unique key is stored once and reused across all data points, which reduces memory overhead for high-cardinality datasets.

Series Key Interning
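As a rough illustration of how such a key could be assembled before it is sent, the sketch below sorts the tags and joins them with commas. `buildSeriesKey` and `tagLessThan` are hypothetical names, not part of Slung, and the sketch assumes that "sorted" means plain byte-wise ordering.

```zig
const std = @import("std");

// Byte-wise ordering for tags (an assumption about what "sorted" means).
fn tagLessThan(_: void, a: []const u8, b: []const u8) bool {
    return std.mem.lessThan(u8, a, b);
}

/// Hypothetical helper: sorts `tags` in place, then writes
/// "<measurement>,<tag1>,...,<tagN>" into `buf` and returns the written slice.
fn buildSeriesKey(buf: []u8, measurement: []const u8, tags: [][]const u8) ![]const u8 {
    std.mem.sort([]const u8, tags, {}, tagLessThan);

    if (measurement.len > buf.len) return error.NoSpaceLeft;
    @memcpy(buf[0..measurement.len], measurement);
    var len: usize = measurement.len;

    for (tags) |tag| {
        if (len + 1 + tag.len > buf.len) return error.NoSpaceLeft;
        buf[len] = ',';
        @memcpy(buf[len + 1 ..][0..tag.len], tag);
        len += 1 + tag.len;
    }
    return buf[0..len];
}

test "tags in any order yield the same key" {
    var buf: [128]u8 = undefined;
    var tags = [_][]const u8{ "region=us-west", "host=server-01", "env=prod" };
    const key = try buildSeriesKey(&buf, "cpu.usage", &tags);
    try std.testing.expectEqualStrings("cpu.usage,env=prod,host=server-01,region=us-west", key);
}
```

Sorting before joining means two writers that list the same tags in a different order still address the same series.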

The server maintains a series key cache (src/main.zig:107-115):

pub fn internSeriesKey(self: *Server, key_bytes: []const u8) 
    !struct { key: []const u8, inserted: bool } 
{
    const entry = try self.series_key_cache.getOrPut(key_bytes);
    if (entry.found_existing) return .{ 
        .key = entry.value_ptr.*, 
        .inserted = false 
    };
    
    const owned = try self.allocator.dupe(u8, key_bytes);
    entry.key_ptr.* = owned;
    entry.value_ptr.* = owned;
    return .{ .key = owned, .inserted = true };
}

Hash-Based Collision Detection

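To speed up lookups, Slung hashes each series key with Wyhash and tracks collisions. The series name and every tag are length-prefixed before they are fed to the hasher, so inputs that split the same bytes differently (for example, series "ab" with tag "c" versus series "a" with tag "bc") produce different hash inputs: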

fn hashSeriesAndTags(series: []const u8, tags: []const []const u8) u64 {
    var hasher = std.hash.Wyhash.init(0);
    const len_buf: [8]u8 = std.mem.toBytes(@as(u64, @intCast(series.len)));
    hasher.update(&len_buf);
    hasher.update(series);
    for (tags) |tag| {
        const tag_len_buf: [8]u8 = std.mem.toBytes(@as(u64, @intCast(tag.len)));
        hasher.update(&tag_len_buf);
        hasher.update(tag);
    }
    return hasher.final();
}

When a hash collision is detected (src/main.zig:119-126), the server falls back to full string comparison.
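One way to picture the fallback is a hash map plus a set of hashes known to be ambiguous. The sketch below is an assumption about the general shape, not Slung's actual logic: `HashIndex`, `register`, and `lookup` are hypothetical names, and hash values are passed in directly so that a collision can be simulated.

```zig
const std = @import("std");

/// Hypothetical hash index with collision tracking (not Slung's implementation).
const HashIndex = struct {
    by_hash: std.AutoHashMap(u64, []const u8),
    collisions: std.AutoHashMap(u64, void),

    fn init(allocator: std.mem.Allocator) HashIndex {
        return .{
            .by_hash = std.AutoHashMap(u64, []const u8).init(allocator),
            .collisions = std.AutoHashMap(u64, void).init(allocator),
        };
    }

    fn deinit(self: *HashIndex) void {
        self.by_hash.deinit();
        self.collisions.deinit();
    }

    /// Records `key` under `hash`; flags the hash if a different key already owns it.
    fn register(self: *HashIndex, hash: u64, key: []const u8) !void {
        const entry = try self.by_hash.getOrPut(hash);
        if (!entry.found_existing) {
            entry.value_ptr.* = key;
        } else if (!std.mem.eql(u8, entry.value_ptr.*, key)) {
            try self.collisions.put(hash, {});
        }
    }

    /// Fast path. Returns null when the hash is unknown or ambiguous, which
    /// tells the caller to fall back to a lookup by full string comparison.
    fn lookup(self: *const HashIndex, hash: u64) ?[]const u8 {
        if (self.collisions.contains(hash)) return null;
        return self.by_hash.get(hash);
    }
};

test "colliding hashes force the slow path" {
    var index = HashIndex.init(std.testing.allocator);
    defer index.deinit();

    // Hash values are hard-coded to simulate a collision on 42.
    try index.register(42, "cpu.usage,host=server-01");
    try index.register(42, "mem.used,host=server-02");
    try index.register(7, "disk.io,host=server-03");

    try std.testing.expect(index.lookup(42) == null);
    try std.testing.expectEqualStrings("disk.io,host=server-03", index.lookup(7).?);
}
```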

Data Types

DataPoint Structure

A data point consists of a timestamp and a value:

pub const DataPoint = struct {
    timestamp: i64,  // Microsecond timestamp
    value: Value,
};

Value Types

Slung supports four value types (src/tsm/cache.zig:27-41):

pub const Value = union(enum) {
    Bool: bool,
    Int: i64,
    Float: f64,
    Bytes: []const u8,
    
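    /// Orders two values of the same variant. Both operands must share the
    /// same active tag: reading `b.Int` while `b` holds another variant is
    /// illegal behavior in Zig (a panic in safety-checked builds).
    pub fn compare(self: Value, b: Value) std.math.Order {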
        return switch (self) {
            .Int => |val| std.math.order(val, b.Int),
            .Float => |val| std.math.order(val, b.Float),
            .Bytes => |val| std.mem.order(u8, val, b.Bytes),
            .Bool => |val| std.math.order(
                @intFromBool(val), 
                @intFromBool(b.Bool)
            ),
        };
    }
};

All data points within a series must use the same value type. Mixing types will result in undefined behavior during deserialization.
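A writer can guard against mixed types before a point ever reaches the server. The following is a minimal sketch under that assumption; `sameType` and `checkSeriesType` are hypothetical helpers, and the union is a local copy of the documented one (without `compare`) so the example compiles on its own.

```zig
const std = @import("std");

// Local copy of the documented union, trimmed to its fields.
const Value = union(enum) {
    Bool: bool,
    Int: i64,
    Float: f64,
    Bytes: []const u8,
};

/// Hypothetical guard: true when both values hold the same variant.
fn sameType(a: Value, b: Value) bool {
    return std.meta.activeTag(a) == std.meta.activeTag(b);
}

/// Hypothetical write-side check: rejects a point whose type differs from
/// the first value seen for the series.
fn checkSeriesType(first: Value, incoming: Value) !void {
    if (!sameType(first, incoming)) return error.ValueTypeMismatch;
}

test "mixed value types are rejected" {
    try checkSeriesType(.{ .Float = 0.5 }, .{ .Float = 0.75 });
    try std.testing.expectError(
        error.ValueTypeMismatch,
        checkSeriesType(.{ .Float = 0.5 }, .{ .Int = 1 }),
    );
}
```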

Binary Wire Format

Data is ingested via WebSocket in little-endian binary format:

[timestamp:i64]         // 8 bytes: microsecond timestamp
[value:f64]             // 8 bytes: floating-point value
[series_len:u16]        // 2 bytes: length of series name
[tag_count:u16]         // 2 bytes: number of tags
[series:bytes]          // variable: series name
[tag_len:u16+bytes]...  // variable: tags (length-prefixed)
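To make the layout concrete, here is a sketch of a client-side encoder that produces one message in this format. `encodeMessage` and its error names are hypothetical and not part of Slung; only the field order, widths, and byte order are taken from the layout above.

```zig
const std = @import("std");

/// Hypothetical client-side encoder for the layout above (not part of Slung).
/// Writes one message into `buf` and returns the encoded prefix.
fn encodeMessage(
    buf: []u8,
    timestamp_us: i64,
    value: f64,
    series: []const u8,
    tags: []const []const u8,
) ![]const u8 {
    const max_len = std.math.maxInt(u16);
    if (series.len > max_len or tags.len > max_len) return error.FieldTooLong;

    // 20-byte fixed header, then the series name, then 2 + len bytes per tag.
    var total: usize = 20 + series.len;
    for (tags) |tag| {
        if (tag.len > max_len) return error.FieldTooLong;
        total += 2 + tag.len;
    }
    if (total > buf.len) return error.NoSpaceLeft;

    std.mem.writeInt(i64, buf[0..8], timestamp_us, .little);
    std.mem.writeInt(u64, buf[8..16], @bitCast(value), .little);
    std.mem.writeInt(u16, buf[16..18], @intCast(series.len), .little);
    std.mem.writeInt(u16, buf[18..20], @intCast(tags.len), .little);

    var offset: usize = 20;
    @memcpy(buf[offset..][0..series.len], series);
    offset += series.len;
    for (tags) |tag| {
        std.mem.writeInt(u16, buf[offset..][0..2], @intCast(tag.len), .little);
        @memcpy(buf[offset + 2 ..][0..tag.len], tag);
        offset += 2 + tag.len;
    }
    return buf[0..offset];
}

test "encode one data point" {
    var buf: [128]u8 = undefined;
    const tags = [_][]const u8{ "host=server-01", "region=us-west" };
    const msg = try encodeMessage(&buf, 1_700_000_000_000_000, 0.75, "cpu.usage", &tags);

    // 20 (header) + 9 (series) + 2 * (2 + 14) (tags) = 61 bytes.
    try std.testing.expectEqual(@as(usize, 61), msg.len);
    try std.testing.expectEqual(@as(u16, 9), std.mem.readInt(u16, msg[16..18], .little));
    try std.testing.expectEqual(@as(u16, 2), std.mem.readInt(u16, msg[18..20], .little));
}
```

Both length fields are u16, so a series name, a single tag, and the tag count are each capped at 65,535.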

Decoding implementation (src/main.zig:424-469):

fn decodeBinaryMessage(
    allocator: std.mem.Allocator, 
    data: []const u8, 
    tag_scratch: *std.ArrayList([]const u8)
) !DecodedMessage {
    var offset: usize = 0;
    
    const timestamp_bits = std.mem.readInt(u64, data[offset..][0..8], .little);
    const timestamp: i64 = @bitCast(timestamp_bits);
    offset += 8;
    
    const value_bits = std.mem.readInt(u64, data[offset..][0..8], .little);
    const value: f64 = @bitCast(value_bits);
    offset += 8;
    
    const series_len = @as(usize, std.mem.readInt(u16, data[offset..][0..2], .little));
    offset += 2;
    const tag_count = @as(usize, std.mem.readInt(u16, data[offset..][0..2], .little));
    offset += 2;
    
    // ... series and tag parsing
}

Indexing Strategy

Three-Level Index

Slung maintains three complementary indexes for efficient query routing:

1. Series by Measurement

Maps measurement names to all associated series:

series_by_measurement: std.StringArrayHashMap(std.ArrayList([]const u8))

Example:

"cpu.usage" -> [
    "cpu.usage,host=server-01,region=us-west",
    "cpu.usage,host=server-02,region=us-east"
]

2. Series by Measurement+Tag

Maps the composite key measurement<0x1f>tag to matching series, where <0x1f> is the ASCII unit separator byte:

series_by_measurement_tag: std.StringArrayHashMap(std.ArrayList([]const u8))

Example:

"cpu.usage<0x1f>region=us-west" -> [
    "cpu.usage,host=server-01,region=us-west"
]
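The composite key can be built by joining the two parts with the 0x1f byte. The sketch below shows one way to do that and derives one index key per tag of a series key; `measurementTagKey` is a hypothetical helper, not a Slung function.

```zig
const std = @import("std");

/// Hypothetical helper: formats "<measurement>\x1f<tag>" into `buf`.
fn measurementTagKey(buf: []u8, measurement: []const u8, tag: []const u8) ![]const u8 {
    return try std.fmt.bufPrint(buf, "{s}\x1f{s}", .{ measurement, tag });
}

test "one index key per tag of a series key" {
    const series_key = "cpu.usage,host=server-01,region=us-west";
    var parts = std.mem.splitScalar(u8, series_key, ',');
    const measurement = parts.next().?;

    var buf: [64]u8 = undefined;
    var count: usize = 0;
    while (parts.next()) |tag| {
        const key = try measurementTagKey(&buf, measurement, tag);
        // The separator sits right after the measurement name.
        try std.testing.expectEqual(@as(u8, 0x1f), key[measurement.len]);
        count += 1;
    }
    try std.testing.expectEqual(@as(usize, 2), count);
}
```

A control byte is a reasonable separator here because it is unlikely to appear in a measurement name or tag, so the two halves of the key cannot be confused.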

3. Series Key Hash

Fast lookup by hash (with collision tracking):

series_key_by_hash: std.AutoHashMap(u64, []const u8)
series_key_hash_collisions: std.AutoHashMap(u64, void)

Index Population

Indexes are populated lazily as series are written (src/main.zig:259-291):

fn indexSeriesKey(self: *Server, series_key: []const u8) !void {
    var parts = std.mem.splitScalar(u8, series_key, ',');
    const measurement = parts.next() orelse return error.InvalidSeriesKey;
    try self.indexAppendMeasurement(measurement, series_key);
    
    while (parts.next()) |tag| {
        const trimmed = std.mem.trim(u8, tag, " \t\r\n");
        if (trimmed.len == 0) continue;
        try self.indexAppendMeasurementTag(measurement, trimmed, series_key);
    }
}

Query Matching

Tag Filter Evaluation

The server evaluates tag filters using set operations (src/main.zig:140-209):

1. Start with the measurement universe: all series for the measurement.
2. Evaluate each tag operand: build the set of series matching that tag.
3. Apply operators: combine the sets using AND, OR, and NOT.
4. Return the result: the final set of matching series keys.

pub fn matchingSeriesKeysForQuery(
    self: *Server, 
    allocator: std.mem.Allocator, 
    q: *const Query
) ![]const []const u8 {
    const universe = self.series_by_measurement.get(q.series) 
        orelse return try allocator.alloc([]const u8, 0);
    
    if (q.tagsSlice().len == 0) {
        return try allocator.dupe([]const u8, universe.items);
    }
    
    // ... set operations on tags
}

Example Query Evaluation

Query: AVG:cpu.usage:[region=us-west AND NOT env=dev]

1. Get the universe: all cpu.usage series.
2. Get the region=us-west set from series_by_measurement_tag.
3. Get the env=dev set and invert it against the universe (NOT).
4. Intersect the two sets (AND).
5. Return the matching series keys.
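The example query above boils down to "in the region set and not in the env=dev set". A minimal sketch of that one combination follows, using plain slices and linear scans for brevity; `contains` and `andNot` are hypothetical helpers and do not reflect how the server stores or combines its sets.

```zig
const std = @import("std");

fn contains(set: []const []const u8, key: []const u8) bool {
    for (set) |item| {
        if (std.mem.eql(u8, item, key)) return true;
    }
    return false;
}

/// Hypothetical sketch of `include AND NOT exclude`, restricted to `universe`.
/// Writes matches into `out` (at least universe.len long) and returns them.
fn andNot(
    out: [][]const u8,
    universe: []const []const u8,
    include: []const []const u8,
    exclude: []const []const u8,
) [][]const u8 {
    std.debug.assert(out.len >= universe.len);
    var n: usize = 0;
    for (universe) |key| {
        if (contains(include, key) and !contains(exclude, key)) {
            out[n] = key;
            n += 1;
        }
    }
    return out[0..n];
}

test "region=us-west AND NOT env=dev" {
    const a = "cpu.usage,env=prod,host=server-01,region=us-west";
    const b = "cpu.usage,env=dev,host=server-02,region=us-west";
    const c = "cpu.usage,env=prod,host=server-03,region=us-east";

    const universe = [_][]const u8{ a, b, c };
    const region_us_west = [_][]const u8{ a, b };
    const env_dev = [_][]const u8{b};

    var out: [universe.len][]const u8 = undefined;
    const matches = andNot(&out, &universe, &region_us_west, &env_dev);

    try std.testing.expectEqual(@as(usize, 1), matches.len);
    try std.testing.expectEqualStrings(a, matches[0]);
}
```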

Timestamps

Timestamp Format

All timestamps are stored as microsecond-precision i64 values:

Range: approximately ±292,277 years from the Unix epoch
Resolution: 1 microsecond
Format: Signed 64-bit integer
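The range figure follows from dividing the largest i64 by the number of microseconds in a year. The sketch below reproduces it, assuming an average Gregorian year of 365.2425 days, and adds a seconds-to-microseconds conversion; `secondsToMicros` and `representableYears` are hypothetical helpers.

```zig
const std = @import("std");

const us_per_second: i64 = 1_000_000;
// Average Gregorian year (365.2425 days) in seconds. This is an assumption
// used only to reproduce the range figure.
const seconds_per_year: i64 = 31_556_952;

/// Hypothetical helper: converts Unix seconds to a microsecond timestamp,
/// returning error.Overflow if the result does not fit in an i64.
fn secondsToMicros(seconds: i64) !i64 {
    return std.math.mul(i64, seconds, us_per_second);
}

/// Whole years representable on either side of the epoch.
fn representableYears() i64 {
    return @divTrunc(@as(i64, std.math.maxInt(i64)), seconds_per_year * us_per_second);
}

test "microsecond timestamps" {
    try std.testing.expectEqual(@as(i64, 292_277), representableYears());
    try std.testing.expectEqual(
        @as(i64, 1_700_000_000_000_000),
        try secondsToMicros(1_700_000_000),
    );
}
```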

Timestamp Encoding

See TSM Tree for details on Gorilla delta-of-delta encoding.

Memory Optimization

String Interning

Series keys are interned once:

Reduces memory usage for high-cardinality data
Single allocation per unique series key
Pointer equality checks for fast comparison
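The pointer-equality property can be seen in a small standalone interner. This is a sketch modeled loosely on the `internSeriesKey` excerpt shown earlier, not the server's code; `Interner` and its methods are hypothetical.

```zig
const std = @import("std");

/// Minimal interner sketch (hypothetical, not Slung's implementation).
const Interner = struct {
    allocator: std.mem.Allocator,
    keys: std.StringHashMap(void),

    fn init(allocator: std.mem.Allocator) Interner {
        return .{ .allocator = allocator, .keys = std.StringHashMap(void).init(allocator) };
    }

    fn deinit(self: *Interner) void {
        var it = self.keys.keyIterator();
        while (it.next()) |key| self.allocator.free(key.*);
        self.keys.deinit();
    }

    /// Returns the single owned copy of `bytes`, allocating it on first use.
    fn intern(self: *Interner, bytes: []const u8) ![]const u8 {
        const entry = try self.keys.getOrPut(bytes);
        if (entry.found_existing) return entry.key_ptr.*;
        // On a miss the map still points at the caller's bytes; swap in an
        // owned copy, or undo the insert if the allocation fails.
        const owned = self.allocator.dupe(u8, bytes) catch |err| {
            _ = self.keys.remove(bytes);
            return err;
        };
        entry.key_ptr.* = owned;
        return owned;
    }
};

test "interned keys share one allocation" {
    var interner = Interner.init(std.testing.allocator);
    defer interner.deinit();

    const copy = try std.testing.allocator.dupe(u8, "cpu.usage,host=server-01");
    defer std.testing.allocator.free(copy);

    const first = try interner.intern("cpu.usage,host=server-01");
    const second = try interner.intern(copy);

    // Same bytes in, same pointer out: equality becomes a pointer comparison.
    try std.testing.expect(first.ptr == second.ptr);
    try std.testing.expect(second.ptr != copy.ptr);
    try std.testing.expect(interner.keys.count() == 1);
}
```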

Tag Scratch Buffer

Tag arrays are reused via a scratch buffer to avoid allocations:

var tag_scratch: std.ArrayList([]const u8) = .empty;
const message = try decodeBinaryMessage(allocator, data, &tag_scratch);

Best Practices

Tag Cardinality

Keep tag cardinality manageable. Each unique tag combination creates a new series. For example:

Good: host (100 values) × region (5 values) = up to 500 series
Bad: request_id (millions of values) = millions of series
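The worst case is the product of the number of distinct values per tag key, which is easy to estimate before adding a new tag. A small sketch of that arithmetic, with `maxSeriesCount` as a hypothetical helper:

```zig
const std = @import("std");

/// Upper bound on the number of series for one measurement: the product of
/// the distinct value counts of each tag key. Returns error.Overflow if the
/// product does not fit in a u64.
fn maxSeriesCount(values_per_tag: []const u64) !u64 {
    var total: u64 = 1;
    for (values_per_tag) |n| total = try std.math.mul(u64, total, n);
    return total;
}

test "cardinality grows multiplicatively" {
    // host (100) x region (5)
    try std.testing.expectEqual(@as(u64, 500), try maxSeriesCount(&[_]u64{ 100, 5 }));
    // Adding a tag with a million values multiplies the bound by a million.
    try std.testing.expectEqual(
        @as(u64, 500_000_000),
        try maxSeriesCount(&[_]u64{ 100, 5, 1_000_000 }),
    );
}
```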

Tag Naming

Use consistent tag naming:

Use key=value format: env=prod, region=us-west
Sort tags alphabetically for consistent series keys
Avoid whitespace in tag names
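These conventions can be checked on the client before a point is sent. The sketch below is one possible check, not a rule enforced by Slung: `isValidTag` is a hypothetical helper, and rejecting commas is an added assumption based on the comma being the separator inside series keys.

```zig
const std = @import("std");

/// Hypothetical client-side check: the tag must look like "key=value" with a
/// non-empty key and value, and contain no whitespace or commas.
fn isValidTag(tag: []const u8) bool {
    const eq = std.mem.indexOfScalar(u8, tag, '=') orelse return false;
    if (eq == 0 or eq == tag.len - 1) return false;
    return std.mem.indexOfAny(u8, tag, " \t\r\n,") == null;
}

test "tag format" {
    try std.testing.expect(isValidTag("env=prod"));
    try std.testing.expect(isValidTag("region=us-west"));
    try std.testing.expect(!isValidTag("env = prod"));
    try std.testing.expect(!isValidTag("env"));
}
```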

Value Types

Choose appropriate value types:

Use Float for measurements (temperature, CPU usage)
Use Int for counts (request count, byte count)
Use Bool for binary states (online/offline)
Use Bytes sparingly (no compression support)

Next Steps

TSM Tree Storage

Learn how data is stored and compressed on disk
