Threading Model

Filament uses a sophisticated multi-threaded architecture to maximize performance on modern CPUs. This guide explains the threading model and how to work with it effectively.

Overview

Filament’s threading model consists of:
  1. Main Thread - User code, scene setup, API calls
  2. Render Thread - Backend driver, command execution
  3. Worker Threads - Parallel rendering tasks via JobSystem

Thread Safety

From Engine.h:
An Engine instance is not thread-safe. The implementation makes no attempt to synchronize calls to an Engine instance's methods. If multi-threading is needed, synchronization must be external.
Important: All Engine API calls must come from the same thread, or be externally synchronized.
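External synchronization can be as simple as funneling every call through a mutex. The wrapper below is a minimal sketch of that idea; `Synchronized` is a hypothetical helper of this guide, not part of the Filament API:

```cpp
#include <mutex>
#include <utility>

// Illustrative helper (not a Filament API): serializes every call to the
// wrapped object behind a single mutex.
template <typename T>
class Synchronized {
public:
    explicit Synchronized(T* object) : mObject(object) {}

    // Runs func(T&) while holding the lock and returns its result.
    template <typename F>
    auto invoke(F&& func) {
        std::lock_guard<std::mutex> lock(mLock);
        return std::forward<F>(func)(*mObject);
    }

private:
    T* mObject;
    std::mutex mLock;
};
```

With this, threads would call e.g. `safeEngine.invoke([](Engine& e) { e.flush(); });` instead of touching the Engine directly. Note that locking every call serializes them; keeping all Engine use on one thread is usually simpler and faster.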

Engine Threading

When created, the Engine starts threads automatically:
Engine* engine = Engine::create();
// Creates:
// - 1 render thread (elevated priority)
// - N worker threads (platform-dependent)

Thread Priorities

Filament sets appropriate thread priorities based on the platform:
  • Render Thread - High priority for consistent frame delivery
  • Worker Threads - Balanced priority for parallel work
On ARM big.LITTLE architectures, Filament makes educated guesses about core assignment, trying to keep the render thread on a big core.

JobSystem

Filament uses utils::JobSystem for parallel workloads.

Architecture

From utils/include/utils/JobSystem.h:
class JobSystem {
public:
    class Job;
    using JobFunc = void(*)(void*, JobSystem&, Job*);
    
    // Create the job system
    explicit JobSystem(
        size_t threadCount = 0,          // 0 = auto-detect
        size_t adoptableThreadsCount = 1
    ) noexcept;
};

Job Structure

Jobs are 64-byte cache-aligned structures:
class alignas(CACHELINE_SIZE) Job {
private:
    void* storage[JOB_STORAGE_SIZE_WORDS];  // 48 bytes
    JobFunc function;                        // 8 bytes
    uint16_t parent;                         // 2 bytes
    mutable ThreadId id;                     // 1 byte
    mutable std::atomic<uint8_t> refCount;   // 1 byte
    std::atomic<uint32_t> runningJobCount;   // 4 bytes
};  // Total: 64 bytes

Creating Jobs

Multiple ways to create jobs:

1. Simple Job

JobSystem& js = engine->getJobSystem();

JobSystem::Job* job = js.createJob(nullptr, 
    [](JobSystem& js, JobSystem::Job* job) {
        // Do work
    }
);

js.run(job);

2. Method Pointer

struct Worker {
    void process(JobSystem& js, JobSystem::Job* job) {
        // Do work
    }
};

Worker worker;
JobSystem::Job* job = js.createJob<Worker, &Worker::process>(
    nullptr, &worker
);
js.run(job);

3. Utility Functions

For easier usage:
// Using jobs namespace utilities
auto* job = jobs::createJob(js, nullptr, 
    [capturedData]() {
        // Work with captured data
    }
);
js.run(job);

Job Hierarchies

Jobs can have parent-child relationships:
// Create root job
JobSystem::Job* root = js.createJob();

// Create child jobs
for (int i = 0; i < 10; i++) {
    auto* child = js.createJob(root, 
        [i](JobSystem& js, JobSystem::Job*) {
            // Process item i
        }
    );
    js.run(child);
}

// Wait for all children to complete
js.runAndWait(root);

Parallel For

Process arrays in parallel:
struct ProcessData {
    void operator()(int* data, size_t count) {
        for (size_t i = 0; i < count; i++) {
            data[i] *= 2;
        }
    }
};

int data[1000];

// parallel_for takes a parent job and a splitter that controls how the
// range is subdivided; run the returned job and wait for completion.
auto* job = jobs::parallel_for(js, nullptr, data, 1000,
        ProcessData{}, jobs::CountSplitter<256>());
js.runAndWait(job);

Work Stealing

JobSystem uses a work-stealing dequeue:
using WorkQueue = WorkStealingDequeue<uint16_t, MAX_JOB_COUNT>;
Idle threads steal work from busy threads to maintain load balance.
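The idea can be sketched with a toy deque: the owning thread pushes and pops at the bottom (LIFO, which is cache-friendly), while idle threads steal from the top (FIFO). This mutex-based version is for illustration only; Filament's `WorkStealingDequeue` is a lock-free structure with a different implementation:

```cpp
#include <deque>
#include <mutex>
#include <optional>

// Toy work-stealing deque, for illustration only (not Filament's code).
// Owner thread: push()/pop() at the bottom. Thief threads: steal() from
// the top, so they take the oldest (likely coldest) work.
template <typename T>
class ToyWorkStealingDeque {
public:
    void push(T item) {                 // owner thread only
        std::lock_guard<std::mutex> lock(mLock);
        mItems.push_back(item);
    }
    std::optional<T> pop() {            // owner thread only (LIFO)
        std::lock_guard<std::mutex> lock(mLock);
        if (mItems.empty()) return std::nullopt;
        T item = mItems.back();
        mItems.pop_back();
        return item;
    }
    std::optional<T> steal() {          // any other thread (FIFO)
        std::lock_guard<std::mutex> lock(mLock);
        if (mItems.empty()) return std::nullopt;
        T item = mItems.front();
        mItems.pop_front();
        return item;
    }
private:
    std::deque<T> mItems;
    std::mutex mLock;
};
```

The owner working LIFO keeps recently created jobs (whose data is still hot in cache) local, while thieves taking FIFO minimizes contention with the owner.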

Render Thread Commands

Command Stream

The main thread records commands into a command stream, which the render thread executes:
// Main thread
renderer->render(view);  // Records commands
renderer->endFrame();    // Submits to render thread

// Render thread executes commands asynchronously

Command Buffer Configuration

From Engine::Config:
Engine::Config config;

// Size of command buffer arena
config.commandBufferSizeMB = 3;  // Default: 3 MiB

// Minimum size per command buffer
config.minCommandBufferSizeMB = 1;  // Default: 1 MiB
If commands don’t fit, the engine may stall waiting for space. This is logged as a warning.

Command Execution

From CommandStream.h:
class CommandBase {
public:
    // Execute command and return next
    CommandBase* execute(Driver& driver) {
        intptr_t next;
        mExecute(driver, this, &next);
        return reinterpret_cast<CommandBase*>(
            reinterpret_cast<intptr_t>(this) + next
        );
    }
};

Thread Adoption

You can make external threads part of the JobSystem:
JobSystem& js = engine->getJobSystem();

// In your thread
js.adopt();

// Now this thread can run jobs
auto* job = js.createJob(nullptr, 
    [](JobSystem&, JobSystem::Job*) { /* work */ }
);
js.run(job);

// When shutting down
js.emancipate();

Pausing the Render Thread

Experimental API:
// Pause rendering
engine->setPaused(true);

// Resume rendering  
engine->setPaused(false);

// Check state
bool paused = engine->isPaused();
Caveats:
  • Buffer callbacks won’t fire while paused
  • Commands keep queuing until buffer limit
  • Exceeding buffer limit causes abort

Message Queues

Handle callbacks on the main thread:
// In your main loop
while (!quit) {
    // Process callbacks
    engine->pumpMessageQueues();
    
    // Render frame
    if (renderer->beginFrame(swapChain)) {
        renderer->render(view);
        renderer->endFrame();
    }
}
This executes pending callbacks immediately, reducing latency.

Synchronization

flush() and flushAndWait()

// Flush commands (non-blocking)
engine->flush();

// Flush and wait for completion
engine->flushAndWait();

// Flush with timeout (returns false if timeout)
bool success = engine->flushAndWait(timeout_ns);
Use case - Android surface destruction:
@Override
public void surfaceDestroyed(SurfaceHolder holder) {
    // Ensure swap chain destruction completes
    engine.destroySwapChain(swapChain);
    engine.flushAndWait();
}

Fences

For GPU synchronization:
Fence* fence = engine->createFence();

// Submit fence to GPU
renderer->render(view);
renderer->endFrame();

// Wait for GPU
Fence::FenceStatus status = fence->wait(Fence::Mode::FLUSH);
if (status == Fence::FenceStatus::CONDITION_SATISFIED) {
    // GPU work completed
}

engine->destroy(fence);

JobSystem Configuration

Configure thread count:
Engine::Config config;

// Limit worker threads (default: 0 = auto)
config.jobSystemThreadCount = 4;

Engine* engine = Engine::Builder()
    .config(&config)
    .build();
Useful for:
  • CPU-constrained environments - Reduce thread contention
  • Power management - Limit CPU usage
  • Testing - Reproduce threading issues

Single-Threaded Platforms

For platforms without threading (e.g., some embedded systems):
// Call from main loop
engine->execute();
This invokes one iteration of the render loop.

Best Practices

Do’s

  • Call Engine APIs from one thread (or use external locking)
  • Use JobSystem for parallel work in rendering
  • Call pumpMessageQueues() regularly for low latency
  • Configure jobSystemThreadCount if needed
  • Use flushAndWait() when destroying critical resources
  • Create job hierarchies for complex parallel tasks

Don’ts

  • Don’t call Engine APIs from multiple threads without locking
  • Don’t create/destroy resources from callbacks without care
  • Don’t rely on callbacks while paused
  • Don’t exceed the command buffer size
  • Don’t use blocking operations in jobs
  • Don’t forget to emancipate() adopted threads
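The last point is easy to get wrong on early returns or exceptions; an RAII guard makes the adopt/emancipate pairing automatic. `ScopedThreadAdoption` is a hypothetical helper of this guide, not part of Filament; it works with any type exposing `adopt()`/`emancipate()`, such as `utils::JobSystem`:

```cpp
// Illustrative RAII guard (not a Filament API): adopts the calling thread
// on construction and emancipates it on destruction, even if the scope is
// exited by an early return or an exception.
template <typename JS>
class ScopedThreadAdoption {
public:
    explicit ScopedThreadAdoption(JS& js) : mJobSystem(js) {
        mJobSystem.adopt();
    }
    ~ScopedThreadAdoption() {
        mJobSystem.emancipate();
    }
    // Non-copyable: exactly one emancipate() per adopt().
    ScopedThreadAdoption(const ScopedThreadAdoption&) = delete;
    ScopedThreadAdoption& operator=(const ScopedThreadAdoption&) = delete;
private:
    JS& mJobSystem;
};
```

Usage in a worker thread: `ScopedThreadAdoption guard(engine->getJobSystem());` at the top of the thread function guarantees the thread is emancipated when the function exits.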

Example: Parallel Scene Processing

#include <filament/Engine.h>
#include <utils/JobSystem.h>

using namespace filament;
using namespace utils;

class SceneProcessor {
public:
    SceneProcessor(Engine* engine) 
        : mEngine(engine)
        , mJobSystem(engine->getJobSystem()) {}
    
    void updateObjects(std::vector<Object>& objects) {
        // Create root job
        auto* root = mJobSystem.createJob();
        
        // Process objects in parallel
        const size_t batchSize = 100;
        for (size_t i = 0; i < objects.size(); i += batchSize) {
            size_t count = std::min(batchSize, 
                                   objects.size() - i);
            
            auto* job = mJobSystem.createJob(root,
                [&objects, i, count](JobSystem&, JobSystem::Job*) {
                    for (size_t j = 0; j < count; j++) {
                        objects[i + j].update();
                    }
                }
            );
            
            mJobSystem.run(job);
        }
        
        // Wait for completion
        mJobSystem.runAndWait(root);
    }
    
private:
    Engine* mEngine;
    JobSystem& mJobSystem;
};

Performance Monitoring

void monitorFrameTimes(Renderer* renderer) {
    // The JobSystem doesn't expose thread utilization directly,
    // but frame timing history can reveal CPU-side bottlenecks.
    auto frames = renderer->getFrameInfoHistory(60);
    for (const auto& frame : frames) {
        // Compare CPU frame time against GPU frame time
        auto cpuTime = frame.endFrame - frame.beginFrame;
        auto gpuTime = frame.gpuFrameDuration;

        if (cpuTime > gpuTime) {
            printf("CPU bottleneck: %lld ns\n", (long long) cpuTime);
        }
    }
}
