Threading Model

Filament uses a sophisticated multi-threaded architecture to maximize performance on modern CPUs. This guide explains the threading model and how to work with it effectively.

Overview

Filament’s threading model consists of:
  1. Main Thread - User code, scene setup, API calls
  2. Render Thread - Backend driver, command execution
  3. Worker Threads - Parallel rendering tasks via JobSystem

Thread Safety

From Engine.h:
An Engine instance is not thread-safe. The implementation makes no attempt to synchronize calls to an Engine instance's methods. If multi-threading is needed, synchronization must be external.
Important: All Engine API calls must come from the same thread, or be externally synchronized.
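External synchronization can be as simple as funneling every call through a mutex. The wrapper below is a minimal sketch of that idea; `Synchronized` is a hypothetical helper of this guide, not part of the Filament API:

```cpp
#include <mutex>
#include <utility>

// Illustrative helper (not a Filament API): serializes every call to the
// wrapped object behind a single mutex.
template <typename T>
class Synchronized {
public:
    explicit Synchronized(T* object) : mObject(object) {}

    // Runs func(T&) while holding the lock and returns its result.
    template <typename F>
    auto invoke(F&& func) {
        std::lock_guard<std::mutex> lock(mLock);
        return std::forward<F>(func)(*mObject);
    }

private:
    T* mObject;
    std::mutex mLock;
};
```

With this, threads would call e.g. `safeEngine.invoke([](Engine& e) { e.flush(); });` instead of touching the Engine directly. Note that locking every call serializes them; keeping all Engine use on one thread is usually simpler and faster.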

Engine Threading

When created, the Engine starts threads automatically:
Engine* engine = Engine::create();
// Creates:
// - 1 render thread (elevated priority)
// - N worker threads (platform-dependent)

Thread Priorities

Filament sets appropriate thread priorities based on the platform:
  • Render Thread - High priority for consistent frame delivery
  • Worker Threads - Balanced priority for parallel work
On ARM big.LITTLE architectures, Filament makes educated guesses about core assignment, trying to keep the render thread on a big core.

JobSystem

Filament uses utils::JobSystem for parallel workloads.

Architecture

From utils/include/utils/JobSystem.h:
class JobSystem {
public:
    class Job;
    using JobFunc = void(*)(void*, JobSystem&, Job*);
    
    // Create the job system
    explicit JobSystem(
        size_t threadCount = 0,          // 0 = auto-detect
        size_t adoptableThreadsCount = 1
    ) noexcept;
};

Job Structure

Jobs are 64-byte cache-aligned structures:
class alignas(CACHELINE_SIZE) Job {
private:
    void* storage[JOB_STORAGE_SIZE_WORDS];  // 48 bytes
    JobFunc function;                        // 8 bytes
    uint16_t parent;                         // 2 bytes
    mutable ThreadId id;                     // 1 byte
    mutable std::atomic<uint8_t> refCount;   // 1 byte
    std::atomic<uint32_t> runningJobCount;   // 4 bytes
};  // Total: 64 bytes

Creating Jobs

Multiple ways to create jobs:

1. Simple Job

JobSystem& js = engine->getJobSystem();

JobSystem::Job* job = js.createJob(nullptr, 
    [](JobSystem& js, JobSystem::Job* job) {
        // Do work
    }
);

js.run(job);

2. Method Pointer

struct Worker {
    void process(JobSystem& js, JobSystem::Job* job) {
        // Do work
    }
};

Worker worker;
JobSystem::Job* job = js.createJob<Worker, &Worker::process>(
    nullptr, &worker
);
js.run(job);

3. Utility Functions

For easier usage:
// Using jobs namespace utilities
auto* job = jobs::createJob(js, nullptr, 
    [capturedData]() {
        // Work with captured data
    }
);
js.run(job);

Job Hierarchies

Jobs can have parent-child relationships:
// Create root job
JobSystem::Job* root = js.createJob();

// Create child jobs
for (int i = 0; i < 10; i++) {
    auto* child = js.createJob(root, 
        [i](JobSystem& js, JobSystem::Job*) {
            // Process item i
        }
    );
    js.run(child);
}

// Wait for all children to complete
js.runAndWait(root);

Parallel For

Process arrays in parallel:
struct ProcessData {
    void operator()(int* data, size_t count) {
        for (size_t i = 0; i < count; i++) {
            data[i] *= 2;
        }
    }
};

int data[1000];

// parallel_for takes a parent job and a splitter that controls how the
// range is subdivided; run the returned job and wait for completion.
auto* job = jobs::parallel_for(js, nullptr, data, 1000,
        ProcessData{}, jobs::CountSplitter<256>());
js.runAndWait(job);

Work Stealing

JobSystem uses a work-stealing dequeue:
using WorkQueue = WorkStealingDequeue<uint16_t, MAX_JOB_COUNT>;
Idle threads steal work from busy threads to maintain load balance.
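The idea can be sketched with a toy deque: the owning thread pushes and pops at the bottom (LIFO, which is cache-friendly), while idle threads steal from the top (FIFO). This mutex-based version is for illustration only; Filament's `WorkStealingDequeue` is a lock-free structure with a different implementation:

```cpp
#include <deque>
#include <mutex>
#include <optional>

// Toy work-stealing deque, for illustration only (not Filament's code).
// Owner thread: push()/pop() at the bottom. Thief threads: steal() from
// the top, so they take the oldest (likely coldest) work.
template <typename T>
class ToyWorkStealingDeque {
public:
    void push(T item) {                 // owner thread only
        std::lock_guard<std::mutex> lock(mLock);
        mItems.push_back(item);
    }
    std::optional<T> pop() {            // owner thread only (LIFO)
        std::lock_guard<std::mutex> lock(mLock);
        if (mItems.empty()) return std::nullopt;
        T item = mItems.back();
        mItems.pop_back();
        return item;
    }
    std::optional<T> steal() {          // any other thread (FIFO)
        std::lock_guard<std::mutex> lock(mLock);
        if (mItems.empty()) return std::nullopt;
        T item = mItems.front();
        mItems.pop_front();
        return item;
    }
private:
    std::deque<T> mItems;
    std::mutex mLock;
};
```

The owner working LIFO keeps recently created jobs (whose data is still hot in cache) local, while thieves taking FIFO minimizes contention with the owner.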

Render Thread Commands

Command Stream

The main thread records commands into a command stream, which the render thread executes:
// Main thread
renderer->render(view);  // Records commands
renderer->endFrame();    // Submits to render thread

// Render thread executes commands asynchronously

Command Buffer Configuration

From Engine::Config:
Engine::Config config;

// Size of command buffer arena
config.commandBufferSizeMB = 3;  // Default: 3 MiB

// Minimum size per command buffer
config.minCommandBufferSizeMB = 1;  // Default: 1 MiB
If commands don’t fit, the engine may stall waiting for space. This is logged as a warning.

Command Execution

From CommandStream.h:
class CommandBase {
public:
    // Execute command and return next
    CommandBase* execute(Driver& driver) {
        intptr_t next;
        mExecute(driver, this, &next);
        return reinterpret_cast<CommandBase*>(
            reinterpret_cast<intptr_t>(this) + next
        );
    }
};

Thread Adoption

You can make external threads part of the JobSystem:
JobSystem& js = engine->getJobSystem();

// In your thread
js.adopt();

// Now this thread can run jobs
auto* job = js.createJob(nullptr, 
    [](JobSystem&, JobSystem::Job*) { /* work */ }
);
js.run(job);

// When shutting down
js.emancipate();

Pausing the Render Thread

Experimental API:
// Pause rendering
engine->setPaused(true);

// Resume rendering  
engine->setPaused(false);

// Check state
bool paused = engine->isPaused();
Caveats:
  • Buffer callbacks won’t fire while paused
  • Commands keep queuing until buffer limit
  • Exceeding buffer limit causes abort

Message Queues

Handle callbacks on the main thread:
// In your main loop
while (!quit) {
    // Process callbacks
    engine->pumpMessageQueues();
    
    // Render frame
    if (renderer->beginFrame(swapChain)) {
        renderer->render(view);
        renderer->endFrame();
    }
}
This executes pending callbacks immediately, reducing latency.

Synchronization

flush() and flushAndWait()

// Flush commands (non-blocking)
engine->flush();

// Flush and wait for completion
engine->flushAndWait();

// Flush with timeout (returns false if timeout)
bool success = engine->flushAndWait(timeout_ns);
Use case - Android surface destruction:
@Override
public void surfaceDestroyed(SurfaceHolder holder) {
    // Ensure swap chain destruction completes
    engine.destroySwapChain(swapChain);
    engine.flushAndWait();
}

Fences

For GPU synchronization:
Fence* fence = engine->createFence();

// Submit fence to GPU
renderer->render(view);
renderer->endFrame();

// Wait for GPU
Fence::FenceStatus status = fence->wait(Fence::Mode::FLUSH);
if (status == Fence::FenceStatus::CONDITION_SATISFIED) {
    // GPU work completed
}

engine->destroy(fence);

JobSystem Configuration

Configure thread count:
Engine::Config config;

// Limit worker threads (default: 0 = auto)
config.jobSystemThreadCount = 4;

Engine* engine = Engine::Builder()
    .config(&config)
    .build();
Useful for:
  • CPU-constrained environments - Reduce thread contention
  • Power management - Limit CPU usage
  • Testing - Reproduce threading issues

Single-Threaded Platforms

For platforms without threading (e.g., some embedded systems):
// Call from main loop
engine->execute();
This invokes one iteration of the render loop.

Best Practices

Do’s

  • Call Engine APIs from one thread (or use external locking)
  • Use JobSystem for parallel work in rendering
  • Call pumpMessageQueues() regularly for low latency
  • Configure jobSystemThreadCount if needed
  • Use flushAndWait() when destroying critical resources
  • Create job hierarchies for complex parallel tasks

Don’ts

  • Don’t call Engine APIs from multiple threads without locking
  • Don’t create/destroy resources from callbacks without care
  • Don’t rely on callbacks while paused
  • Don’t exceed the command buffer size
  • Don’t use blocking operations in jobs
  • Don’t forget to emancipate() adopted threads
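The last point is easy to get wrong on early returns or exceptions; an RAII guard makes the adopt/emancipate pairing automatic. `ScopedThreadAdoption` is a hypothetical helper of this guide, not part of Filament; it works with any type exposing `adopt()`/`emancipate()`, such as `utils::JobSystem`:

```cpp
// Illustrative RAII guard (not a Filament API): adopts the calling thread
// on construction and emancipates it on destruction, even if the scope is
// exited by an early return or an exception.
template <typename JS>
class ScopedThreadAdoption {
public:
    explicit ScopedThreadAdoption(JS& js) : mJobSystem(js) {
        mJobSystem.adopt();
    }
    ~ScopedThreadAdoption() {
        mJobSystem.emancipate();
    }
    // Non-copyable: exactly one emancipate() per adopt().
    ScopedThreadAdoption(const ScopedThreadAdoption&) = delete;
    ScopedThreadAdoption& operator=(const ScopedThreadAdoption&) = delete;
private:
    JS& mJobSystem;
};
```

Usage in a worker thread: `ScopedThreadAdoption guard(engine->getJobSystem());` at the top of the thread function guarantees the thread is emancipated when the function exits.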

Example: Parallel Scene Processing

#include <filament/Engine.h>
#include <utils/JobSystem.h>

using namespace filament;
using namespace utils;

class SceneProcessor {
public:
    SceneProcessor(Engine* engine) 
        : mEngine(engine)
        , mJobSystem(engine->getJobSystem()) {}
    
    void updateObjects(std::vector<Object>& objects) {
        // Create root job
        auto* root = mJobSystem.createJob();
        
        // Process objects in parallel
        const size_t batchSize = 100;
        for (size_t i = 0; i < objects.size(); i += batchSize) {
            size_t count = std::min(batchSize, 
                                   objects.size() - i);
            
            auto* job = mJobSystem.createJob(root,
                [&objects, i, count](JobSystem&, JobSystem::Job*) {
                    for (size_t j = 0; j < count; j++) {
                        objects[i + j].update();
                    }
                }
            );
            
            mJobSystem.run(job);
        }
        
        // Wait for completion
        mJobSystem.runAndWait(root);
    }
    
private:
    Engine* mEngine;
    JobSystem& mJobSystem;
};

Performance Monitoring

void monitorFrameTimes(Renderer* renderer) {
    // The JobSystem doesn't expose thread utilization directly,
    // but frame timing history can reveal CPU-side bottlenecks.
    auto frames = renderer->getFrameInfoHistory(60);
    for (const auto& frame : frames) {
        // Compare CPU frame time against GPU frame time
        auto cpuTime = frame.endFrame - frame.beginFrame;
        auto gpuTime = frame.gpuFrameDuration;

        if (cpuTime > gpuTime) {
            printf("CPU bottleneck: %lld ns\n", (long long) cpuTime);
        }
    }
}
