
Performance Optimization

This guide covers performance optimization techniques for Filament applications, from engine configuration to rendering best practices.

Engine Configuration

Memory Configuration

Filament’s memory footprint is configurable via Engine::Config:
Engine::Config config;

// Command buffer size (default: 3 MiB)
config.commandBufferSizeMB = 3;

// Per-render-pass arena (default: 3 MiB)
config.perRenderPassArenaSizeMB = 3;

// Per-frame commands (default: 2 MiB)
config.perFrameCommandsSizeMB = 2;

// Minimum command buffer size (default: 1 MiB)
config.minCommandBufferSizeMB = 1;

Engine* engine = Engine::Builder()
    .config(&config)
    .build();
Memory Layout:
.perRenderPassArenaSizeMB (default: 3 MiB)
+--------------------------+
|                          |
| .perFrameCommandsSizeMB  |
|    (default: 2 MiB)      |
|                          |
+--------------------------+
|  (froxel, etc...)        |
+--------------------------+

.commandBufferSizeMB (default: 3 MiB)
+--------------------------+
| .minCommandBufferSizeMB  |
+--------------------------+
| .minCommandBufferSizeMB  |
+--------------------------+
| .minCommandBufferSizeMB  |
+--------------------------+
If any of these buffers is too small, the program may abort (in debug builds) or exhibit undefined behavior. Monitor the logs for allocation warnings while tuning these values.
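The relationships drawn in the layout diagram can be checked before the values are handed to the engine. The sketch below is only an illustration: MemoryConfig, isLayoutConsistent, and totalReservedMB are hypothetical names that mirror the four fields discussed above, and the rules it enforces are read off the diagram rather than taken from Filament's source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four Engine::Config fields discussed above.
// It is not Filament's type; it only lets the check compile on its own.
struct MemoryConfig {
    uint32_t commandBufferSizeMB = 3;
    uint32_t perRenderPassArenaSizeMB = 3;
    uint32_t perFrameCommandsSizeMB = 2;
    uint32_t minCommandBufferSizeMB = 1;
};

// Returns true when the sizes agree with the layout diagram (an assumption
// derived from the diagram): the per-frame commands must fit inside the
// per-render-pass arena with room left over (froxel data, etc.), and the
// command buffer must hold three minimum-size slots.
inline bool isLayoutConsistent(const MemoryConfig& c) {
    const bool arenaHasRoom = c.perRenderPassArenaSizeMB > c.perFrameCommandsSizeMB;
    const bool threeSlotsFit = c.commandBufferSizeMB >= 3 * c.minCommandBufferSizeMB;
    return c.minCommandBufferSizeMB > 0 && arenaHasRoom && threeSlotsFit;
}

// Sum of the two top-level arenas shown in the diagram, in MiB.
inline uint32_t totalReservedMB(const MemoryConfig& c) {
    assert(isLayoutConsistent(c));
    return c.commandBufferSizeMB + c.perRenderPassArenaSizeMB;
}
```

With the default values the check passes and the two arenas add up to 6 MiB; raising perFrameCommandsSizeMB without also raising perRenderPassArenaSizeMB makes the check fail.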

JobSystem Configuration

Engine::Config config;
// Default: 0 (auto-detect)
config.jobSystemThreadCount = 4;  // Limit to 4 threads

Engine* engine = Engine::Builder()
    .config(&config)
    .build();
Capping jobSystemThreadCount can help in CPU-constrained environments, where too many worker threads contend for the same cores. The default of 0 lets the engine pick the thread count automatically.

Rendering Optimizations

Frame Pacing

Configure display information for proper frame pacing:
Renderer::DisplayInfo displayInfo;
displayInfo.refreshRate = 60.0f;  // Display refresh rate in Hz

renderer->setDisplayInfo(displayInfo);
Set frame rate options:
Renderer::FrameRateOptions options;
options.interval = 1;           // Present every refresh (60 fps at 60 Hz)
options.headRoomRatio = 0.0f;   // Additional GPU headroom
options.scaleRate = 1.0f / 8.0f;// Dynamic resolution scale rate
options.history = 15;           // History size for filtering

renderer->setFrameRateOptions(options);
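To see how refreshRate, interval, and headRoomRatio combine, it can help to compute the frame budget by hand. The following is a minimal sketch under two assumptions: that interval counts display refresh periods, and that headRoomRatio reserves that fraction of the frame time. Both function names are hypothetical and not part of Filament.

```cpp
#include <cassert>

// Target frame time in milliseconds for a display refresh rate (Hz) and a
// frame interval expressed in refresh periods.
inline double targetFrameTimeMs(double refreshRateHz, unsigned interval) {
    assert(refreshRateHz > 0.0 && interval > 0);
    return 1000.0 * interval / refreshRateHz;
}

// GPU budget left after reserving a fraction of the frame as headroom
// (assumed interpretation of headRoomRatio).
inline double gpuBudgetMs(double frameTimeMs, double headRoomRatio) {
    assert(headRoomRatio >= 0.0 && headRoomRatio < 1.0);
    return frameTimeMs * (1.0 - headRoomRatio);
}
```

For example, a 50 Hz display with interval = 2 gives a 40 ms target, and a headroom ratio of 0.5 on a 20 ms frame leaves a 10 ms GPU budget.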

Dynamic Resolution Scaling

Use dynamic resolution to maintain target frame rates:
View::DynamicResolutionOptions options;
options.enabled = true;
options.minScale = 0.5f;        // Scale down to 50%
options.maxScale = 1.0f;        // Up to full resolution
options.quality = View::QualityLevel::MEDIUM;

view->setDynamicResolutionOptions(options);
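The role of minScale and maxScale is easier to see with a toy controller. The sketch below is not Filament's algorithm; it is a simplified, hypothetical proportional controller (nextScale is an invented name) that assumes GPU time grows with pixel count, so the per-axis scale changes with the square root of the time ratio and is then clamped to the configured range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Proposes the next per-axis resolution scale from the measured GPU time.
// Assumption: GPU cost is proportional to the number of pixels, hence the
// square root when converting a time ratio into a per-axis scale.
inline float nextScale(float currentScale, float targetMs, float measuredMs,
                       float minScale, float maxScale) {
    assert(targetMs > 0.0f && measuredMs > 0.0f && minScale <= maxScale);
    const float proposed = currentScale * std::sqrt(targetMs / measuredMs);
    // Never leave the [minScale, maxScale] range set in the options.
    return std::max(minScale, std::min(maxScale, proposed));
}
```

A frame that takes four times the target halves the scale on each axis; a frame that takes sixteen times the target would ask for 0.25 but is held at a minScale of 0.5.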

Automatic Instancing

Enable automatic instancing for identical primitives:
engine->setAutomaticInstancingEnabled(true);
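Automatic instancing reduces CPU overhead by drawing renderables that use the same geometry and the same material instance, and differ only in their transforms, as one instanced draw. To query the maximum number of instances per batch: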
size_t maxInstances = engine->getMaxAutomaticInstances();

Material Optimization

Material Variants

Filament generates shader variants based on:
  • Directional lighting
  • Dynamic lighting
  • Shadow receivers
  • Skinning
  • Fog
Minimize variants by:
  1. Disabling unused features in materials
  2. Using static variants when possible
  3. Sharing material instances across renderables

Material Batching

The engine batches material instances using a shared uniform buffer:
Engine::Config config;
config.sharedUboInitialSizeInBytes = 256 * 64;  // Initial size
Larger values avoid runtime reallocations but increase memory usage.

Geometry Optimization

Vertex Buffers

Use appropriate vertex formats:
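// Interleaved layout, 16 bytes per vertex:
//   bytes 0-7    POSITION (HALF4)
//   bytes 8-11   UV0      (HALF2)
//   bytes 12-15  COLOR    (UBYTE4, normalized)
VertexBuffer* vb = VertexBuffer::Builder()
    .vertexCount(vertexCount)
    .bufferCount(1)
    // Use 16-bit types when range allows
    .attribute(VertexAttribute::POSITION, 0,
               VertexBuffer::AttributeType::HALF4, 0, 16)
    .attribute(VertexAttribute::UV0, 0,
               VertexBuffer::AttributeType::HALF2, 8, 16)
    .attribute(VertexAttribute::COLOR, 0,
               VertexBuffer::AttributeType::UBYTE4, 12, 16)
    .normalized(VertexAttribute::COLOR)
    .build(*engine);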
Attribute Type Sizes:
Type     Size      Use Case
HALF2    4 bytes   UVs, compressed normals
HALF4    8 bytes   Positions (small range)
FLOAT2   8 bytes   UVs (high precision)
FLOAT3   12 bytes  Positions, normals
FLOAT4   16 bytes  Full precision
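The sizes listed above translate directly into per-vertex memory cost. As a rough sketch, the following helper adds up the attribute sizes for a candidate layout; AttrType, sizeInBytes, and vertexStride are local, hypothetical names that mirror the table and are not Filament types.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Local stand-in for the attribute types listed in the table above.
enum class AttrType { HALF2, HALF4, FLOAT2, FLOAT3, FLOAT4 };

// Size of one attribute in bytes, as given in the table.
inline std::size_t sizeInBytes(AttrType t) {
    switch (t) {
        case AttrType::HALF2:  return 4;
        case AttrType::HALF4:  return 8;
        case AttrType::FLOAT2: return 8;
        case AttrType::FLOAT3: return 12;
        case AttrType::FLOAT4: return 16;
    }
    assert(false && "unknown attribute type");
    return 0;
}

// Bytes per vertex for a tightly packed, interleaved layout.
inline std::size_t vertexStride(std::initializer_list<AttrType> layout) {
    std::size_t stride = 0;
    for (AttrType t : layout) {
        stride += sizeInBytes(t);
    }
    return stride;
}
```

A FLOAT3 position with a FLOAT2 UV costs 20 bytes per vertex, while HALF4 plus HALF2 costs 12 bytes, a 40% reduction before any other attribute is considered.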

Index Buffers

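Use 16-bit indices when the mesh has at most 65,536 vertices:
// The index type is dictated by the largest index value, that is, by the
// number of vertices addressed, not by how many indices there are.
IndexBuffer* ib = IndexBuffer::Builder()
    .indexCount(indexCount)
    .bufferType(vertexCount <= 65536 ?
        IndexBuffer::IndexType::USHORT :
        IndexBuffer::IndexType::UINT)
    .build(*engine);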
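Whether 16-bit indices are safe depends on the largest index value stored in the buffer. A minimal sketch of that check, plus the narrowing step, is shown below; fitsIn16Bits and narrowIndices are hypothetical helper names, and the code assumes the source indices are available as 32-bit values on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// True when every index value can be stored in an unsigned 16-bit integer.
inline bool fitsIn16Bits(const std::vector<uint32_t>& indices) {
    return indices.empty() ||
           *std::max_element(indices.begin(), indices.end()) <= 0xFFFFu;
}

// Converts 32-bit indices to 16-bit ones, halving the index buffer size.
inline std::vector<uint16_t> narrowIndices(const std::vector<uint32_t>& indices) {
    assert(fitsIn16Bits(indices));
    std::vector<uint16_t> out;
    out.reserve(indices.size());
    for (uint32_t i : indices) {
        out.push_back(static_cast<uint16_t>(i));
    }
    return out;
}
```

The narrowed vector can then be uploaded to an index buffer created with the USHORT index type.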

Culling

Provide accurate bounding boxes for frustum culling:
// A Box is { center, halfExtent }, not { min, max }.
RenderableManager::Builder(1)
    .boundingBox({{ 0, 0, 0 }, { 10, 10, 10 }})
    .geometry(0, primitiveType, vb, ib)
    .build(*engine, entity);
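A tight box can be derived from the vertex positions themselves. The sketch below computes the center and half-extent form that Filament's Box uses; Vec3, CenterHalfExtent, and computeBounds are hypothetical stand-ins, and the positions are assumed to be available on the CPU as floats.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Axis-aligned bounds expressed as a center and a half-extent.
struct CenterHalfExtent {
    Vec3 center;
    Vec3 halfExtent;
};

// Scans the positions once to find the min/max corners, then converts them
// to center and half-extent.
inline CenterHalfExtent computeBounds(const std::vector<Vec3>& positions) {
    assert(!positions.empty());
    Vec3 lo = positions[0];
    Vec3 hi = positions[0];
    for (const Vec3& p : positions) {
        lo = { std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z) };
        hi = { std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z) };
    }
    return {
        { (lo.x + hi.x) * 0.5f, (lo.y + hi.y) * 0.5f, (lo.z + hi.z) * 0.5f },
        { (hi.x - lo.x) * 0.5f, (hi.y - lo.y) * 0.5f, (hi.z - lo.z) * 0.5f }
    };
}
```

The two members map onto the two brace groups passed to boundingBox(): the first is the center, the second the half-extent.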

Texture Optimization

Texture Formats

Prefer compressed formats where the target GPU supports them:
// For color textures
Texture::InternalFormat::RGB8;      // Uncompressed: 3 bytes/pixel
Texture::InternalFormat::ETC2_RGB8; // Compressed: ~0.5 bytes/pixel

// For normal maps
Texture::InternalFormat::RG8;       // 2-channel is sufficient
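The bytes-per-pixel figures above can be turned into a quick size estimate for a single mip level. This is a sketch with hypothetical helper names; it relies on ETC2 RGB8 storing each 4x4 pixel block in 8 bytes, and it ignores any padding a driver may add to uncompressed RGB data.

```cpp
#include <cassert>
#include <cstddef>

// Uncompressed RGB8: 3 bytes per pixel.
inline std::size_t rgb8Bytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    return width * height * 3;
}

// ETC2 RGB8: 8 bytes per 4x4 block, with partial blocks rounded up.
inline std::size_t etc2Rgb8Bytes(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    const std::size_t blocksX = (width + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * 8;
}
```

For a 1024x1024 texture this gives 3 MiB uncompressed against 512 KiB compressed, a sixfold saving for the base level.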

Mipmaps

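Allocate a full mip chain for every texture that can be minified, and fill each level, either by uploading it or by generating it at runtime (for example with Texture::generateMipmaps()):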
Texture* texture = Texture::Builder()
    .width(width)
    .height(height)
    .levels(levels)  // = floor(log2(max(width, height))) + 1
    .format(Texture::InternalFormat::RGBA8)
    .build(*engine);
Use the mipgen tool to generate mipmaps offline:
mipgen --compression=etc2_rgba input.png output.ktx
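The level count given in the Texture::Builder comment, floor(log2(max(width, height))) + 1, can be computed with integer arithmetic alone. The helper below is a small sketch; mipLevelCount is a hypothetical name, not a Filament function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of mip levels in a full chain: halve the largest dimension until it
// reaches 1, counting the base level as well.
inline uint32_t mipLevelCount(uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);
    uint32_t largest = std::max(width, height);
    uint32_t levels = 1;
    while (largest > 1) {
        largest >>= 1;
        ++levels;
    }
    return levels;
}
```

A 1024x512 texture needs 11 levels, and the result can be passed straight to levels() on the builder.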

Texture Streaming

For large textures, use asynchronous uploads:
Texture::PixelBufferDescriptor buffer(data, size,
    Texture::Format::RGBA, Texture::Type::UBYTE,
    [](void* buffer, size_t, void* user) {
        // Called once the engine no longer needs the buffer
        free(buffer);
    });

texture->setImage(*engine, level, std::move(buffer));

Scene Optimization

Reduce Draw Calls

  1. Batch static geometry into fewer renderables
  2. Use instancing for repeated objects
  3. Minimize material switches by sorting renderables
  4. Use texture arrays instead of texture atlases

Light Count

Limit the number of visible lights. The clustered forward renderer supports many lights, but performance degrades when too many of them are visible at once. The per-renderable light limit depends on the feature level and the backend.

Shadow Maps

Optimize shadow map configuration:
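View::ShadowType shadowType = View::ShadowType::PCF;  // Fastest

view->setShadowType(shadowType);

// Reduce shadow map resolution if needed. The size is a per-light setting;
// `light` is assumed to be the Entity of a shadow-casting light.
auto& lm = engine->getLightManager();
LightManager::ShadowOptions shadowOptions;
shadowOptions.mapSize = 512;
lm.setShadowOptions(lm.getInstance(light), shadowOptions);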

Backend-Specific Optimizations

Vulkan

// Parallel shader compilation stays enabled while this flag is false
// (the default); set it to true only to force it off.
Engine::Config config;
config.disableParallelShaderCompile = false;

Metal

Engine::Config config;
// Upload buffer for vertex/index data
config.metalUploadBufferSizeBytes = 512 * 1024;

// Prefer precompiled Metal libraries
config.preferredShaderLanguage = 
    Engine::Config::ShaderLanguage::METAL_LIBRARY;

OpenGL

On platforms with asymmetric cores, the engine automatically assigns the render thread to the big cores.

Profiling and Debugging

Frame Timing

Get frame timing information:
auto frameInfo = renderer->getFrameInfoHistory(1);
if (!frameInfo.empty()) {
    const auto& info = frameInfo[0];
    printf("Frame %u: GPU %.2f ms\n", 
        info.frameId, 
        info.gpuFrameDuration / 1e6);
}

GPU Queries

Frame info includes:
struct FrameInfo {
    uint32_t frameId;
    duration_ns gpuFrameDuration;       // GPU time
    duration_ns denoisedGpuFrameDuration; // Filtered GPU time
    time_point_ns beginFrame;
    time_point_ns endFrame;
    time_point_ns vsync;
    time_point_ns displayPresentTime;
    // ...
};

Resource Tracking

Monitor resource counts:
size_t textureCount = engine->getTextureCount();
size_t materialCount = engine->getMaterialCount();
size_t vbCount = engine->getVertexBufferCount();

// Leaked resources are logged when engine is destroyed

Best Practices

Do’s

  • Batch static geometry to reduce draw calls
  • Use instancing for repeated objects
  • Provide tight bounding boxes for culling
  • Use compressed textures with mipmaps
  • Enable automatic instancing when appropriate
  • Configure memory based on your scene complexity
  • Profile regularly using frame timing

Don’ts

  • Don’t create/destroy resources every frame
  • Don’t use full precision when half precision suffices
  • Don’t skip mipmaps for textures
  • Don’t use too many material variants
  • Don’t ignore frame pacing warnings
  • Don’t exceed command buffer size

Measuring Performance

// Basic performance monitoring
void monitorPerformance(Renderer* renderer) {
    auto history = renderer->getFrameInfoHistory(60);
    
    double avgGpuTime = 0.0;
    for (const auto& frame : history) {
        avgGpuTime += frame.gpuFrameDuration;
    }
    avgGpuTime /= history.size();
    
    printf("Avg GPU: %.2f ms (%.1f fps)\n",
        avgGpuTime / 1e6,
        1e9 / avgGpuTime);
}
