Performance Optimization

Overview

tif1 is built with performance as the top priority. Every layer of the library is optimized for speed - from network fetching to data parsing to DataFrame construction. This page covers the key optimization strategies and how to leverage them.

Performance First: This library exists to make data access fast, and performance considerations drive every design decision.

Lazy Loading

All session data is loaded lazily - data is only fetched when you actually access it.

How It Works

import tif1

# Session object created instantly - no network requests yet
session = tif1.get_session(2024, "Bahrain", "Race")

# Network request happens here when laps are accessed
laps = session.laps  # Fetches lap data from CDN

# Telemetry fetched on-demand per driver/lap
tel = session.laps.pick_driver("VER").pick_lap(1).get_telemetry()

Benefits

Instant initialization: Create sessions without waiting for data
Selective loading: Only fetch what you need
Reduced memory: Don’t load unused data
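The lazy-loading pattern described above can be sketched with the standard library. This is a minimal illustration, not tif1's implementation: `LazySession`, `_fetch_laps`, and `fetch_count` are hypothetical names, and the "network request" is a stub that returns canned data.

```python
from functools import cached_property


class LazySession:
    """Hypothetical stand-in for a session object; not tif1's real class."""

    def __init__(self, year, event, session_type):
        # Construction only stores identifiers; no data is fetched here.
        self.year = year
        self.event = event
        self.session_type = session_type
        self.fetch_count = 0

    def _fetch_laps(self):
        # Placeholder for a network request; returns canned data.
        self.fetch_count += 1
        return [{"Driver": "VER", "LapNumber": 1}]

    @cached_property
    def laps(self):
        # Runs on first access only; the result is then stored on the instance.
        return self._fetch_laps()


session = LazySession(2024, "Bahrain", "Race")
count_before = session.fetch_count  # nothing fetched yet
session.laps
session.laps
count_after = session.fetch_count   # fetched once, then reused
```

Creating the object costs nothing, and repeated attribute access does not repeat the fetch.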

Caching System

tif1 uses a high-performance SQLite-backed cache with an in-memory LRU layer.

Cache Architecture

┌─────────────────────────────────────┐
│      In-Memory LRU Cache            │
│  • Lock-free reads                  │
│  • OrderedDict (1024 items)         │
│  • <1ms access time                 │
└─────────────────────────────────────┘
              ↓ (on miss)
┌─────────────────────────────────────┐
│      SQLite Cache (WAL mode)        │
│  • Persistent storage               │
│  • Batched writes (every 25 ops)    │
│  • 64MB cache_size                  │
└─────────────────────────────────────┘
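To make the two layers concrete, here is a minimal sketch of an `OrderedDict` LRU in front of a SQLite table with batched commits. The class name, table schema, and method names are assumptions made for illustration; tif1's internal cache may be organized differently.

```python
import json
import sqlite3
from collections import OrderedDict


class TwoLayerCache:
    """Illustrative only; names and schema are assumptions, not tif1 internals."""

    def __init__(self, path=":memory:", max_items=1024, commit_interval=25):
        self.memory = OrderedDict()
        self.max_items = max_items
        self.commit_interval = commit_interval
        self.pending_writes = 0
        self.db = sqlite3.connect(path)
        # WAL applies to file-backed databases; it is a no-op for ":memory:".
        self.db.execute("PRAGMA journal_mode=WAL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def _remember(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.max_items:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        if key in self.memory:  # layer 1: memory hit
            self.memory.move_to_end(key)
            return self.memory[key]
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:  # miss in both layers
            return None
        value = json.loads(row[0])  # layer 2: SQLite hit, promote to memory
        self._remember(key, value)
        return value

    def set(self, key, value):
        self._remember(key, value)
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.pending_writes += 1
        if self.pending_writes >= self.commit_interval:  # batched commit
            self.db.commit()
            self.pending_writes = 0


cache = TwoLayerCache(max_items=2)
cache.set("a", {"laps": 57})
cache.set("b", {"laps": 50})
cache.set("c", {"laps": 58})  # "a" is evicted from memory but stays in SQLite
value = cache.get("a")        # falls through to SQLite and is promoted again
```

Batching commits trades a small durability window for fewer disk syncs, which is the usual motivation for a commit interval.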

Cache Configuration

The cache system is highly tunable via configuration:

from tif1 import get_config

config = get_config()

# In-memory cache size (default: 1024 items)
config.set("memory_cache_max_items", 2048)

# Telemetry cache size (default: 2048 items)  
config.set("memory_telemetry_cache_max_items", 4096)

# SQLite commit interval (default: 25)
config.set("cache_commit_interval", 50)

# SQLite timeout (default: 30.0s)
config.set("sqlite_timeout", 60.0)

Cache Location

The cache is stored in platform-specific directories:

Linux: ~/.cache/tif1/ (or ~/.tif1/ if ~/.cache doesn’t exist)
macOS: ~/Library/Caches/tif1/
Windows: %LOCALAPPDATA%/Temp/tif1/

You can override this with an environment variable:

export TIF1_CACHE_DIR="/custom/cache/path"
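The lookup order above can be sketched as a small resolver. This is an assumption about how such a lookup might be written, not tif1's actual code: `resolve_cache_dir` is a hypothetical helper, and the Windows fallback when `LOCALAPPDATA` is unset is a guess.

```python
import os
import sys
from pathlib import Path


def resolve_cache_dir(env=None, platform=None, home=None):
    """Hypothetical resolver mirroring the documented cache locations."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    override = env.get("TIF1_CACHE_DIR")
    if override:  # the environment variable always wins
        return Path(override)
    if platform == "darwin":
        return home / "Library" / "Caches" / "tif1"
    if platform.startswith("win"):
        local = env.get("LOCALAPPDATA")
        # Fallback path is an assumption for when LOCALAPPDATA is unset.
        base = Path(local) if local else home / "AppData" / "Local"
        return base / "Temp" / "tif1"
    # Linux and other platforms: prefer ~/.cache, else fall back to ~/.tif1
    xdg = home / ".cache"
    return xdg / "tif1" if xdg.exists() else home / ".tif1"


custom = resolve_cache_dir(env={"TIF1_CACHE_DIR": "/custom/cache/path"})
mac = resolve_cache_dir(env={}, platform="darwin", home="/Users/example")
```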

Cache Performance

The dual-layer cache provides dramatic speedups:

Memory hit: <1ms (lock-free read from OrderedDict)
SQLite hit: 5-10ms (database query + JSON deserialize)
Network fetch: 200-500ms (CDN request + validation)

Result: Based on these figures, a SQLite hit is roughly 20-100x faster than a network fetch, and a memory hit is faster still.
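The arithmetic behind these numbers is easy to check. The sketch below uses the ranges listed above; the single representative latencies (upper bound or midpoints) and the hit rates in the example are assumptions chosen for illustration, not measurements.

```python
# Illustrative latencies in milliseconds, taken from the ranges above.
MEMORY_MS = 1.0     # upper bound of "<1ms"
SQLITE_MS = 7.5     # midpoint of 5-10ms
NETWORK_MS = 350.0  # midpoint of 200-500ms


def expected_latency_ms(memory_hit_rate, sqlite_hit_rate):
    """Average latency per request for a given mix of hits and misses."""
    miss_rate = 1.0 - memory_hit_rate - sqlite_hit_rate
    return (
        memory_hit_rate * MEMORY_MS
        + sqlite_hit_rate * SQLITE_MS
        + miss_rate * NETWORK_MS
    )


# Speedup of a SQLite hit over a network fetch, worst case and best case.
sqlite_speedup_low = 200 / 10   # fastest network fetch vs slowest SQLite hit
sqlite_speedup_high = 500 / 5   # slowest network fetch vs fastest SQLite hit

# Example mix: 80% memory hits, 15% SQLite hits, 5% network fetches.
mixed = expected_latency_ms(0.80, 0.15)
```

Even a small miss rate dominates the average, which is why warming the cache matters more than tuning the hit path.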

Managing Cache

from tif1.cache import get_cache

cache = get_cache()

# Check if session data is cached
has_data = cache.has_session_data(2024, "Bahrain", "Race")

# Clear all cache
cache.clear()

# Close cache connection
cache.close()

Async Fetching

tif1 supports parallel data fetching with asyncio, which can load multiple resources about 4-5x faster than fetching them one at a time.

Parallel Requests

import asyncio
import tif1
from tif1.async_fetch import fetch_multiple_async

# Fetch multiple files for one session in parallel
async def load_multiple_sessions():
    requests = [
        (2024, "Bahrain", "Race", "laptimes.json"),
        (2024, "Bahrain", "Race", "drivers.json"),
        (2024, "Bahrain", "Race", "weather.json"),
    ]
    
    results = await fetch_multiple_async(
        requests,
        max_concurrent_requests=10,  # Control parallelism
    )
    return results

# Run async code
results = asyncio.run(load_multiple_sessions())

Concurrency Control

Control how many parallel requests to make:

from tif1 import get_config

config = get_config()

# Max concurrent requests (default: 20)
config.set("max_concurrent_requests", 50)

# Max worker threads (default: 20)
config.set("max_workers", 50)

# Telemetry prefetch concurrency (default: 32)
config.set("telemetry_prefetch_max_concurrent_requests", 64)

Rate Limiting

Use semaphores to prevent overwhelming the CDN:

import asyncio
from tif1.async_fetch import fetch_with_rate_limit, fetch_json_async

async def fetch_limited():
    # Create semaphore for max 5 concurrent requests
    semaphore = asyncio.Semaphore(5)
    
    result = await fetch_with_rate_limit(
        fetch_json_async,
        2024, "Bahrain", "Race", "drivers.json",
        semaphore=semaphore
    )
    return result
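To see what the semaphore does when many requests share it, here is a self-contained sketch that needs no network. `fake_fetch` is a hypothetical stand-in that sleeps instead of calling the CDN, and the code records the highest number of requests in flight at once.

```python
import asyncio


async def fetch_all(names, limit):
    """Run one simulated fetch per name, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_fetch(name):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` fetches are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a network round trip
            in_flight -= 1
            return f"{name}: ok"

    # gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(fake_fetch(n) for n in names))
    return results, peak


names = [f"file_{i}.json" for i in range(20)]
results, peak = asyncio.run(fetch_all(names, limit=5))
```

All 20 tasks are scheduled immediately, but the semaphore keeps the number of simultaneous fetches at or below the limit.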

Backend Selection

tif1 supports both pandas and polars backends, with polars offering 2x faster performance for large datasets.

Pandas (Default)

import tif1

# Uses pandas by default
session = tif1.get_session(2024, "Bahrain", "Race")
print(type(session.laps))  # pandas.DataFrame

Polars

Switch to polars for better performance:

import os

import tif1
from tif1 import get_config

# Option 1: set the backend through the config object
config = get_config()
config.set("lib", "polars")

# Option 2: set it via environment variable instead
os.environ["TIF1_LIB"] = "polars"

session = tif1.get_session(2024, "Bahrain", "Race")
print(type(session.laps))  # polars.DataFrame

Performance Comparison

Operation	pandas	polars	Speedup
Load laps	150ms	75ms	2.0x
Filter laps	20ms	8ms	2.5x
Aggregations	50ms	20ms	2.5x
Memory usage	100MB	60MB	1.67x

See Polars Backend for detailed comparison.

Memory Optimization

Categorical Types

tif1 automatically converts repeated strings to categorical types, saving about 50% of the memory those columns would otherwise use:

# Driver codes: "VER", "HAM", "LEC" etc. stored once
# Each lap references category index instead of full string
laps = session.laps
print(laps["Driver"].dtype)  # category (pandas)
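The idea behind categorical storage is dictionary encoding: keep each distinct string once and store a small integer per row. The sketch below shows the principle in plain Python; it illustrates the technique only and is not how pandas or tif1 implement it internally.

```python
from array import array

# One driver code per lap; the same few strings repeat many times.
drivers = ["VER", "HAM", "LEC"] * 1000

# Store each distinct value once...
categories = sorted(set(drivers))
index = {name: i for i, name in enumerate(categories)}

# ...and keep one small integer code per row ("B" = unsigned byte,
# which is enough for up to 256 distinct categories).
codes = array("B", (index[name] for name in drivers))

# Decoding maps each code back to its category.
decoded = [categories[code] for code in codes]
```

With 3,000 rows and three distinct values, the per-row cost drops to a single byte plus one shared copy of each string.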

Nullable Types

Proper null handling without object dtype overhead:

# Uses pandas nullable types: Int64, Float64, boolean
# Not generic object dtype
print(laps["LapNumber"].dtype)  # Int64 (not object)

Ultra Cold Start Mode

Optimize the first request by skipping retries:

from tif1 import get_config

config = get_config()
config.set("ultra_cold_start", True)  # default
config.set("ultra_cold_skip_retries", True)  # default

# First request uses zero retries for fastest possible response
# Subsequent requests use normal retry logic
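One way to picture this behavior is a retry policy that returns zero retries for the first request and a normal count afterwards. Everything in this sketch is hypothetical: `RetryPolicy`, `fetch_with_policy`, and the default of three retries are assumptions for illustration, not tif1's internals.

```python
class RetryPolicy:
    """Hypothetical policy object; tif1's real retry logic may differ."""

    def __init__(self, normal_retries=3, skip_retries_on_first=True):
        self.normal_retries = normal_retries  # assumed value, for illustration
        self.skip_retries_on_first = skip_retries_on_first
        self.first_request_done = False

    def retries_for_next_request(self):
        is_first = not self.first_request_done
        self.first_request_done = True
        if is_first and self.skip_retries_on_first:
            return 0  # cold start: fail fast instead of waiting on retries
        return self.normal_retries


def fetch_with_policy(fetch, policy):
    """Call `fetch` once, plus however many retries the policy allows."""
    attempts = policy.retries_for_next_request() + 1
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as exc:
            last_error = exc
    raise last_error


policy = RetryPolicy()
first = policy.retries_for_next_request()  # retries allowed for request 1
later = policy.retries_for_next_request()  # retries allowed afterwards
```

The trade-off is that a transient failure on the very first request surfaces immediately instead of being retried.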

Network Optimization

HTTP/2 Connection Pooling

tif1 uses niquests for HTTP/2 support:

Connection pooling: Reuse TCP connections
Header compression: HPACK compression
Multiplexing: Multiple requests per connection

from tif1 import get_config

config = get_config()

# HTTP/2 multiplexing (default: True)
config.set("http_multiplexed", True)

# Connection keepalive (default: 120s)
config.set("keepalive_timeout", 180)

# Max requests per connection (default: 1000)
config.set("keepalive_max_requests", 2000)

CDN Optimization

The jsDelivr CDN provides a global edge network with:

Compression: Gzip/Brotli support
Edge caching: Serve from nearest location
High availability: 99.9% uptime

# Enable CDN minification (experimental)
config.set("cdn_use_minification", True)  # 20-40% smaller files

Connection Statistics

Monitor connection reuse:

from tif1.http_session import get_connection_stats

stats = get_connection_stats()
print(f"Total requests: {stats['total']}")
print(f"Reused connections: {stats['reused']}")
print(f"Reuse rate: {stats['reuse_rate']:.1%}")

Validation Trade-offs

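Validation adds safety but costs performance. All validation options are disabled by default; the settings below keep them off for maximum speed:

from tif1 import get_config

config = get_config()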

# Disable all validation (fastest)
config.set("validate_data", False)  # default: False
config.set("validate_lap_times", False)  # default: False  
config.set("validate_telemetry", False)  # default: False

Performance impact:

Validation disabled: ~5-10% faster data loading
Validation enabled: Catches data corruption early

See Validation for details on validation options.

Prefetching Strategies

Driver Laps Prefetch

Automatically prefetch all laps when getting a driver:

from tif1 import get_config

config = get_config()
config.set("prefetch_driver_laps_on_get_driver", True)  # default

# When you access a driver, all their laps are fetched
driver = session.laps.pick_driver("VER")
# All laps already loaded - no additional requests

Telemetry Prefetch

Prefetch all telemetry data in parallel:

from tif1 import get_config

config = get_config()

# Prefetch after loading laps (background)
config.set("prefetch_all_telemetry_after_laps_load", False)  # default

# Prefetch on first telemetry request
config.set("prefetch_all_telemetry_on_first_lap_request", False)  # default

# Set to True for "download everything" mode
config.set("prefetch_all_telemetry_after_laps_load", True)
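Parallel prefetching amounts to finding which items are not yet cached and fetching them concurrently. The sketch below shows that idea with a thread pool and a plain dict as the cache; `prefetch` and `fake_telemetry` are hypothetical helpers, and tif1's own prefetch may use asyncio rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch(keys, fetch_one, cache, max_workers=8):
    """Fetch every key missing from `cache` in parallel; return how many were fetched."""
    missing = [key for key in keys if key not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `missing`.
        for key, value in zip(missing, pool.map(fetch_one, missing)):
            cache[key] = value
    return len(missing)


def fake_telemetry(key):
    # Stand-in for a network request keyed by (driver, lap number).
    driver, lap = key
    return f"telemetry for {driver} lap {lap}"


cache = {("VER", 1): "already cached"}
keys = [("VER", 1), ("VER", 2), ("HAM", 1)]
fetched = prefetch(keys, fake_telemetry, cache)
```

Entries already in the cache are skipped, so calling the function again with the same keys does no further work.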

Benchmarking

Measure Your Code

import time
import tif1

# Cold start (no cache); perf_counter is a monotonic, high-resolution clock
start = time.perf_counter()
session = tif1.get_session(2024, "Bahrain", "Race")
laps = session.laps
print(f"Cold: {time.perf_counter() - start:.2f}s")

# Warm start (cached)
start = time.perf_counter()
session = tif1.get_session(2024, "Bahrain", "Race")
laps = session.laps
print(f"Warm: {time.perf_counter() - start:.2f}s")
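If you time several steps, a small context manager keeps the measurement code out of the way. This is a generic helper written for illustration, not part of tif1; `timed` and `timings` are hypothetical names, and `time.sleep` stands in for the work being measured.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed runs are still measured.
        results[label] = time.perf_counter() - start


timings = {}
with timed("sleep", timings):
    time.sleep(0.01)  # replace with the code you want to measure

print(f"sleep: {timings['sleep'] * 1000:.1f} ms")
```

Collecting results in a dict makes it easy to compare cold and warm runs side by side.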

Enable Debug Logging

import logging

import tif1

logging.basicConfig(level=logging.DEBUG)

# See cache hits, network requests, timing info
session = tif1.get_session(2024, "Bahrain", "Race")

Best Practices

Use Caching

Leave caching enabled (the default) so repeated loads are served from the local cache instead of the network.

Batch Operations

Use async APIs to fetch multiple resources in parallel.

Choose Backend

Use polars for 2x speedup on large datasets (>10k laps).

Monitor Stats

Check connection reuse stats to verify optimization.

Performance Checklist

✅ Cache enabled: Default, provides 10-100x speedup
✅ HTTP/2 pooling: Automatic with niquests
✅ Lazy loading: Only fetch what you need
✅ Polars backend: 2x faster for large datasets
✅ Categorical types: 50% memory reduction
✅ Ultra cold start: Fastest first request
✅ Async fetching: 4-5x speedup for parallel loads

Next Steps

Polars Backend

Learn about the high-performance polars backend

Circuit Breaker

Understand retry logic and failure handling

Validation

Configure data validation trade-offs
