Overview

Tif1's architecture is designed for performance and reliability. Data flows through five layers (user, lazy loading, cache, network, and processing), and caching, validation, and lazy loading keep access fast while maintaining data integrity.

Architecture Overview

Data Flow Layers

1. User Layer

The entry point where users request data:
import tif1

# Step 1: Create session (instant - no data loaded)
session = tif1.get_session(2024, "Monaco", "Race")

# Step 2: Access property (triggers data flow)
laps = session.laps  # Data flow starts here

2. Lazy Loading Layer

Data is only loaded when accessed:
# Creating a session doesn't load any data
session = tif1.get_session(2024, "Monaco", "Race")  # Instant

# Data is loaded on first access
laps = session.laps  # Triggers: check cache → fetch from CDN → process
weather = session.weather  # Separate data flow for weather
tel = laps.iloc[0].telemetry  # Separate data flow for telemetry
From core.py:3491-3547 - the laps property:
@property
def laps(self) -> DataFrame:
    """Get all laps data for the session (auto-async for 4-5x faster loading)."""
    if self._laps is None:
        # Check in-memory cache
        cache_key = f"{self.year}_{self.gp}_{self.session}_laps"
        lap_cache = _get_backend_lap_cache(self.lib) if self.enable_cache else None
        if lap_cache is not None:
            cached_laps = lap_cache.get(cache_key)
            if cached_laps is not None:
                logger.info(f"Lap cache hit ({self.lib}): {cache_key}")
                self._laps = cached_laps
                return self._laps

        # Cache miss - load async
        logger.info(f"Loading laps async ({self.lib}): {cache_key}")
        laps_df = asyncio.run(self.laps_async())
        self._laps = Laps(laps_df)
        self._laps.session = self
        
        # Store in cache
        if lap_cache is not None:
            lap_cache.set(cache_key, self._laps)

    return self._laps
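
The lazy-loading pattern above can be reduced to a few lines. The following is a minimal sketch, not tif1's code: `LazySession`, `loader`, and `load_count` are hypothetical names used only to show that construction is cheap and that the load runs once, on first access.

```python
from typing import Callable, Optional


class LazySession:
    """Hypothetical stand-in for a session whose data loads on first access."""

    def __init__(self, loader: Callable[[], list]):
        self._loader = loader               # the expensive load, not yet called
        self._laps: Optional[list] = None   # nothing loaded at construction
        self.load_count = 0                 # for demonstration only

    @property
    def laps(self) -> list:
        if self._laps is None:              # first access: run the load
            self._laps = self._loader()
            self.load_count += 1
        return self._laps                   # later accesses reuse the result


session = LazySession(loader=lambda: [{"Driver": "VER", "LapNumber": 1}])
loads_before_access = session.load_count    # 0: constructing loaded nothing
first = session.laps                        # triggers the load
second = session.laps                       # served from the stored attribute
```

Both accesses return the same object, and the loader runs exactly once.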

3. Cache Layer

Two-level caching system:

Level 1: In-Memory Cache (LRU)

Fast, process-local cache using an LRU (Least Recently Used) strategy:
# From core.py:935-962
class LRUCache:
    """Thread-safe LRU cache with size limit."""
    def __init__(self, maxsize: int = MAX_CACHE_SIZE):
        self.cache = OrderedDict()
        self.maxsize = maxsize
        self.lock = threading.Lock()
Performance:
  • Speed: Instant (memory access)
  • Scope: Current process only
  • Capacity: Limited by MAX_CACHE_SIZE (default: 1000 items)
  • Persistence: Lost when process exits
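
The excerpt shows only the constructor. As an illustration of how an LRU cache built on `OrderedDict` typically reads and evicts entries, here is a hedged sketch. `SimpleLRUCache` and its `get`/`set` bodies are assumptions for this example and may differ from tif1's actual `LRUCache`.

```python
import threading
from collections import OrderedDict


class SimpleLRUCache:
    """Illustrative thread-safe LRU cache (not tif1's actual class)."""

    def __init__(self, maxsize: int = 1000):
        self.cache = OrderedDict()
        self.maxsize = maxsize
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.cache:
                return None
            self.cache.move_to_end(key)          # mark as most recently used
            return self.cache[key]

    def set(self, key, value):
        with self.lock:
            self.cache[key] = value
            self.cache.move_to_end(key)
            if len(self.cache) > self.maxsize:
                self.cache.popitem(last=False)   # evict least recently used


cache = SimpleLRUCache(maxsize=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.set("c", 3)    # over capacity: "b" is evicted
```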

Level 2: SQLite Cache

Persistent cache stored on disk:
-- Cache structure (SQLite schema)

-- Session-level data cache
CREATE TABLE cache (
    key TEXT PRIMARY KEY,          -- e.g., "2024/Monaco/Race/drivers.json"
    data TEXT                      -- JSON string
);

-- Telemetry-specific cache
CREATE TABLE telemetry_cache (
    year INTEGER,
    gp TEXT,
    session TEXT,
    driver TEXT,
    lap INTEGER,
    data TEXT,                     -- JSON string
    PRIMARY KEY (year, gp, session, driver, lap)
);
Performance:
  • Speed: Fast (disk I/O, ~1-10ms)
  • Scope: Persistent across sessions
  • Capacity: Limited by disk space
  • Persistence: Survives process restarts
Latency by data source:
  • In-Memory: Instant (< 1ms)
  • SQLite: Fast (~1-10ms)
  • CDN: Slow (~200-1000ms)
Reading from either cache is roughly 20-1000x faster than fetching from the CDN.
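
To make the session-level table concrete, the sketch below reads and writes JSON payloads with Python's standard `sqlite3` module. It assumes the `cache` table layout shown above. The helper names `cache_set` and `cache_get` are hypothetical, and an in-memory database is used so the example leaves no file behind.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")   # a persistent cache would use a file path
conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, data TEXT)")


def cache_set(key: str, payload: dict) -> None:
    """Store a payload as a JSON string, replacing any existing row."""
    with conn:   # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, data) VALUES (?, ?)",
            (key, json.dumps(payload)),
        )


def cache_get(key: str):
    """Return the decoded payload, or None on a cache miss."""
    row = conn.execute("SELECT data FROM cache WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


cache_set("2024/Monaco/Race/drivers.json", {"drivers": ["VER", "LEC"]})
hit = cache_get("2024/Monaco/Race/drivers.json")
miss = cache_get("2024/Monaco/Race/unknown.json")
```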

4. Network Layer

CDN Data Source

Tif1 fetches data from the TracingInsights GitHub repositories, served through the jsDelivr CDN:
# CDN URL structure
https://cdn.jsdelivr.net/gh/tracinginsights/{year}@main/
    {gp}/{session}/{path}

# Example URLs:
# Drivers data:
https://cdn.jsdelivr.net/gh/tracinginsights/2024@main/
    Monaco_Grand_Prix/Race/drivers.json

# Lap times for VER:
https://cdn.jsdelivr.net/gh/tracinginsights/2024@main/
    Monaco_Grand_Prix/Race/VER/laptimes.json

# Telemetry for VER lap 45:
https://cdn.jsdelivr.net/gh/tracinginsights/2024@main/
    Monaco_Grand_Prix/Race/VER/45_tel.json
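
The URL structure above is a plain concatenation of its parts. A small sketch, assuming the directory names are passed exactly as they appear in the repository; `build_cdn_url` is a hypothetical helper, not a tif1 function:

```python
CDN_BASE = "https://cdn.jsdelivr.net/gh/tracinginsights"


def build_cdn_url(year: int, gp: str, session: str, path: str) -> str:
    """Join the URL parts in the order shown above (hypothetical helper)."""
    return f"{CDN_BASE}/{year}@main/{gp}/{session}/{path}"


drivers_url = build_cdn_url(2024, "Monaco_Grand_Prix", "Race", "drivers.json")
telemetry_url = build_cdn_url(2024, "Monaco_Grand_Prix", "Race", "VER/45_tel.json")
```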

Async HTTP Fetching

Tif1 uses parallel async fetching with HTTP/2 via niquests:
# From async_fetch.py
async def fetch_multiple_async(
    requests: list[tuple[int, str, str, str]],
    use_cache: bool = True,
    write_cache: bool = True,
    max_concurrent_requests: int = 10
) -> list[dict | None]:
    """Fetch multiple JSON files in parallel."""
    # Parallel HTTP requests with connection pooling
    async with niquests.AsyncSession() as session:
        tasks = [fetch_one(session, year, gp, sess, path) 
                 for year, gp, sess, path in requests]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
Performance:
  • HTTP/2: Multiplexing, header compression
  • Connection Pooling: Reuse TCP connections
  • Parallel Fetching: Load multiple files simultaneously
  • Result: 4-5x faster than sequential loading (up to ~25x in the 20-driver figures below)
Async Loading Example:
import asyncio

import tif1

async def load_session():
    session = tif1.get_session(2024, "Monaco", "Race")
    laps = await session.laps_async()  # Parallel loading
    return laps

laps = asyncio.run(load_session())
For 20 drivers:
  • Sequential: ~10 seconds
  • Async parallel: ~0.4 seconds (25x faster)
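
One common way to cap the number of simultaneous requests, as a parameter like `max_concurrent_requests` suggests, is an `asyncio.Semaphore`. The sketch below is an assumption about the general technique, not tif1's implementation: `fetch_all` is a hypothetical function, and `asyncio.sleep` stands in for the HTTP request.

```python
import asyncio


async def fetch_all(paths, max_concurrent_requests=10, delay=0.01):
    """Fetch every path concurrently, with a cap on in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    in_flight = 0
    peak = 0

    async def fetch_one(path):
        nonlocal in_flight, peak
        async with semaphore:               # wait for a free slot
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)      # stand-in for the HTTP request
            in_flight -= 1
            return {"path": path}

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(fetch_one(p) for p in paths))
    return results, peak


paths = [f"DRIVER{i}/laptimes.json" for i in range(20)]
results, peak = asyncio.run(fetch_all(paths, max_concurrent_requests=5))
```

Here `peak` records the highest number of requests that were in flight at once, which never exceeds the configured limit.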

Retry Logic

Automatic retry with exponential backoff:
# From retry.py
@retry_with_backoff(
    max_retries=3,
    backoff_factor=2.0,
    jitter=True,
    exceptions=(niquests.RequestException,)
)
def fetch_from_url(url: str) -> dict:
    """Fetch with automatic retry."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return parse_response_json(response)
Retry Strategy:
  • Attempt 1: Immediate
  • Attempt 2: Wait ~2 seconds
  • Attempt 3: Wait ~4 seconds
  • Attempt 4: Fail with exception
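
The schedule above can be sketched as a decorator. This is an illustration rather than the contents of retry.py: `backoff_retry` is a hypothetical name, it assumes three attempts in total (one reading of the list above), it assumes the delay is `backoff_factor ** attempt`, and the `sleep` parameter exists only so the example can record delays instead of waiting.

```python
import functools
import random
import time


def backoff_retry(max_attempts=3, backoff_factor=2.0, jitter=True,
                  exceptions=(Exception,), sleep=time.sleep):
    """Retry a function, waiting backoff_factor ** n after failed attempt n."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise                           # out of attempts
                    delay = backoff_factor ** attempt   # 2s, then 4s
                    if jitter:
                        delay *= random.uniform(0.9, 1.1)  # assumed jitter range
                    sleep(delay)
        return wrapper
    return decorator


waits = []           # records the delays instead of actually sleeping
calls = {"n": 0}


@backoff_retry(max_attempts=3, jitter=False, exceptions=(ConnectionError,),
               sleep=waits.append)
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"ok": True}


result = flaky_fetch()   # fails twice, then succeeds on the third attempt
```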

5. Processing Layer

JSON Parsing and Validation

# From core_utils/json_utils.py
def parse_response_json(response) -> dict:
    """Parse JSON with validation."""
    import orjson  # Fast JSON parser
    
    # Parse JSON (orjson is 2-3x faster than stdlib json)
    data = orjson.loads(response.content)
    
    # Validate structure
    if not isinstance(data, dict):
        raise InvalidDataError("Expected dict payload")
    
    return data

DataFrame Construction

Transform JSON to optimized DataFrames:
# From core.py:1086-1148
def _create_lap_df(lap_data: dict, driver: str, team: str, lib: str) -> DataFrame:
    """Create lap DataFrame with driver and team info (zero-copy optimized)."""
    
    # Create DataFrame (zero-copy when possible)
    if lib == 'polars':
        lap_df = pl.DataFrame(lap_data, strict=False)
        lap_df = lap_df.with_columns([
            pl.lit(driver).alias('Driver'),
            pl.lit(team).alias('Team')
        ])
    else:
        lap_df = pd.DataFrame(lap_data, copy=False)  # Zero-copy
        lap_df['Driver'] = driver
        lap_df['Team'] = team
    
    return lap_df

Type Optimization

Optimize memory usage with proper dtypes:
# From core.py:1167-1249
def _apply_laps_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Enforce dtype contract on pandas laps DataFrame."""
    
    # Timedelta columns (lap times, sector times)
    for col in ('LapTime', 'Sector1Time', 'Sector2Time', 'Sector3Time'):
        if col in df.columns:
            df[col] = pd.to_timedelta(df[col], unit='s')
    
    # Float64 columns (lap number, position, speeds)
    for col in ('LapNumber', 'Position', 'SpeedI1', 'SpeedI2'):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce').astype('float64')
    
    # Categorical columns (driver, team, compound) - 50% memory reduction
    for col in ('Driver', 'Team', 'Compound'):
        if col in df.columns:
            df[col] = df[col].astype('category')
    
    return df
Type Optimization Benefits:
  • Categorical types: 50% memory reduction for repeated strings
  • Proper numeric types: Faster computations
  • Timedelta types: Native time operations
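
The effect of these dtype choices can be checked on a small synthetic frame. The data below is made up for illustration, and the exact memory saving depends on the pandas version and on how often values repeat, so the sketch compares sizes without asserting a specific percentage.

```python
import pandas as pd

# Synthetic laps: three drivers repeated 1000 times each
laps = pd.DataFrame({
    "Driver": ["VER", "LEC", "NOR"] * 1000,
    "LapTime": [75.2, 76.1, 75.9] * 1000,   # seconds
})

# Memory used by the Driver column before and after conversion
bytes_as_text = laps["Driver"].memory_usage(deep=True)
bytes_as_category = laps["Driver"].astype("category").memory_usage(deep=True)

# Timedelta columns support time arithmetic directly
lap_time = pd.to_timedelta(laps["LapTime"], unit="s")
fastest = lap_time.min()
gap_to_fastest = lap_time - fastest
```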

Data Flow Scenarios

Scenario 1: Cold Start (First Load)

No cached data available:
# User code
session = tif1.get_session(2024, "Monaco", "Race")
laps = session.laps  # First access

# Internal flow:
# 1. Check in-memory cache → MISS
# 2. Check SQLite cache → MISS
# 3. Build async requests for all drivers
# 4. Fetch 20 driver laptime files in parallel (~0.4s)
# 5. Parse and validate JSON
# 6. Construct DataFrame with proper dtypes
# 7. Apply categorical types
# 8. Store in both caches
# 9. Return to user
Performance: ~0.4-1.0 seconds for 20 drivers

Scenario 2: Warm Start (SQLite Cache)

Data exists in SQLite cache:
# User code (different process, same day)
session = tif1.get_session(2024, "Monaco", "Race")
laps = session.laps

# Internal flow:
# 1. Check in-memory cache → MISS (different process)
# 2. Check SQLite cache → HIT
# 3. Deserialize from SQLite (~10ms)
# 4. Store in in-memory cache
# 5. Return to user
Performance: ~10-50ms

Scenario 3: Hot Start (In-Memory Cache)

Data already loaded in current process:
# User code (same process)
session = tif1.get_session(2024, "Monaco", "Race")
laps1 = session.laps  # First access - loads data
laps2 = session.laps  # Second access

# Internal flow (second access):
# 1. Check in-memory cache → HIT
# 2. Return cached DataFrame immediately
Performance: < 1ms (instant)
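
The three scenarios above follow one lookup order: memory, then disk, then network, with results written back on the way out. A minimal sketch of that order, using plain dictionaries as stand-ins for the LRU cache and the SQLite cache; `TwoLevelCache` is a hypothetical class, not part of tif1.

```python
class TwoLevelCache:
    """Memory-first lookup with a persistent fallback (illustrative only)."""

    def __init__(self, disk: dict, fetch):
        self.memory = {}       # stand-in for the in-memory LRU cache
        self.disk = disk       # stand-in for the SQLite cache
        self.fetch = fetch     # stand-in for the CDN request
        self.fetch_count = 0

    def get(self, key):
        if key in self.memory:                   # hot start
            return self.memory[key], "memory"
        if key in self.disk:                     # warm start
            self.memory[key] = self.disk[key]    # promote to memory
            return self.memory[key], "sqlite"
        value = self.fetch(key)                  # cold start
        self.fetch_count += 1
        self.disk[key] = value                   # store in both caches
        self.memory[key] = value
        return value, "cdn"


shared_disk = {}
first_process = TwoLevelCache(shared_disk, fetch=lambda key: {"key": key})
sources = [first_process.get("2024_Monaco_Race_laps")[1] for _ in range(2)]

# A new process starts with empty memory but sees the same disk cache.
second_process = TwoLevelCache(shared_disk, fetch=lambda key: {"key": key})
sources.append(second_process.get("2024_Monaco_Race_laps")[1])
```

After these three lookups, `sources` is `["cdn", "memory", "sqlite"]`, matching the cold, hot, and warm cases in that order.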

Scenario 4: Telemetry Loading

Telemetry has a more granular flow:
# User code
lap = session.laps.pick_fastest()
tel = lap.telemetry

# Internal flow:
# 1. Identify driver and lap number
# 2. Check in-memory telemetry cache → MISS
# 3. Check SQLite telemetry_cache → MISS
# 4. Fetch telemetry JSON from CDN
#    URL: {year}/{gp}/{session}/{driver}/{lap}_tel.json
# 5. Parse telemetry data (arrays of sensor values)
# 6. Create telemetry DataFrame
# 7. Add metadata columns (Driver, LapNumber)
# 8. Store in both caches
# 9. Return to user
Performance:
  • Cold: ~200-500ms per lap
  • Cached: ~1-10ms

Scenario 5: Batch Telemetry Loading

Optimized parallel loading:
# User code
fastest_tels = session.get_fastest_laps_tels(by_driver=True)

# Internal flow:
# 1. Get fastest lap for each driver (from laps DataFrame)
# 2. Check which telemetry is cached
# 3. Build list of missing telemetry files
# 4. Fetch ALL missing telemetry in parallel
#    - 20 drivers, 20 parallel requests
#    - Uses asyncio.gather() for concurrency
# 5. Process all telemetry DataFrames
# 6. Concatenate into single DataFrame
# 7. Store each in cache
# 8. Return combined DataFrame
Performance:
  • Cold: ~0.4s for 20 drivers (parallel)
  • Sequential would be: ~10s (25x slower)
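
Steps 2 to 4 and 7 of the flow above amount to fetching only the cache misses, concurrently, and then storing them. The following sketch shows that idea under stated assumptions: `load_many` and `fake_fetch` are hypothetical, a dictionary stands in for the telemetry cache, and the final concatenation into one DataFrame is left out.

```python
import asyncio


async def load_many(keys, cache, fetch_one):
    """Return data for every key, fetching only the keys missing from cache."""
    missing = [k for k in keys if k not in cache]
    fetched = await asyncio.gather(*(fetch_one(k) for k in missing))
    cache.update(zip(missing, fetched))    # store each new result
    return [cache[k] for k in keys]        # preserve the requested order


fetched_keys = []


async def fake_fetch(key):
    await asyncio.sleep(0)                 # stand-in for network I/O
    fetched_keys.append(key)
    return {"driver": key[0], "lap": key[1]}


cache = {("VER", 45): {"driver": "VER", "lap": 45}}   # already cached
keys = [("VER", 45), ("LEC", 51), ("NOR", 48)]
telemetry = asyncio.run(load_many(keys, cache, fake_fetch))
```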

Ultra-Cold Mode

For maximum performance on first load, tif1 offers ultra-cold mode:
# From config.py
config = {
    'ultra_cold_start': True,
    'ultra_cold_skip_retries': True,
    'ultra_cold_background_cache_fill': True
}
Ultra-Cold Optimizations:
  1. Skip Validation: Parse JSON without schema validation
  2. Skip Retries: Fail fast on errors
  3. Background Caching: Fetch data, return immediately, cache in background
# With ultra-cold mode
session = tif1.get_session(2024, "Monaco", "Race")
laps = session.laps  # Returns immediately with data
# Cache is filled in background thread
Performance: Can be 2-3x faster on first load
Ultra-cold mode trades reliability for speed. Use only when:
  • You need maximum performance
  • You can tolerate occasional errors
  • Data source is trusted
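
Background cache filling can be pictured as returning the fetched data first and writing it to the cache on a separate thread. This is a sketch of the general idea only. `load_with_background_fill` is a hypothetical function, a dictionary stands in for the cache, and how tif1 schedules the write internally is not shown in this document.

```python
import threading


def load_with_background_fill(key, fetch, cache):
    """Return fetched data at once; write it to the cache on another thread."""
    data = fetch(key)

    def write_to_cache():
        cache[key] = data

    writer = threading.Thread(target=write_to_cache)
    writer.start()
    return data, writer    # the caller does not have to wait for the writer


cache = {}
data, writer = load_with_background_fill(
    "2024_Monaco_Race_laps", fetch=lambda key: {"laps": 78}, cache=cache
)
writer.join()   # only needed here so the example can inspect the cache
```

The caller gets `data` without waiting for the cache write, so a lookup made before the writer finishes may still miss.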

Data Flow Diagrams

Complete Data Flow

Telemetry-Specific Flow

Performance Characteristics

Operation Latencies

| Operation | Cold (No Cache) | Warm (SQLite) | Hot (Memory) |
|---|---|---|---|
| Load laps (20 drivers) | 400-1000ms | 10-50ms | <1ms |
| Load single telemetry | 200-500ms | 1-10ms | <1ms |
| Load 20 telemetry (parallel) | 400-1000ms | 10-50ms | <1ms |
| Load 20 telemetry (sequential) | 10-15s | 200-500ms | <1ms |
| Load weather | 200-400ms | 1-5ms | <1ms |
| Load race control | 200-400ms | 1-5ms | <1ms |

Cache Effectiveness

# Typical cache hit rates in a workflow
import tif1

# First run (cold start)
session1 = tif1.get_session(2024, "Monaco", "Race")
laps1 = session1.laps  # 400ms - CDN fetch

# Second run (same process)
laps2 = session1.laps  # <1ms - memory cache hit

# Third run (different process, same machine)
session2 = tif1.get_session(2024, "Monaco", "Race")
laps3 = session2.laps  # 10ms - SQLite cache hit
Cache Hit Rate Expectations:
  • Development: 90-95% (iterating on same data)
  • Production: 60-80% (varied data access)
  • CI/CD: 0-20% (fresh environments)

Configuration

Cache Configuration

import tif1

# Disable all caching
session = tif1.get_session(2024, "Monaco", "Race", enable_cache=False)

# Custom cache directory
import os
os.environ['TIF1_CACHE_DIR'] = '/path/to/cache'

# Clear cache
from tif1.cache import get_cache
cache = get_cache()
cache.clear()

Network Configuration

import os

# Set request timeout (seconds)
os.environ['TIF1_TIMEOUT'] = '60'

# Set max retries
os.environ['TIF1_MAX_RETRIES'] = '5'

# Set retry backoff factor
os.environ['TIF1_RETRY_BACKOFF_FACTOR'] = '1.5'
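
Environment variables are always strings, which is why the values above are quoted. The sketch below shows one way such settings could be parsed into typed values. `NetworkConfig` and `read_network_config` are hypothetical, the timeout and retry defaults are the ones quoted in the troubleshooting section, and the backoff default of 2.0 is an assumption taken from the retry excerpt.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class NetworkConfig:
    timeout: float
    max_retries: int
    backoff_factor: float


def read_network_config(env: Mapping[str, str] = os.environ) -> NetworkConfig:
    """Parse the TIF1_* variables, falling back to assumed defaults."""
    return NetworkConfig(
        timeout=float(env.get("TIF1_TIMEOUT", "30")),
        max_retries=int(env.get("TIF1_MAX_RETRIES", "3")),
        backoff_factor=float(env.get("TIF1_RETRY_BACKOFF_FACTOR", "2.0")),
    )


# Pass a plain dict here so the example does not touch the real environment
config = read_network_config({"TIF1_TIMEOUT": "60", "TIF1_MAX_RETRIES": "5"})
```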

Troubleshooting

Slow Initial Load

Problem: First data access takes too long.

Solutions:
import os

# 1. Use async loading (await must be called from inside an async function)
laps = await session.laps_async()  # Parallel loading

# 2. Enable ultra-cold mode
os.environ['TIF1_ULTRA_COLD_START'] = 'true'

# 3. Load only what you need
session.load(laps=True, telemetry=False, weather=False, messages=False)

Cache Issues

Problem: Cache not being used.

Check:
# Verify cache is enabled
print(session.enable_cache)  # Should be True

# Check cache directory
from tif1.cache import get_cache
cache = get_cache()
print(cache.cache_dir)  # Verify directory exists and is writable

# Clear corrupted cache
cache.clear()

Network Errors

Problem: Frequent network failures.

Solutions:
import os

# Increase timeout
os.environ['TIF1_TIMEOUT'] = '60'  # Default: 30

# Increase retries
os.environ['TIF1_MAX_RETRIES'] = '5'  # Default: 3

# Check CDN status
# Visit: https://www.jsdelivr.com/

Sessions

Understanding session objects and lazy loading

Laps and Telemetry

Working with lap and telemetry data structures

Caching

Detailed cache configuration and management

Performance

Performance optimization tips and benchmarks
