
Overview

tif1 supports polars as an alternative to pandas. In the benchmarks on this page, polars is roughly 2x faster on large datasets and uses about 40% less memory.
Zero-copy conversion: tif1 converts between pandas and polars through Apache Arrow, which avoids copying data where the column types allow it, so switching backends is cheap.

Installation

Install polars as an optional dependency:
pip install "tif1[polars]"
Or install polars separately (the quotes stop the shell from treating > as a redirect):
pip install "polars>=1.36"
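
Because polars is an optional dependency, a script may want to confirm it is importable before selecting it as the backend. The following is a minimal sketch using only the standard library; `pick_backend` is a hypothetical helper, not part of tif1.

```python
from importlib.util import find_spec


def pick_backend(preferred: str = "polars", fallback: str = "pandas") -> str:
    """Return `preferred` if that package is importable, else `fallback`."""
    # find_spec returns None when a top-level package cannot be found.
    return preferred if find_spec(preferred) is not None else fallback


backend = pick_backend()
print(backend)  # "polars" when polars is installed, otherwise "pandas"
```

The returned string could then be passed to get_config().set("lib", backend) as shown in the next section.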

Enabling Polars

Via Configuration

from tif1 import get_config

config = get_config()
config.set("lib", "polars")

# Now all DataFrames will be polars.DataFrame
import tif1
session = tif1.get_session(2024, "Bahrain", "Race")
print(type(session.laps))  # <class 'polars.dataframe.frame.DataFrame'>

Via Environment Variable

export TIF1_LIB=polars
Then use tif1 normally:
import tif1
session = tif1.get_session(2024, "Bahrain", "Race")
print(type(session.laps))  # polars.DataFrame

Via Config File

Add to ~/.tif1rc:
{
  "lib": "polars"
}
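
The config file can also be written from a script. This sketch assumes only what the snippet above shows, namely that the file is plain JSON with a "lib" key; `set_lib_in_rc` is a hypothetical helper and the "cache" key in the demo is invented purely to show that other keys are preserved.

```python
import json
import tempfile
from pathlib import Path


def set_lib_in_rc(rc_path: Path, lib: str) -> dict:
    """Set the "lib" key in a JSON config file, keeping any other keys."""
    settings = {}
    if rc_path.exists():
        settings = json.loads(rc_path.read_text(encoding="utf-8"))
    settings["lib"] = lib
    rc_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Demo against a temporary file; use Path.home() / ".tif1rc" for the real one.
with tempfile.TemporaryDirectory() as tmp:
    rc = Path(tmp) / ".tif1rc"
    rc.write_text('{"cache": true}', encoding="utf-8")  # hypothetical existing key
    result = set_lib_in_rc(rc, "polars")

print(result)  # {'cache': True, 'lib': 'polars'}
```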

Performance Comparison

Load Time

Loading lap data from cache:
import time
import tif1
from tif1 import get_config

# Test pandas (perf_counter is a monotonic, high-resolution clock)
get_config().set("lib", "pandas")
start = time.perf_counter()
session = tif1.get_session(2024, "Bahrain", "Race")
laps = session.laps
pandas_time = time.perf_counter() - start

# Test polars
get_config().set("lib", "polars")
start = time.perf_counter()
session = tif1.get_session(2024, "Bahrain", "Race")
laps = session.laps
polars_time = time.perf_counter() - start

print(f"pandas: {pandas_time:.3f}s")
print(f"polars: {polars_time:.3f}s")
print(f"speedup: {pandas_time/polars_time:.2f}x")
Typical results:
  • pandas: 150ms
  • polars: 75ms
  • Speedup: 2.0x
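
A single timed run is sensitive to noise such as a cold cache or background load. One way to get steadier numbers is to repeat each measurement and take the median. The sketch below is a generic harness, not part of tif1; `median_runtime` is a hypothetical helper and the two lambdas are stand-in workloads to be replaced with functions that load session.laps under each backend.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of `fn` in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workloads; swap in one loader function per backend.
slow = median_runtime(lambda: sum(i * i for i in range(200_000)))
fast = median_runtime(lambda: sum(i * i for i in range(50_000)))
print(f"speedup: {slow / fast:.2f}x")
```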

Filtering

Filtering laps by driver and compound:
import polars as pl

# pandas
fast_laps = session.laps[
    (session.laps["Driver"] == "VER") & 
    (session.laps["Compound"] == "SOFT")
]
# 20ms

# polars
fast_laps = session.laps.filter(
    (pl.col("Driver") == "VER") & 
    (pl.col("Compound") == "SOFT")
)
# 8ms - 2.5x faster

Aggregations

Calculating average lap times per driver:
import polars as pl

# pandas
avg_times = session.laps.groupby("Driver")["LapTime"].mean()
# 50ms

# polars
avg_times = session.laps.group_by("Driver").agg(pl.col("LapTime").mean())
# 20ms - 2.5x faster

Memory Usage

Dataset       pandas    polars    Reduction
1000 laps     10 MB     6 MB      40%
5000 laps     50 MB     30 MB     40%
20000 laps    200 MB    120 MB    40%
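
The reduction column is the saving relative to the pandas figure, (pandas - polars) / pandas. A short worked check of the rows above; `reduction_pct` is a hypothetical helper name used only for this illustration.

```python
def reduction_pct(pandas_mb: float, polars_mb: float) -> float:
    """Percentage of memory saved relative to the pandas figure."""
    return round(100 * (pandas_mb - polars_mb) / pandas_mb, 1)


# Rows from the table above: (laps, pandas MB, polars MB)
rows = [(1000, 10, 6), (5000, 50, 30), (20000, 200, 120)]
reductions = [reduction_pct(pandas_mb, polars_mb) for _, pandas_mb, polars_mb in rows]
print(reductions)  # [40.0, 40.0, 40.0]
```

To measure your own data, pandas exposes DataFrame.memory_usage(deep=True) and polars exposes DataFrame.estimated_size().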

Backend Conversion

tif1 provides zero-copy conversion between pandas and polars using Apache Arrow.

Convert to Polars

from tif1.core_utils.backend_conversion import pandas_to_polars
import pandas as pd

pandas_df = pd.DataFrame({"a": [1, 2, 3]})
polars_df = pandas_to_polars(pandas_df, rechunk=False)

print(type(polars_df))  # polars.DataFrame

Convert to Pandas

from tif1.core_utils.backend_conversion import polars_to_pandas
import polars as pl

polars_df = pl.DataFrame({"a": [1, 2, 3]})
pandas_df = polars_to_pandas(polars_df, use_pyarrow=True)

print(type(pandas_df))  # pandas.DataFrame

Generic Conversion

from tif1.core_utils.backend_conversion import convert_backend

# Automatically detect and convert
df = session.laps  # Could be pandas or polars

pandas_df = convert_backend(df, "pandas")
polars_df = convert_backend(df, "polars")

Zero-Copy Benefits

Zero-copy means the converted DataFrame shares the underlying Arrow buffers instead of duplicating them. How much is actually shared depends on the column types; columns that need a type conversion are still copied.
# rechunk=True: forces contiguous memory, which can copy data
polars_df = pl.from_pandas(pandas_df, rechunk=True)

# rechunk=False: skips that step and keeps the buffers as converted
polars_df = pl.from_pandas(pandas_df, rechunk=False)
The backend_conversion.py module defaults to rechunk=False:
# From backend_conversion.py:22-44
def pandas_to_polars(df: pd.DataFrame, *, rechunk: bool = False) -> pl.DataFrame:
    """Convert pandas DataFrame to polars using zero-copy Arrow conversion.
    
    Args:
        df: Pandas DataFrame to convert
        rechunk: Whether to rechunk the result. False for zero-copy (default).
    """
    return pl.from_pandas(df, rechunk=rechunk)  # rechunk=False = zero-copy

Working with Polars

Accessing Data

Polars DataFrames work similarly to pandas:
import tif1
import polars as pl
from tif1 import get_config

get_config().set("lib", "polars")
session = tif1.get_session(2024, "Bahrain", "Race")

# Access columns
lap_times = session.laps["LapTime"]
print(type(lap_times))  # polars.Series

# Filter rows (assumes LapTime is stored as numeric seconds)
fast_laps = session.laps.filter(pl.col("LapTime") < 90.0)

# Select columns
subset = session.laps.select(["Driver", "LapNumber", "LapTime"])

Polars Expressions

Polars uses a powerful expression API:
import polars as pl

# Find fastest lap per driver
fastest = session.laps.group_by("Driver").agg(
    pl.col("LapTime").min().alias("FastestLap"),
    pl.col("LapNumber").first().alias("FirstLap"),
    pl.col("Compound").n_unique().alias("Compounds")
)

# Chain operations
result = (
    session.laps
    .filter(pl.col("LapTime").is_not_null())
    .with_columns([
        (pl.col("LapTime") - pl.col("LapTime").mean()).alias("Delta")
    ])
    .sort("LapTime")
    .head(10)
)

Lazy Evaluation

Polars supports lazy evaluation for query optimization:
import polars as pl

# Lazy query (not executed yet)
query = (
    session.laps.lazy()
    .filter(pl.col("Driver") == "VER")
    .select(["LapNumber", "LapTime"])
    .sort("LapTime")
)

# Execute when needed
result = query.collect()

API Compatibility

tif1 provides a consistent API regardless of backend:
# Same code works with both backends
session = tif1.get_session(2024, "Bahrain", "Race")

# Pick driver (works for pandas and polars)
ver_laps = session.laps.pick_driver("VER")

# Pick lap
fastest = session.laps.pick_fastest()

# Get telemetry
tel = fastest.get_telemetry()

Backend Detection

Check which backend is being used:
import pandas as pd
import polars as pl

laps = session.laps

if isinstance(laps, pd.DataFrame):
    print("Using pandas backend")
elif isinstance(laps, pl.DataFrame):
    print("Using polars backend")
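
The isinstance check above imports both libraries, which fails when polars is not installed. An alternative sketch that inspects the object's class hierarchy without importing either package; `backend_of` is a hypothetical helper, not a tif1 function.

```python
def backend_of(df: object) -> str:
    """Return "pandas", "polars", or "unknown" without importing either library."""
    # Walk the class hierarchy so subclasses of a DataFrame are recognised too.
    for cls in type(df).__mro__:
        root = cls.__module__.split(".", 1)[0]
        if root in ("pandas", "polars"):
            return root
    return "unknown"


print(backend_of([1, 2, 3]))  # unknown
```

Calling backend_of(session.laps) would then report the active backend even if the laps object is a subclass defined elsewhere.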

Categorical Columns

Polars handles categorical data differently than pandas.

pandas Categorical

get_config().set("lib", "pandas")
session = tif1.get_session(2024, "Bahrain", "Race")

print(session.laps["Driver"].dtype)
# category
print(session.laps["Compound"].dtype)
# category

polars Categorical

Polars uses String (named Utf8 in older releases) by default. To enable categorical columns:
get_config().set("lib", "polars")
get_config().set("polars_lap_categorical", True)

session = tif1.get_session(2024, "Bahrain", "Race")
print(session.laps["Driver"].dtype)
# Categorical (if enabled)
Polars categorical is currently opt-in due to compatibility considerations. Set polars_lap_categorical=True to enable.

When to Use Polars

Use Polars When:

Large Datasets

Processing >10k laps or multiple sessions

Complex Queries

Heavy filtering, grouping, and aggregations

Memory Constrained

Need to cut memory usage (about 40% in the benchmarks on this page)

Performance Critical

Every millisecond counts in your workflow

Use Pandas When:

Ecosystem Tools

Need matplotlib, seaborn, or other pandas tools

Small Datasets

Processing single sessions (<1000 laps)

Familiar API

Team is experienced with pandas

Compatibility

Integrating with pandas-dependent code

Performance Tips

1. Use Lazy Evaluation

# Lazy query - optimizes entire pipeline
result = (
    session.laps.lazy()
    .filter(pl.col("Driver").is_in(["VER", "HAM"]))
    .select(["LapNumber", "LapTime"])
    .sort("LapTime")
    .collect()  # Execute optimized query
)

2. Avoid Row Iteration

# Slow - row iteration
for row in session.laps.iter_rows():
    process(row)

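# Fast - native expressions run inside the polars engine
result = session.laps.with_columns(
    (pl.col("LapTime") - pl.col("LapTime").min()).alias("GapToBest")
)
# Note: a Python callback via map_elements (formerly apply) still runs once per value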

3. Use Streaming

For very large datasets:
# Stream processing (low memory)
result = (
    session.laps.lazy()
    .filter(pl.col("LapTime").is_not_null())
    .collect(engine="streaming")  # replaces the deprecated streaming=True
)

Limitations

Current Limitations

  • Plotting: Some plotting libraries expect pandas DataFrames
  • Categorical: Polars categorical is opt-in (polars_lap_categorical)
  • Interop: Some third-party libraries may not support polars

Workarounds

Convert to pandas when needed:
# Get data as polars for speed
get_config().set("lib", "polars")
laps = session.laps

# Convert to pandas for plotting (to_pandas needs pandas and pyarrow installed)
laps_pd = laps.to_pandas()
laps_pd.plot(x="LapNumber", y="LapTime")
Or use convert_backend:
from tif1.core_utils.backend_conversion import convert_backend

laps_pd = convert_backend(laps, "pandas")

Benchmark Results

Real-world performance comparison on Bahrain 2024 Race (20 drivers, 57 laps, 1140 rows):
Operation           pandas    polars    Speedup
Load from cache     152 ms    76 ms     2.0x
Filter by driver    18 ms     7 ms      2.6x
Group by driver     45 ms     18 ms     2.5x
Sort by lap time    12 ms     5 ms      2.4x
Select columns      8 ms      3 ms      2.7x
Unique compounds    15 ms     6 ms      2.5x
Memory: pandas 89MB, polars 54MB (39% reduction)

Migration Guide

Pandas to Polars

Common operations translated:
# pandas
df.head(10)
df[df["column"] > 5]
df.groupby("col").mean()
df.sort_values("col")
df[["col1", "col2"]]

# polars
df.head(10)
df.filter(pl.col("column") > 5)
df.group_by("col").agg(pl.all().mean())
df.sort("col")
df.select(["col1", "col2"])

Testing Both Backends

import pytest
import tif1
from tif1 import get_config

@pytest.mark.parametrize("backend", ["pandas", "polars"])
def test_lap_loading(backend):
    get_config().set("lib", backend)
    session = tif1.get_session(2024, "Bahrain", "Race")
    assert len(session.laps) > 0
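
The test above leaves the backend set to whatever the last parametrized case selected, which can leak into other tests. One way to contain that is a context manager that restores the previous value. This is a sketch under assumptions: `use_backend` is a hypothetical helper, it assumes the config object offers a get() method alongside set(), and `FakeConfig` is a stand-in for the object returned by tif1.get_config().

```python
from contextlib import contextmanager


@contextmanager
def use_backend(config, lib: str):
    """Temporarily set the "lib" option, restoring the previous value afterwards."""
    previous = config.get("lib")  # assumption: the config object exposes get()
    config.set("lib", lib)
    try:
        yield config
    finally:
        config.set("lib", previous)


class FakeConfig:
    """Stand-in for the object returned by tif1.get_config()."""

    def __init__(self):
        self._values = {"lib": "pandas"}

    def get(self, key):
        return self._values.get(key)

    def set(self, key, value):
        self._values[key] = value


config = FakeConfig()
with use_backend(config, "polars"):
    print(config.get("lib"))  # polars
print(config.get("lib"))  # pandas
```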

Next Steps

Performance

Learn more performance optimization strategies

Polars Docs

Official Polars documentation

Circuit Breaker

Understand retry and failure handling
