Datatable is a Python package for manipulating two-dimensional tabular data structures called Frames. Built on a native C++ core, it emphasizes speed and big-data support, handling datasets of up to 100 GB on a single node. Its API is inspired by R’s data.table, and Frames convert to and from pandas, numpy, and pyarrow objects.

Quick Start

Install datatable and run your first data operations in minutes.

Core Concepts

Understand Frames, f-expressions, and the DT[i, j, by] syntax.

Working with Data

Filter, transform, group, join, and read/write data.

API Reference

Full API documentation for all classes and functions.

Get started in 3 steps

1. Install datatable

Install the package from PyPI using pip:

```bash
pip install datatable
```
2. Load your data

Create a Frame from a CSV file, Python dict, numpy array, or pandas DataFrame:

```python
import datatable as dt

# Read a CSV file (fast, auto-detects format); assumes data.csv exists
DT = dt.fread("data.csv")

# Or create a Frame from a dictionary of equal-length columns
DT = dt.Frame({"A": [1, 2, 3, 4, 5], "B": [10.5, 20.1, 30.3, 40.0, 50.7]})
print(DT)
```
3. Query and transform

Use the expressive DT[i, j, by(...)] syntax to filter, select, and aggregate:

```python
import datatable as dt
from datatable import f, by

# Keep rows where A > 2 (i), then compute B * 2 for those rows (j)
result = DT[f.A > 2, f.B * 2]

# Group rows by column A and compute the mean of B within each group
summary = DT[:, dt.mean(f.B), by(f.A)]
```
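To make the evaluation order of DT[i, j, by] concrete, here is a plain-Python sketch that needs no datatable install. It models a Frame as a dict of equal-length column lists (an assumption for illustration; real Frames are typed columns managed by the C++ core) and mimics the two queries above: i filters rows, j is evaluated on the surviving rows, and by splits rows into groups before j is reduced per group. The helpers `filter_select` and `group_mean` are hypothetical names, not part of datatable.

```python
from collections import defaultdict
from statistics import mean

# A toy "frame": a dict of equal-length column lists (illustration only).
frame = {"A": [1, 2, 3, 4, 5], "B": [10.5, 20.1, 30.3, 40.0, 50.7]}


def filter_select(cols, row_pred, expr):
    """Rough analogue of DT[i, j]: keep rows where row_pred holds (i),
    then evaluate expr on each kept row (j)."""
    n = len(next(iter(cols.values())))
    rows = [{name: col[k] for name, col in cols.items()} for k in range(n)]
    return [expr(r) for r in rows if row_pred(r)]


def group_mean(cols, key, value):
    """Rough analogue of DT[:, mean(f.value), by(f.key)]: bucket rows by key,
    then reduce each bucket with mean(). Here groups are returned sorted by key."""
    groups = defaultdict(list)
    for k, v in zip(cols[key], cols[value]):
        groups[k].append(v)
    return {k: mean(vs) for k, vs in sorted(groups.items())}


# Analogue of DT[f.A > 2, f.B * 2]
doubled = filter_select(frame, lambda r: r["A"] > 2, lambda r: r["B"] * 2)
print(doubled)

# Analogue of DT[:, dt.mean(f.B), by(f.A)], using repeated keys so groups have >1 row
frame2 = {"A": [1, 1, 2, 2, 2], "B": [1.0, 3.0, 2.0, 4.0, 6.0]}
means = group_mean(frame2, "A", "B")
print(means)
```

The point of the sketch is the ordering (filter, then group, then reduce), which is what lets datatable express all three steps in one bracket call.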

Why datatable?

Blazing fast

Native C++ implementation with multi-threaded processing. Sort, group, and join hundreds of millions of rows in seconds.

Big data support

Memory-mapped datasets let you work on files larger than RAM transparently, without loading everything into memory.
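Datatable's own on-disk format and memory-mapping logic live in its C++ core, but the underlying idea can be shown with Python's standard `mmap` module. The sketch below writes one numeric column as raw float64 values to a temporary file (a hypothetical layout, not datatable's format), maps it read-only, and reads values by byte offset; the operating system pages in only the regions that are actually touched, instead of loading the whole file up front.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # bytes per little-endian float64


def write_column(path, values):
    """Write a column as raw little-endian float64 values (hypothetical layout)."""
    with open(path, "wb") as fh:
        fh.write(struct.pack(f"<{len(values)}d", *values))


def read_value(mm, row):
    """Read a single value by offset; only the page holding it is faulted in."""
    return struct.unpack_from("<d", mm, row * ITEM)[0]


def column_mean(mm):
    """Scan the mapped column value by value; pages load on demand as the scan advances."""
    n = len(mm) // ITEM
    total = 0.0
    for off in range(0, n * ITEM, ITEM):
        total += struct.unpack_from("<d", mm, off)[0]
    return total / n


path = os.path.join(tempfile.mkdtemp(), "col_B.bin")
write_column(path, [10.5, 20.1, 30.3, 40.0, 50.7])

with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    third = read_value(mm, 2)
    avg = column_mean(mm)

print(third, avg)
```

With a fixed-width binary layout, locating row k is pure arithmetic (k × 8 bytes), which is why a mapped column can be queried without first parsing or loading everything before it.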

Expressive syntax

The DT[i, j, by(...)] query syntax is concise and powerful — filter rows, select columns, and aggregate in a single expression.

Pandas compatible

Convert Frames to and from pandas DataFrames, numpy arrays, and pyarrow tables, so datatable slots into your existing workflow as a complement rather than a replacement.

Fast I/O

fread() automatically detects the CSV format (separator, header, column types), reads compressed files and URLs, and parses with multiple threads, which often makes it much faster than pandas read_csv.
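fread() does its detection in C++ and goes further than shown here (for example, it also infers column types). As a rough stdlib illustration of two of the ideas, format detection and compressed input, the sketch below recognizes gzip data by its magic bytes and guesses the separator with `csv.Sniffer`. The function name `sniff_and_parse` and the candidate-separator list are assumptions for illustration, not fread's actual algorithm, and every field comes back as a string.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff_and_parse(raw: bytes, candidates: str = ",;\t|"):
    """Decompress if needed, guess the separator, return (delimiter, header, rows)."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    # Sniff on a bounded prefix, much as a fast reader samples before parsing.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
    reader = csv.reader(io.StringIO(text), dialect)
    header, *rows = list(reader)
    return dialect.delimiter, header, rows


plain = b"A;B\n1;10.5\n2;20.1\n3;30.3\n"
print(sniff_and_parse(plain))
print(sniff_and_parse(gzip.compress(plain)))
```

Both calls return the same result: the caller never states the separator or the compression, which is the convenience the paragraph above describes.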

Built-in ML models

Includes FTRL online learning and LinearModel for classification and regression — no additional libraries required.
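For readers curious what "FTRL online learning" refers to, below is a from-scratch, pure-Python sketch of the per-coordinate FTRL-Proximal update for logistic regression on sparse features. It is not datatable's implementation, which is exposed as `datatable.models.Ftrl`; the class name `TinyFtrl`, the sparse `{index: value}` row format, and the default hyperparameters are assumptions chosen for illustration.

```python
import math


class TinyFtrl:
    """Minimal FTRL-Proximal logistic regression over sparse {index: value} rows.
    Hypothetical illustration, not datatable.models.Ftrl."""

    def __init__(self, alpha=0.5, beta=1.0, lambda1=0.0, lambda2=0.0):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.z = {}  # per-coordinate accumulated (adjusted) gradients
        self.n = {}  # per-coordinate sum of squared gradients

    def _weight(self, i):
        # Closed-form per-coordinate weight; L1 sets w_i = 0 while |z_i| <= lambda1.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.lambda1:
            return 0.0
        n = self.n.get(i, 0.0)
        denom = (self.beta + math.sqrt(n)) / self.alpha + self.lambda2
        return -(z - math.copysign(self.lambda1, z)) / denom

    def predict(self, x):
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def learn(self, x, y):
        """One online step on a single example with label y in {0, 1}."""
        weights = {i: self._weight(i) for i in x}
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss with respect to w_i
            n_old = self.n.get(i, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * weights[i]
            self.n[i] = n_old + g * g
        return p


# Feature 0 is a bias; feature 1 marks positives, feature 2 marks negatives.
data = [({0: 1.0, 1: 1.0}, 1), ({0: 1.0, 2: 1.0}, 0)]
model = TinyFtrl()
for _ in range(50):
    for x, y in data:
        model.learn(x, y)

p_pos = model.predict({0: 1.0, 1: 1.0})
p_neg = model.predict({0: 1.0, 2: 1.0})
print(p_pos, p_neg)
```

Each example is seen once per pass and only touches the coordinates it contains, which is what makes the method suitable for streaming, very sparse data.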
datatable requires Python 3.6+ (64-bit) and pip 20.3+. Pre-built wheels are available for macOS, Linux, and Windows.
