This guide walks you through the core datatable workflow: creating a Frame, reading a CSV file, selecting and filtering data with f-expressions, grouping, and exporting results.

Prerequisites

Install datatable before you begin:
pip install datatable
See Installation if you run into any issues.

Import datatable

Import the library and confirm the version:
import datatable as dt
print(dt.__version__)

Create a Frame

A Frame is the fundamental unit of analysis in datatable: a two-dimensional table of rows and columns, similar to a pandas DataFrame or an SQL table. Create one from a Python dictionary:
DT = dt.Frame({"n": [1, 3], "s": ["foo", "bar"]})
print(DT)
   |     n s
   | int32 str32
-- + ----- -----
 0 |     1 foo
 1 |     3 bar

[2 rows x 2 columns]
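You can also create a Frame from a numpy array or a pandas DataFrame, or from keyword arguments with explicit column types. Here the stypes argument forces column A to int64, while the type of B is inferred: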
import math

DT = dt.Frame(A=range(5), B=[1.7, 3.4, 0, None, -math.inf],
              stypes={"A": dt.int64})

Load a CSV file with fread()

fread() reads CSV, text, Excel, and other formats. It automatically detects separators, headers, column types, and quoting rules. It also handles URLs, shell output, .zip archives, and glob patterns.
DT = dt.fread("~/Downloads/dataset_01.csv")
Check basic properties after loading:
print(DT.shape)   # (nrows, ncols)
print(DT.names)   # column names
print(DT.types)   # column types
fread() parses data in parallel using multiple threads, and for large files it can display a progress indicator while reading.
4

Select rows and columns

datatable uses DT[i, j] notation for all data access — the same indexing used in mathematics, C/C++, R, and numpy.
  • i is the row selector
  • j is the column selector
DT[:, "A"]          # select column A, all rows
DT[:10, :]          # first 10 rows, all columns
DT[::-1, "A":"D"]   # reverse row order, columns A through D
DT[27, 3]           # single element: row 27, column 3 (0-based)
You can also update or delete subsets:
DT[i, j] = r        # replace values in the [i, j] subset with r
del DT[:, "D"]      # delete column D

Filter rows with f-expressions

f is a “frame proxy” — a variable you import from datatable that lets you reference columns by name in expressions. It becomes a reference to the current Frame wherever it is used.
from datatable import f, mean, sd, min, max   # note: min and max shadow Python's built-ins

# Select rows where A lies more than 2.5 standard deviations from the mean of column B
DT[(f.A > mean(f.B) + 2.5 * sd(f.B)) | (f.A < mean(f.B) - 2.5 * sd(f.B)), :]
You can also use f-expressions to compute derived columns:
# Normalize column A to the range [0, 1]
DT[:, (f.A - min(f.A)) / (max(f.A) - min(f.A))]
And compute multiple columns at once using a dictionary (keys become the new column names):
DT[:, {"A": f.A, "B": f.B, "A+B": f.A + f.B, "A-B": f.A - f.B}]
f refers to the current Frame. When joining two frames, g refers to the joined (second) frame.

Group and aggregate

The by() modifier splits a Frame into groups before the column expression is evaluated. Aggregation functions such as sum(), mean(), min(), and sd() are then computed separately for each group, giving one row per group.
from datatable import f, by, sum, mean

# Total quantity sold per product
DT = dt.fread("transactions.csv")
DT[:, sum(f.quantity), by(f.product_id)]

# Average of column A grouped by column B
DT[:, mean(f.A), by("B")]
You can combine by() with sort() to order the rows within each group:
DT[:, :, by(f.product_id), dt.sort(f.quantity)]   # group by product, then order by quantity inside each group

Export data

Convert a Frame to pandas, numpy, or plain Python structures:
df = DT.to_pandas()   # pandas DataFrame
arr = DT.to_numpy()   # numpy array
d = DT.to_dict()      # dict keyed by column name
rows = DT.to_tuples() # list of row tuples
Save to a CSV file or to the binary .jay format:
DT.to_csv("output.csv")
DT.to_jay("data.jay")     # fast binary format; reopen quickly with fread()
The .jay format stores data on disk in the same layout as in memory, so files can be memory-mapped and worked on without loading everything into RAM.
DT2 = dt.fread("data.jay")

Next steps

Core concepts: Frame

Understand the Frame object — its structure, types, and properties.

Core concepts: f-expressions

Learn the full power of f-expressions for filtering, transforming, and aggregating data.

Selecting and filtering

Deep dive into row and column selection with the DT[i, j] syntax.

Reading and writing data

Explore all input and output options including fread(), CSV, JAY, and more.
