Skip to main content
Python’s datatable is closely related to R’s data.table and intentionally mirrors its API. Because of language differences, some operations look slightly different. This page maps the most common data.table patterns to their Python datatable equivalents.

Setup

from datatable import dt, f, g, by, update, join, sort

DT = dt.Frame(
    x=["b"] * 3 + ["a"] * 3 + ["c"] * 3,
    y=[1, 3, 6] * 3,
    v=range(1, 10),
)

Subsetting Rows (i)

DT[1, :]            # 2nd row (0-indexed)
DT[1:3, :]          # 2nd and 3rd rows
DT[[2, 1], :]       # 3rd then 2nd row
DT[[1, 4], :]       # 2nd and 5th rows
DT[1:5, :]          # rows 2 through 5
DT[4::-1, :]        # rows 5 down to 1 (reversed)
DT[-1, :]           # last row
DT[f.y > 2, :]      # all rows where y > 2
DT[(f.y > 2) & (f.v > 5), :]   # compound condition
DT[[0, slice(4, None)], :]      # all rows except 2, 3, 4
R uses 1-based indexing and inclusive slices. Python uses 0-based indexing and exclusive end-of-slice. DT[2] in R is the 2nd row; in Python DT[2] selects the 3rd column.In Python datatable, row selection with an f-expression always requires an explicit column selector: DT[f.y == 3, :] — not DT[f.y == 3].

Sorting

# Sort by x ascending
DT.sort("x")
DT[:, :, sort("x")]

# Sort by x descending
DT.sort(-f.x)
DT[:, :, sort(-f.x)]

# Sort by x ascending, y descending
DT.sort(f.x, -f.y)
DT[:, :, sort(f.x, -f.y)]

Selecting Columns (j)

DT[:, "v"]                     # Select column v
DT["v"]                        # Shorthand (single column)
DT[:, ["x", "v"]]              # Select multiple columns
DT[:, {"m": f.x}]              # Rename and select
DT[:, {"sv": dt.sum(f.v)}]     # Aggregate and rename
DT[:, [f.v, f.v * 2]]          # v and v doubled
DT[:, 1]                       # 2nd column (0-indexed)
DT[:, -1]                      # Last column
DT[:, f["x":"y"]]              # Columns x through y
DT[:, f[:].remove(f["x":"y"])] # Exclude columns x and y

# Select by variable
col = "v"
DT[:, col]

cols = ["v", "y"]
DT[:, cols]

Filtering and Aggregating Together

# Sum v over rows 2 and 3 (0-indexed)
DT[1:3, dt.sum(f.v)]

# Same with new column name
DT[1:3, {"sv": dt.sum(f.v)}]

# Filter in i, aggregate in j
DT[f.x == "b", dt.sum(f.v * f.y)]

# Return as scalar
DT[f.x == "b", dt.sum(f.v * f.y)][0, 0]

# Aggregate multiple columns
DT[:, dt.mean([f.y, f.v])]

Grouping with by

# Sum v by x
DT[:, dt.sum(f.v), by("x")]

# Filter first, then group (chaining required)
DT[f.x != "a", :][:, dt.sum(f.v), by("x")]

# Row count per group
DT[:, dt.count(), by("x")]

# First row per group
DT[0, :, by("x")]

# Row count + sum of all columns per group
DT[:, [dt.count(), dt.sum(f[:])], by("x")]

# Expression in by
DT[:, dt.sum(f.v), by(f.y % 2)]

# Row where v is minimum per group
DT[0, f[:], by("x"), dt.sort(f.v)]

# First 2 rows per group
DT[:2, :, by("x")]

# Last 2 rows per group
DT[-2:, :, by("x")]

# Multiple aggregations
DT[:, {"MySum": dt.sum(f.v),
        "MyMin": dt.min(f.v),
        "MyMax": dt.max(f.v)},
   by(f.x, f.y % 2)]
In R’s data.table, the order of groups in the result is preserved from the source data. In Python datatable, results are sorted by the grouping column. Use keyby in R (or accept default behavior) to get a sorted result equivalent to datatable’s output.

Adding, Updating, and Deleting Columns

# Add a new column
DT[:, update(z=42)]
DT["z"] = 42

# Add multiple columns
DT[:, update(sv=dt.sum(f.v), mv="X")]

# Delete a column
del DT["z"]
DT = DT[:, f[:].remove(f.z)]

# Update rows matching a condition
DT[f.x == "a", update(v=42)]
DT[f.x == "a", "v"] = 42

# Update a new column only for matching rows (NA elsewhere)
DT[f.x == "b", update(v2=84)]

# Add column computed per group
DT[:, update(m=dt.mean(f.v)), by("x")]

# Dynamic column name via variable
col = "rar"
DT[col] = 4242
update() and the del operator in datatable both operate in-place — no reassignment is needed. update() also preserves row order even inside a by(), which is useful for per-group transforms.

Joins

DT = dt.Frame(x=["b"] * 3 + ["a"] * 3 + ["c"] * 3,
              y=[1, 3, 6] * 3,
              v=range(1, 10))
X = dt.Frame({"x": ("c", "b"), "v": (8, 7), "foo": (4, 2)})
X.key = "x"

# Left outer join
DT[:, :, join(X)]

# Inner join (filter out unmatched rows)
DT[g[-1] != None, :, join(X)]

# Not join (rows in DT with no match in X)
DT[g[-1] == None, f[:], join(X)]

# First row per match group
DT[g[-1] != None, :, join(X)][0, :, by("x")]

# Aggregate j from both frames
DT[:, dt.sum(f.v * g.foo), join(X), by(f.x)][f[-1] != 0, :]
datatable currently supports left outer joins only. The joining frame must be keyed, key columns must be unique, and the join column must have the same name in both frames.

Key Differences to Keep in Mind

ConceptR data.tablePython datatable
Indexing base1-based0-based
Slice endInclusive (2:5 = rows 2,3,4,5)Exclusive (2:5 = rows 2,3,4)
Row selection aloneDT[y > 2] worksMust include column selector: DT[f.y > 2, :]
Group orderPreserved from sourceSorted by grouping column
In-place update:= operatorupdate() or direct assignment
Join typeMultiple join typesLeft outer join only
i execution with byi runs before groupingi runs after grouping

Build docs developers (and LLMs) love