Datatable vs R data.table

Python’s datatable is closely related to R’s data.table and intentionally mirrors its API. Because of language differences, some operations look slightly different. This page maps the most common data.table patterns to their Python datatable equivalents.

Setup

Python datatable
R data.table

from datatable import dt, f, g, by, update, join, sort

DT = dt.Frame(
    x=["b"] * 3 + ["a"] * 3 + ["c"] * 3,
    y=[1, 3, 6] * 3,
    v=range(1, 10),
)

library(data.table)

DT = data.table(
  x = rep(c("b", "a", "c"), each = 3),
  y = c(1, 3, 6),
  v = 1:9
)

Subsetting Rows (`i`)

Python datatable
R data.table

DT[1, :]            # 2nd row (0-indexed)
DT[1:3, :]          # 2nd and 3rd rows
DT[[2, 1], :]       # 3rd then 2nd row
DT[[1, 4], :]       # 2nd and 5th rows
DT[1:5, :]          # rows 2 through 5
DT[4::-1, :]        # rows 5 down to 1 (reversed)
DT[-1, :]           # last row
DT[f.y > 2, :]      # all rows where y > 2
DT[(f.y > 2) & (f.v > 5), :]   # compound condition
DT[[0, slice(4, None)], :]      # all rows except 2, 3, 4

DT[2]               # 2nd row (1-indexed)
DT[2:3]             # 2nd and 3rd rows
DT[3:2]             # 3rd then 2nd row
DT[c(2, 5)]         # 2nd and 5th rows
DT[2:5]             # rows 2 through 5
DT[5:1]             # rows 5 down to 1
DT[.N]              # last row
DT[y > 2]           # all rows where y > 2
DT[y > 2 & v > 5]   # compound condition
DT[!2:4]            # all rows except 2, 3, 4

R uses 1-based indexing and inclusive slices. Python uses 0-based indexing and exclusive end-of-slice. DT[2] in R is the 2nd row; in Python DT[2] selects the 3rd column.In Python datatable, row selection with an f-expression always requires an explicit column selector: DT[f.y == 3, :] — not DT[f.y == 3].

Sorting

Python datatable
R data.table

# Sort by x ascending
DT.sort("x")
DT[:, :, sort("x")]

# Sort by x descending
DT.sort(-f.x)
DT[:, :, sort(-f.x)]

# Sort by x ascending, y descending
DT.sort(f.x, -f.y)
DT[:, :, sort(f.x, -f.y)]

# Sort by x ascending
DT[order(x), ]

# Sort by x descending
DT[order(-x)]

# Sort by x ascending, y descending
DT[order(x, -y)]

Selecting Columns (`j`)

Python datatable
R data.table

DT[:, "v"]                     # Select column v
DT["v"]                        # Shorthand (single column)
DT[:, ["x", "v"]]              # Select multiple columns
DT[:, {"m": f.x}]              # Rename and select
DT[:, {"sv": dt.sum(f.v)}]     # Aggregate and rename
DT[:, [f.v, f.v * 2]]          # v and v doubled
DT[:, 1]                       # 2nd column (0-indexed)
DT[:, -1]                      # Last column
DT[:, f["x":"y"]]              # Columns x through y
DT[:, f[:].remove(f["x":"y"])] # Exclude columns x and y

# Select by variable
col = "v"
DT[:, col]

cols = ["v", "y"]
DT[:, cols]

DT[, .(v)]                     # Select column v
DT[, .(x, v)]                  # Select multiple columns
DT[, .(m = x)]                 # Rename and select
DT[, .(sv = sum(v))]           # Aggregate and rename
DT[, .(v, v*2)]                # v and v doubled
DT[, 2]                        # 2nd column
DT[, ncol(DT), with=FALSE]     # Last column
DT[, .SD, .SDcols = x:y]       # Columns x through y
DT[, .SD, .SDcols = !x:y]      # Exclude columns x and y

# Select by variable
col = "v"
DT[, ..col]

cols = c("v", "y")
DT[, ..cols]

Filtering and Aggregating Together

Python datatable
R data.table

# Sum v over rows 2 and 3 (0-indexed)
DT[1:3, dt.sum(f.v)]

# Same with new column name
DT[1:3, {"sv": dt.sum(f.v)}]

# Filter in i, aggregate in j
DT[f.x == "b", dt.sum(f.v * f.y)]

# Return as scalar
DT[f.x == "b", dt.sum(f.v * f.y)][0, 0]

# Aggregate multiple columns
DT[:, dt.mean([f.y, f.v])]

# Sum v over rows 2 and 3
DT[2:3, .(sum(v))]

# Same with new column name
DT[2:3, .(sv = sum(v))]

# Filter in i, aggregate in j
DT[x == "b", .(sum(v * y))]

# Return as scalar
DT[x == "b", sum(v * y)]

# Aggregate multiple columns
DT[, lapply(.SD, mean), .SDcols = c("y", "v")]

Grouping with `by`

Python datatable
R data.table

# Sum v by x
DT[:, dt.sum(f.v), by("x")]

# Filter first, then group (chaining required)
DT[f.x != "a", :][:, dt.sum(f.v), by("x")]

# Row count per group
DT[:, dt.count(), by("x")]

# First row per group
DT[0, :, by("x")]

# Row count + sum of all columns per group
DT[:, [dt.count(), dt.sum(f[:])], by("x")]

# Expression in by
DT[:, dt.sum(f.v), by(f.y % 2)]

# Row where v is minimum per group
DT[0, f[:], by("x"), dt.sort(f.v)]

# First 2 rows per group
DT[:2, :, by("x")]

# Last 2 rows per group
DT[-2:, :, by("x")]

# Multiple aggregations
DT[:, {"MySum": dt.sum(f.v),
        "MyMin": dt.min(f.v),
        "MyMax": dt.max(f.v)},
   by(f.x, f.y % 2)]

# Sum v by x
DT[, sum(v), by = x]

# Filter first, then group
DT[x != "a", sum(v), by = x]

# Row count per group
DT[, .N, by = x]

# First row per group
DT[, .SD[1], by = x]

# Row count + sum of all columns per group
DT[, c(.N, lapply(.SD, sum)), by = x]

# Expression in by
DT[, sum(v), by = .(y %% 2)]

# Row where v is minimum per group
DT[, .SD[which.min(v)], by = x]

# First 2 rows per group
DT[, head(.SD, 2), by = x]

# Last 2 rows per group
DT[, tail(.SD, 2), by = x]

# Multiple aggregations
DT[, list(MySum = sum(v),
          MyMin = min(v),
          MyMax = max(v)),
   by = .(x, y %% 2)]

In R’s data.table, the order of groups in the result is preserved from the source data. In Python datatable, results are sorted by the grouping column. Use keyby in R (or accept default behavior) to get a sorted result equivalent to datatable’s output.

Adding, Updating, and Deleting Columns

Python datatable
R data.table

# Add a new column
DT[:, update(z=42)]
DT["z"] = 42

# Add multiple columns
DT[:, update(sv=dt.sum(f.v), mv="X")]

# Delete a column
del DT["z"]
DT = DT[:, f[:].remove(f.z)]

# Update rows matching a condition
DT[f.x == "a", update(v=42)]
DT[f.x == "a", "v"] = 42

# Update a new column only for matching rows (NA elsewhere)
DT[f.x == "b", update(v2=84)]

# Add column computed per group
DT[:, update(m=dt.mean(f.v)), by("x")]

# Dynamic column name via variable
col = "rar"
DT[col] = 4242

# Add a new column
DT[, z := 42L]

# Add multiple columns
DT[, c("sv", "mv") := .(sum(v), "X")]

# Delete a column
DT[, z := NULL]

# Update rows matching a condition
DT["a", v := 42L, on = "x"]

# Update a new column only for matching rows (NA elsewhere)
DT["b", v2 := 84L, on = "x"]

# Add column computed per group
DT[, m := mean(v), by = x]

# Dynamic column name
col = "rar"
DT[, ..col := 4242]

update() and the del operator in datatable both operate in-place — no reassignment is needed. update() also preserves row order even inside a by(), which is useful for per-group transforms.

Joins

Python datatable
R data.table

DT = dt.Frame(x=["b"] * 3 + ["a"] * 3 + ["c"] * 3,
              y=[1, 3, 6] * 3,
              v=range(1, 10))
X = dt.Frame({"x": ("c", "b"), "v": (8, 7), "foo": (4, 2)})
X.key = "x"

# Left outer join
DT[:, :, join(X)]

# Inner join (filter out unmatched rows)
DT[g[-1] != None, :, join(X)]

# Not join (rows in DT with no match in X)
DT[g[-1] == None, f[:], join(X)]

# First row per match group
DT[g[-1] != None, :, join(X)][0, :, by("x")]

# Aggregate j from both frames
DT[:, dt.sum(f.v * g.foo), join(X), by(f.x)][f[-1] != 0, :]

DT = data.table(x = rep(c("b","a","c"), each=3),
                y = c(1, 3, 6), v = 1:9)
X = data.table(x = c("c","b"), v = 8:7, foo = c(4, 2))

# Left outer join
X[DT, on = "x"]

# Inner join
DT[X, on = "x", nomatch = NULL]

# Not join
DT[!X, on = "x"]

# First row per match group
DT[X, on = "x", mult = "first"]

# Aggregate j from both frames
DT[X, sum(v) * foo, by = .EACHI, on = "x"]

datatable currently supports left outer joins only. The joining frame must be keyed, key columns must be unique, and the join column must have the same name in both frames.

Key Differences to Keep in Mind

Concept	R data.table	Python datatable
Indexing base	1-based	0-based
Slice end	Inclusive (`2:5` = rows 2,3,4,5)	Exclusive (`2:5` = rows 2,3,4)
Row selection alone	`DT[y > 2]` works	Must include column selector: `DT[f.y > 2, :]`
Group order	Preserved from source	Sorted by grouping column
In-place update	`:=` operator	`update()` or direct assignment
Join type	Multiple join types	Left outer join only
`i` execution with `by`	`i` runs before grouping	`i` runs after grouping

Get Started

Core Concepts

Working with Data

Machine Learning

Migration & Comparisons

Datatable vs R data.table

Setup

Subsetting Rows (`i`)

Sorting

Selecting Columns (`j`)

Filtering and Aggregating Together

Grouping with `by`

Adding, Updating, and Deleting Columns

Joins

Key Differences to Keep in Mind

Build docs developers (and LLMs) love

Get Started

Core Concepts

Working with Data

Machine Learning

Migration & Comparisons

​Setup

​Subsetting Rows (i)

​Sorting

​Selecting Columns (j)

​Filtering and Aggregating Together

​Grouping with by

​Adding, Updating, and Deleting Columns

​Joins

​Key Differences to Keep in Mind

Build docs developers (and LLMs) love

Setup

Subsetting Rows (`i`)

Sorting

Selecting Columns (`j`)

Filtering and Aggregating Together

Grouping with `by`

Adding, Updating, and Deleting Columns

Joins

Key Differences to Keep in Mind