datatable provides a suite of functions for treating single-column Frames as sets, as well as utilities for binding multiple frames together by rows or columns.
The set operations union, intersect, setdiff, and symdiff require each input frame to have exactly one column; passing a multi-column frame raises ValueError. unique is the exception: it accepts a frame with any number of columns and pools their values. Columns of type obj64 are not supported.

union

union(*frames)
Find the union of values across all frames. Returns every distinct value that appears in at least one frame. Equivalent to dt.unique(dt.rbind(*frames)).
*frames
Frame, Frame, ...
required
Input single-column frames. Empty frames are accepted.

Returns

frame
Frame
A single-column frame of unique values. The column type is the smallest common stype of all input columns. The result is sorted.

Example

from datatable import dt

df = dt.Frame({"A": [1, 1, 2, 1, 2],
               "B": [None, 2, 3, 4, 5],
               "C": [1, 2, 1, 1, 2]})

# Union all three columns (each column is one "set")
dt.union(*df)
#    |     A
#    | int32
# -- + -----
#  0 |    NA
#  1 |     1
#  2 |     2
#  3 |     3
#  4 |     4
#  5 |     5

# Union of two specific columns
dt.union(df["A"], df["C"])
#    |     A
#    | int32
# -- + -----
#  0 |     1
#  1 |     2
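
The equivalence with dt.unique(dt.rbind(*frames)) can be pictured with a small plain-Python model: pool every value, then keep the distinct ones in sorted order. This is only a sketch of the semantics described above, not datatable's implementation. The helper name union_model is hypothetical, columns are modeled as Python lists, and None stands in for NA (placed first, as in the example output).

```python
def union_model(*columns):
    """Illustrative model of union: pool all values, then deduplicate and sort."""
    pooled = [value for column in columns for value in column]  # plays the role of rbind
    distinct = set(pooled)                                       # plays the role of unique
    # None (standing in for NA) is ordered before every other value.
    return sorted(distinct, key=lambda v: (v is not None, 0 if v is None else v))


A = [1, 1, 2, 1, 2]
B = [None, 2, 3, 4, 5]
C = [1, 2, 1, 1, 2]

print(union_model(A, B, C))  # [None, 1, 2, 3, 4, 5]
print(union_model(A, C))     # [1, 2]
```

The two printed lists mirror the values shown in the dt.union outputs above.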

intersect

intersect(*frames)
Find the intersection of values across all frames. Returns values that are present in every frame.
*frames
Frame, Frame, ...
required
Input single-column frames. Empty frames are accepted.

Returns

frame
Frame
A single-column frame of values common to all inputs. The column type is the smallest common stype of all input columns.

Example

from datatable import dt

s1 = dt.Frame([4, 5, 6, 20, 42])
s2 = dt.Frame([1, 2, 3, 5, 42])

dt.intersect(s1, s2)
#    |    C0
#    | int32
# -- + -----
#  0 |     5
#  1 |    42

setdiff

setdiff(frame0, *frames)
Find the set difference between frame0 and the other frames. Returns values that are present in frame0 but not in any of the other frames.
frame0
Frame
required
The base single-column frame.
*frames
Frame, Frame, ...
required
One or more single-column frames to subtract from frame0.

Returns

frame
Frame
A single-column frame containing values from frame0 that do not appear in any other input frame. The column type is the smallest common stype of all input columns.

Example

from datatable import dt

s1 = dt.Frame([4, 5, 6, 20, 42])
s2 = dt.Frame([1, 2, 3, 5, 42])

dt.setdiff(s1, s2)
#    |    C0
#    | int32
# -- + -----
#  0 |     4
#  1 |     6
#  2 |    20
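
When more than one frame is subtracted, a value is dropped as soon as it appears in any of them. A minimal plain-Python sketch of that rule follows; setdiff_model is a hypothetical helper, columns are modeled as lists, and the result is sorted here only to make the output deterministic (the ordering of the real result is not stated above).

```python
def setdiff_model(column0, *columns):
    """Illustrative model of setdiff: values of column0 absent from all other columns."""
    excluded = set()
    for column in columns:
        excluded.update(column)          # anything seen in any other column is removed
    kept = set(column0) - excluded
    # None (standing in for NA) is ordered first; sorting is a choice of this model.
    return sorted(kept, key=lambda v: (v is not None, 0 if v is None else v))


s1 = [4, 5, 6, 20, 42]
s2 = [1, 2, 3, 5, 42]
s3 = [6, 7]            # extra column, made up for this illustration

print(setdiff_model(s1, s2))      # [4, 6, 20]
print(setdiff_model(s1, s2, s3))  # [4, 20]
```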

symdiff

symdiff(*frames)
Find the symmetric difference of values across all frames. For two frames this is values that appear in either frame but not both. For more than two frames, values that appear in an odd number of frames are returned.
*frames
Frame, Frame, ...
required
Input single-column frames. Empty frames are accepted.

Returns

frame
Frame
A single-column frame. The column type is the smallest common stype of all input columns.

Example

from datatable import dt

df = dt.Frame({"A": [1, 1, 2, 1, 2],
               "B": [None, 2, 3, 4, 5],
               "C": [1, 2, 1, 1, 2]})

# Symmetric difference of two columns
dt.symdiff(df["A"], df["B"])
#    |     A
#    | int32
# -- + -----
#  0 |    NA
#  1 |     1
#  2 |     3
#  3 |     4
#  4 |     5

# Symmetric difference across all three columns
dt.symdiff(*df)
#    |     A
#    | int32
# -- + -----
#  0 |    NA
#  1 |     2
#  2 |     3
#  3 |     4
#  4 |     5
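
The odd-count rule for three or more frames can be checked with a short plain-Python model. This is a sketch of the rule as described above, not datatable's implementation: symdiff_model is a hypothetical helper, columns are modeled as lists, None stands in for NA, and duplicates within one column count once.

```python
from collections import Counter


def symdiff_model(*columns):
    """Illustrative model of symdiff: values present in an odd number of columns."""
    counts = Counter()
    for column in columns:
        counts.update(set(column))       # each column contributes a value at most once
    odd = [value for value, n in counts.items() if n % 2 == 1]
    # None (standing in for NA) is ordered first, as in the example output.
    return sorted(odd, key=lambda v: (v is not None, 0 if v is None else v))


A = [1, 1, 2, 1, 2]
B = [None, 2, 3, 4, 5]
C = [1, 2, 1, 1, 2]

print(symdiff_model(A, B))     # [None, 1, 3, 4, 5]
print(symdiff_model(A, B, C))  # [None, 2, 3, 4, 5]
```

In the three-column case, 1 occurs in two columns (A and C) and is dropped, while 2 occurs in all three and is kept, which matches the outputs shown above.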

unique

unique(frame)
Find all unique values across every column in frame. The result is currently returned in sorted order because deduplication is sort-based; this ordering may change in a future release.
frame
Frame
required
Input frame. May have any number of columns; all values across all columns are pooled together.

Returns

frame
Frame
A single-column frame of distinct values. The column type is the smallest common stype for all columns in the input frame. Raises NotImplementedError for obj64 columns.

Example

from datatable import dt

df = dt.Frame({"A": [1, 1, 2, 1, 2],
               "B": [None, 2, 3, 4, 5],
               "C": [1, 2, 1, 1, 2]})

# Unique values across the entire frame
dt.unique(df)
#    |    C0
#    | int32
# -- + -----
#  0 |    NA
#  1 |     1
#  2 |     2
#  3 |     3
#  4 |     4
#  5 |     5

# Unique values in a single column
dt.unique(df["A"])
#    |     A
#    | int32
# -- + -----
#  0 |     1
#  1 |     2

rbind

rbind(*frames, force=False, bynames=True)
Produce a new frame by appending rows from several frames (vertical concatenation).
*frames
Frame | List[Frame] | None
required
Frames to stack vertically.
force
bool
default: False
When True, frames with mismatching columns (different counts or names) are accepted. Missing cells are filled with NA. Columns with unrelated types are converted to strings.
bynames
bool
default: True
Match columns by name when True. When False, columns are matched by position instead.

Returns

frame
Frame
A new frame whose rows are the rows of all input frames concatenated in order.

Example

from datatable import dt

DT1 = dt.Frame({"Weight": [5, 4, 6], "Height": [170, 172, 180]})
DT2 = dt.Frame({"Height": [180, 181, 169], "Weight": [4, 4, 5]})

dt.rbind(DT1, DT2)
#    | Weight  Height
#    |  int32   int32
# -- + ------  ------
#  0 |      5     170
#  1 |      4     172
#  2 |      6     180
#  3 |      4     180
#  4 |      4     181
#  5 |      5     169
# [6 rows x 2 columns]
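
The difference between matching by name and matching by position can be sketched with a plain-Python model in which a frame is a dict mapping column names to lists. This is an illustration only, not datatable's implementation: rbind_model is a hypothetical helper, it assumes every frame has the same set of columns, and it assumes the result takes its column names from the first frame.

```python
def rbind_model(*frames, bynames=True):
    """Illustrative model of rbind for dict-of-lists frames with identical columns."""
    first, *rest = frames
    names = list(first)                              # result columns follow the first frame
    result = {name: list(first[name]) for name in names}
    for frame in rest:
        frame_names = list(frame)
        for position, name in enumerate(names):
            # bynames=True: look the column up by its name.
            # bynames=False: take whatever column sits at the same position.
            source = name if bynames else frame_names[position]
            result[name].extend(frame[source])
    return result


DT1 = {"Weight": [5, 4, 6], "Height": [170, 172, 180]}
DT2 = {"Height": [180, 181, 169], "Weight": [4, 4, 5]}

by_name = rbind_model(DT1, DT2)
by_position = rbind_model(DT1, DT2, bynames=False)

print(by_name["Weight"])      # [5, 4, 6, 4, 4, 5]
print(by_position["Weight"])  # [5, 4, 6, 180, 181, 169]
```

With bynames=False the heights from DT2 land in the Weight column, which is why name matching is the safer default when column order differs between frames.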

cbind

cbind(*frames, force=False)
Create a new frame by appending columns from several frames (horizontal concatenation). Returns a new Frame; the input frames are not modified.
*frames
Frame | List[Frame] | None
required
Frames to concatenate column-wise. None values are silently skipped.
force
bool
default: False
When True, frames with unequal row counts are accepted. The result has as many rows as the largest input frame. Shorter frames are padded with NA (frames with exactly 1 row are replicated instead).

Returns

frame
Frame
A new frame whose columns are the columns of all input frames placed side by side.

Example

from datatable import dt

DT = dt.Frame(A=[1, 2, 3], B=[4, 7, 0])
frame1 = dt.Frame(N=[-1, -2, -5])

dt.cbind([DT, frame1])
#    |     A      B      N
#    | int32  int32  int32
# -- + -----  -----  -----
#  0 |     1      4     -1
#  1 |     2      7     -2
#  2 |     3      0     -5
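
The padding rule that applies with force=True can be sketched in plain Python, again modeling a frame as a dict of column name to list. This is a hedged illustration of the behavior described above, not datatable's implementation: cbind_forced_model is a hypothetical helper, None stands in for NA, and column names are assumed to be distinct across frames.

```python
def row_count(frame):
    """Number of rows in a dict-of-lists frame (0 if it has no columns)."""
    return len(next(iter(frame.values()), []))


def cbind_forced_model(*frames):
    """Illustrative model of cbind(..., force=True) for dict-of-lists frames."""
    frames = [frame for frame in frames if frame is not None]   # None is skipped
    target = max((row_count(frame) for frame in frames), default=0)
    result = {}
    for frame in frames:
        n = row_count(frame)
        for name, values in frame.items():
            if n == target:
                result[name] = list(values)                      # already full height
            elif n == 1:
                result[name] = list(values) * target             # 1-row frame: replicate
            else:
                result[name] = list(values) + [None] * (target - n)  # pad with NA
    return result


DT = {"A": [1, 2, 3], "B": [4, 7, 0]}
one_row = {"K": [9]}       # made-up frames for this illustration
short = {"S": [1, 2]}

print(cbind_forced_model(DT, None, one_row, short))
# {'A': [1, 2, 3], 'B': [4, 7, 0], 'K': [9, 9, 9], 'S': [1, 2, None]}
```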

Set operations quick reference

| Function | Description | Input |
| --- | --- | --- |
| union(*frames) | All distinct values from any frame | 1-column frames |
| intersect(*frames) | Values present in every frame | 1-column frames |
| setdiff(frame0, *frames) | Values in frame0 not in any other | 1-column frames |
| symdiff(*frames) | Values in an odd number of frames | 1-column frames |
| unique(frame) | Distinct values across all columns | Any frame |
| rbind(*frames) | Stack frames vertically | Any frames |
| cbind(*frames) | Stack frames horizontally | Any frames |
