Frame - Datatable

Frame is the primary data structure in datatable. It is two-dimensional (rows and columns) and column-oriented, meaning data is stored separately per column. Each column has its own name and type; types may differ across columns but are uniform within each column. A Frame is optimized for the case where the number of rows significantly exceeds the number of columns.

import datatable as dt

DT = dt.Frame(A=range(5), B=[1.1, 2.2, 3.3, 4.4, 5.5])

A Frame can be iterated as a sequence of single-column Frames (column-wise), or unpacked as a dictionary:

for col in DT:
    print(col)          # each col is a 1-column Frame

d = {**DT}              # dict of {name: 1-col Frame}
cols = [*DT]            # list of 1-column Frames

Constructor

dt.Frame(_data=None, *, names=None, types=None, type=None, **cols)

Create a new Frame either from a single data source passed as _data, or from keyword column arguments **cols. The two forms are mutually exclusive.

_data

Any

The primary data source. Accepts a wide range of Python types:

list of lists / Frames / numpy arrays — each element becomes one column
list of dicts — each dict is a row; keys become column names
list of tuples — each tuple is a row
list of primitives — creates a single column
dict — keys are column names, values are column data
range — creates a virtual integer column instantly
Frame — shallow copy of the given frame
str — passed to fread() (file path, URL, or inline CSV text)
pandas.DataFrame / pandas.Series
numpy.ndarray
pyarrow.Table
None — creates an empty 0×0 frame

**cols

Any

Keyword column initializers. Keys become column names, values are column data. Equivalent to passing a dict as _data. Cannot be combined with an explicit _data argument or with names.

names

List[str | None]

Explicit column names. Length must match the number of columns. Incompatible with **cols. Pass None for an individual element to auto-generate that column’s name.

types

List[Type] | Dict[str, Type]

Explicit types for each column, as a list (positional) or dict (by name). Cannot be used together with type.

type

Type | type

A single type applied to all columns. Cannot be used together with types.

Examples

# From keyword arguments
DT = dt.Frame(
    A=range(7),
    B=[0.1, 0.3, 0.5, 0.7, None, 1.0, 1.5],
    C=["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
)

# From a list of dicts (row-oriented)
DT = dt.Frame([
    {"A": 3, "B": 7},
    {"A": 0, "B": 11, "C": -1},
    {"C": 5}
])

# From a list of tuples
DT = dt.Frame([(39, "Mary"), (17, "Jasmine"), (23, "Lily")],
              names=["age", "name"])

# Force all columns to int32
DT = dt.Frame(A=[1, 2, 3], B=[4, 5, 6], type=dt.Type.int32)

# From a CSV file
DT = dt.Frame("train.csv")

Properties

nrows

int

Number of rows in the frame. This property is also settable: assigning a smaller value truncates the frame; assigning a larger value pads with NAs. Increasing nrows on a keyed frame raises an error.

DT.nrows         # read
DT.nrows = 100   # truncate or extend

ncols

int

Number of columns in the frame (read-only). len(DT) also returns this value, but direct use of .ncols is preferred.

shape

Tuple[int, int]

A (nrows, ncols) tuple. Read-only.

DT.shape  # e.g. (1000, 5)

names

Tuple[str, ...]

Tuple of column names. Each name is a non-empty string with no ASCII control characters, and names are unique within the frame. This property is assignable: rename all columns at once with a list, or rename a subset with a dict:

DT.names = ["x", "y", "z"]          # rename all
DT.names = {"old_name": "new_name"} # rename one
del DT.names                         # reset to C0, C1, ...

types

List[Type]

List of dt.Type objects, one per column. Added in v1.0.0. Prefer this over .stypes and .ltypes.

stypes

Tuple[stype, ...]

Tuple of dt.stype values (storage types) for each column.

Deprecated since v1.0.0. Will be removed in v1.2.0. Use .types instead.

ltypes

Tuple[ltype, ...]

Tuple of dt.ltype values (logical types) for each column.

Deprecated since v1.0.0. Will be removed in v1.2.0. Use .types instead.

key

Tuple[str, ...]

Tuple of column names that form the primary key. Returns an empty tuple if no key is set. Assigning to this property sets the primary key: the specified column(s) are moved to the front, the frame is sorted, and key values must be unique. Set to None or use del DT.key to remove the key.

DT.key = "id"               # single-column key
DT.key = ["year", "month"]  # composite key
del DT.key                  # remove key

source

str | None

The file path, URL, or command that this frame was loaded from. Returns None if the frame was computed or modified. Read-only. Added in v0.11.

Methods

Selection

`head(n=10)`

Return the first n rows of the frame. Equivalent to DT[:n, :].

n

int

default: 10

Maximum number of rows to return. Must be non-negative.

Returns a new Frame with up to n rows and the same columns.

DT.head()     # first 10 rows
DT.head(3)    # first 3 rows

`tail(n=10)`

Return the last n rows of the frame. Equivalent to DT[-n:, :] (except when n is 0).

n

int

default: 10

Maximum number of rows to return. Must be non-negative.

Returns a new Frame with up to n rows and the same columns.

DT.tail()     # last 10 rows
DT.tail(3)    # last 3 rows

`copy(deep=False)`

Make a copy of the frame, preserving column names, types, and key.

deep

bool

default: False

If False (default), produces a shallow copy with copy-on-write semantics — data buffers are shared until modified. If True, produces a fully independent deep copy with all data physically written to new memory.

Returns a new Frame.

DT2 = DT.copy()          # shallow (fast)
DT3 = DT.copy(deep=True) # deep (independent)

# Also available via the standard library:
import copy
DT_shallow = copy.copy(DT)
DT_deep    = copy.deepcopy(DT)

Export

`to_csv(path=None, *, sep=",", quoting="minimal", append=False, header="auto", bom=False, hex=False, compression=None, verbose=False, method="auto")`

Write the frame’s data to a CSV file. Uses multiple threads; thread count is controlled by dt.options.nthreads.

path

str | None

Output file path. If the file exists it is overwritten. If None (default), returns the CSV text as a string (or bytes if compression is enabled).

sep

str

default: ","

Field separator; must be a single character.

quoting

"minimal" | "all" | "nonnumeric" | "none" | csv.QUOTE_*

default: "minimal"

Quoting style. "minimal" quotes only when necessary; "all" quotes every field; "nonnumeric" quotes all strings; "none" disables quoting entirely (may produce invalid CSV).

append

bool

default: False

If True, opens the file in append mode instead of overwriting it.

header

bool | "auto"

default: "auto"

Whether to write the header row. "auto" writes a header unless appending to an existing file.

bom

bool

default: False

Insert a byte-order mark. Ignored when appending. Useful for Excel compatibility.

hex

bool

default: False

Write floating-point values in hexadecimal format (C %a). Approximately 3× faster to write and read than decimal, at the cost of human readability.

compression

"gzip" | "auto" | None

default: None

Output compression. "auto" infers from file extension. Only "gzip" is currently supported. Cannot be combined with append=True.

method

"mmap" | "write" | "auto"

default: "auto"

Disk-writing method. "mmap" may be faster on some operating systems; "write" is more portable.

Returns None when writing to a file, or a str (or bytes if compressed) when path is None.

DT.to_csv("output.csv")               # write to file
csv_text = DT.to_csv()                # return as string
DT.to_csv("output.csv.gz",
          compression="gzip")         # gzip-compressed
DT.to_csv("log.csv", append=True,
          header=False)               # append without header

`to_jay(path=None, method="auto")`

Save the frame to a binary .jay file. Jay is a datatable-native format that supports memory-mapping.

path

str | None

Destination file path. If None, serializes the frame into memory and returns a bytes object.

method

"mmap" | "write" | "auto"

default: "auto"

Writing method. Has no effect when path is None.

Returns None when writing to a file, or bytes when path is None.

DT.to_jay("data.jay")        # write to disk
blob = DT.to_jay()           # serialize to bytes in memory

`to_pandas()`

Convert the frame to a pandas.DataFrame. Key columns become the pandas index. Returns a pandas.DataFrame. Raises ImportError if pandas is not installed.

df = DT.to_pandas()

`to_numpy(type=None, column=None, c_contiguous=False)`

Convert the frame to a 2D numpy array. When the frame has a single non-virtual, non-string column and no type override, the returned array shares memory with the frame (no copy). Otherwise data is copied. Frames with NA values return a numpy.ma.MaskedArray.

type

Type | type-like

Cast all data to this type before converting.

column

int

Convert a single column (by index, negative indices supported). Returns a 1D array.

c_contiguous

bool

default: False

If True, returns a row-major (C-contiguous) array. Default is column-major.

Returns numpy.ndarray or numpy.ma.MaskedArray. Raises ImportError if numpy is not installed.

arr = DT.to_numpy()
arr = DT.to_numpy(type=dt.float32)
col = DT.to_numpy(column=0)   # 1D array

`to_dict()`

Convert the frame to a dict of lists, keyed by column name. Column order is preserved. Returns Dict[str, List].

DT.to_dict()
# {"A": [1, 2, 3], "B": ["aye", "nay", "tain"]}

`to_list()`

Convert the frame to a list of lists, by columns. Each inner list is one column. Returns List[List] of length ncols.

DT.to_list()
# [[1, 2, 3], ["aye", "nay", "tain"]]

`to_tuples()`

Convert the frame to a list of tuples, by rows. Each tuple contains one row of data. Returns List[Tuple] of length nrows.

DT.to_tuples()
# [(1, "aye"), (2, "nay"), (3, "tain")]

`to_arrow()`

Convert the frame to a pyarrow.Table. The conversion is multi-threaded and involves copying the data (except when data was originally imported from Arrow). Returns pyarrow.Table. Raises ImportError if pyarrow is not installed.

arrow_table = DT.to_arrow()

Aggregation

All aggregation methods return a new single-row Frame with the same column names. Each method also has a variant with a 1 suffix (sum1(), min1(), mean1(), and so on) that operates on a single-column frame and returns a scalar.

`sum()`

Sum of all values in each column. Integer and boolean columns return int64; float32 columns return float32; float64 columns return float64. Non-numeric columns return NA as float64.

DT.sum()
value = DT[:, "A"].sum1()  # scalar for one column

`min()` / `max()`

Minimum / maximum value per column. Returns a one-row Frame of the same types as the source columns.

DT.min()
DT.max()

`mean()`

Mean per column. All columns return float64. String and object columns return NA.

DT.mean()

`sd()`

Standard deviation per column. Returns float64 for all columns.

DT.sd()

`countna()`

Count of NA (missing) values per column. All result columns are int64.

DT.countna()

`nunique()`

Number of unique values per column.

DT.nunique()

`mode()`

Most frequently occurring value per column.

DT.mode()

`kurt()`

Excess kurtosis per column (Fisher’s definition, normal = 0).

DT.kurt()

`skew()`

Skewness per column.

DT.skew()

Modification

`cbind(*frames, force=False)`

Append columns from one or more frames to this frame, in-place. All frames must have the same number of rows, unless they have exactly one row (which is replicated) or force=True. Column name conflicts are resolved by automatic name mangling.

This method modifies the frame in-place. Calling it on a temporary expression like DT[:, :].cbind(other) has no effect on DT.

frames

Frame | List[Frame] | None

One or more frames to append. None values are ignored.

force

bool

default: False

If True, frames with mismatched row counts are allowed; shorter frames are padded with NAs (single-row frames are replicated instead).

Returns None (modifies in-place).

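DT0 = dt.Frame(A=[1, 2, 3], B=[4, 7, 0])
DT1 = dt.Frame(N=[-1, -2, -5])
DT0.cbind(DT1)
# DT0 now has columns A, B, N

# Cbind several frames in one call (more efficient than one at a time)
DT2 = dt.Frame(X=[10, 20, 30])
DT3 = dt.Frame(Y=["a", "b", "c"])
DT0.cbind(DT2, DT3)
# DT0 now has columns A, B, N, X, Y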

`rbind(*frames, force=False, bynames=True)`

Append rows from one or more frames to this frame, in-place. Equivalent to list.extend() — rows are stacked vertically. Column types are promoted as needed (bool → int → float). Incompatible types raise TypeError unless force=True.

frames

Frame | List[Frame]

One or more frames to append. They should have the same column structure unless force=True.

force

bool

default: False

If True, mismatched column sets are allowed (gaps filled with NA). Incompatible types are coerced to string.

bynames

bool

default: True

If True (default), columns are matched by name. If False, columns are matched positionally.

Returns None (modifies in-place).

DT.rbind(DT_extra)
DT.rbind(DT1, DT2, force=True)  # mismatched columns OK

`sort(*cols)`

Sort the frame by one or more columns. Returns a new sorted frame; the original is not modified.

cols

str | int

Names or indices of columns to sort by. If no columns are given, all columns are used.

Returns a new Frame.

DT.sort("age")            # sort by column name
DT.sort("year", "month")  # sort by multiple columns
DT.sort(0)                # sort by first column

`replace(replace_what, replace_with)`

Search and replace values throughout the entire frame, in-place. Each value is replaced only in columns of a compatible type (None matches any column). The operation never changes a column’s logical type, but may change its storage type if the replacement value requires a wider representation.

replace_what

Value(s) to find. If a dict, it maps search values to their replacements and replace_with must be omitted.

replace_with

Replacement value(s). Must be a single value if replace_what is a single value, or a list of the same length if replace_what is a list.

Returns None (modifies in-place).

DT.replace(0, -1)                          # replace integer
DT.replace(None, 0)                        # fill NAs with 0
DT.replace([-1, None, "?"], [0, 0, "NA"])  # multiple replacements
DT.replace({-1: 0, "bad": "N/A"})         # using a dict

`materialize(to_memory=False)`

Force all virtual (lazily computed) columns to be physically written to memory. Useful when you want to break internal references to a larger source frame so that the original can be garbage-collected, or when you want to ensure all delayed computations are resolved.

to_memory

bool

default: False

If True, also copies memory-mapped columns (e.g., those opened from a Jay file on disk) into RAM.

Returns None (modifies in-place).

DT.materialize()
DT.materialize(to_memory=True)  # force everything into RAM

Utility

`export_names()`

Return a tuple of f-expressions for all columns, in order. Assigning these to variables lets you reference columns without the f. prefix.

A, B, C = DT.export_names()
result = DT[A + B > C, :]

# Export a subset
A, B = DT[:, :2].export_names()
A, B, *_ = DT.export_names()

The exported expressions refer to columns by name and work with any frame that has matching column names. Returns Tuple[FExpr, ...].

`colindex(column)`

Return the integer index (0-based) of a column.

column

str | int | FExpr

Column name, index (negative counts from the end), or f-expression (f.A, f[3]).

Returns int. Raises KeyError for an unknown name, or IndexError for an out-of-range index.

from datatable import f

DT.colindex("B")    # 1
DT.colindex(-1)     # last column index
DT.colindex(f.A)    # same as colindex("A")

`view(interactive=None, plain=False)`

Display the frame in an interactive viewer.

This method is currently not working properly. See issue #2669.
