Frame is the primary data structure in datatable. It is two-dimensional (rows and columns) and column-oriented, meaning data is stored separately per column. Each column has its own name and type; types may differ across columns but are uniform within each column.
A Frame is optimized for the case where the number of rows significantly exceeds the number of columns.
Constructor
Frame(_data=None, *, names=None, types=None, type=None, **cols)
The _data argument and the keyword column arguments **cols are mutually exclusive.
_data: The primary data source. Accepts a wide range of Python types:
- list of lists / Frames / numpy arrays: each element becomes one column
- list of dicts: each dict is a row; keys become column names
- list of tuples: each tuple is a row
- list of primitives: creates a single column
- dict: keys are column names, values are column data
- range: creates a virtual integer column instantly
- Frame: shallow copy of the given frame
- str: passed to fread() (file path, URL, or inline CSV text)
- pandas.DataFrame / pandas.Series
- numpy.ndarray
- pyarrow.Table
- None: creates an empty 0×0 frame
**cols: Keyword column initializers. Keys become column names, values are column data. Equivalent to passing a dict as _data. Cannot be combined with an explicit _data argument or with names.
names: Explicit column names. Length must match the number of columns. Incompatible with **cols. Pass None for an individual element to auto-generate that column’s name.
types: Explicit types for each column, as a list (positional) or dict (by name). Cannot be used together with type.
type: A single type applied to all columns. Cannot be used together with types.
Examples
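A minimal sketch of a few of the constructor forms described above, assuming datatable is installed and importable as dt. The column names and values are made up for illustration, and the import guard only keeps the snippet from failing where the library is absent.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    # Keyword column initializers: each keyword becomes a column.
    DT1 = dt.Frame(A=[1, 2, 3], B=["x", "y", "z"])

    # The same data passed as a dict through the positional _data argument.
    DT2 = dt.Frame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

    # A list of lists (one inner list per column) with explicit names.
    DT3 = dt.Frame([[1, 2, 3], ["x", "y", "z"]], names=["A", "B"])

    # A list of dicts: each dict is one row, keys become column names.
    DT4 = dt.Frame([{"A": 1, "B": "x"}, {"A": 2, "B": "y"}])

    print(DT1.shape, DT1.names)
    print(DT4.to_list())
```

The first three frames hold the same data; which form is most convenient depends on how the source data is already laid out.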
Properties
nrows: Number of rows in the frame. This property is also settable: assigning a smaller value truncates the frame; assigning a larger value pads with NAs. Increasing nrows on a keyed frame raises an error.
ncols: Number of columns in the frame (read-only). len(DT) also returns this value, but direct use of .ncols is preferred.
shape: A (nrows, ncols) tuple. Read-only.
names: Tuple of column names. Each name is a non-empty string with no ASCII control characters; names are unique within the frame. Assignable: you can rename all columns at once by assigning a list, or rename a subset by assigning a dict that maps old names to new names.
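The two ways of assigning column names can be sketched as follows, assuming datatable is importable as dt. The frame and its names are illustrative only.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2], B=[3, 4], C=[5, 6])

    # Rename every column at once: the list length must equal ncols.
    DT.names = ["x", "y", "z"]

    # Rename a subset: the dict maps existing names to new names.
    DT.names = {"y": "why"}

    print(DT.names)
```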
types: List of dt.Type objects, one per column. Added in v1.0.0. Prefer this over .stypes and .ltypes.
stypes: Tuple of dt.stype values (storage types) for each column.
ltypes: Tuple of dt.ltype values (logical types) for each column.
key: Tuple of column names that form the primary key. Returns an empty tuple if no key is set. Assigning to this property sets the primary key: the specified column(s) are moved to the front, the frame is sorted, and values must be unique. Set to None or use del DT.key to remove the key.
source: The file path, URL, or command that this frame was loaded from. Returns None if the frame was computed or modified. Read-only. Added in v0.11.
Methods
Selection
head(n=10)
Return the first n rows of the frame. Equivalent to DT[:n, :].
n: Maximum number of rows to return. Must be non-negative.
Returns a Frame with up to n rows and the same columns.
tail(n=10)
Return the last n rows of the frame. Equivalent to DT[-n:, :] (except when n is 0).
n: Maximum number of rows to return. Must be non-negative.
Returns a Frame with up to n rows and the same columns.
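The n = 0 exception for tail() follows from ordinary Python slicing, where -0 is just 0. The sketch below models this on a plain list; tail_rows is a hypothetical helper written for illustration and is not part of datatable.

```python
def tail_rows(rows, n=10):
    """Return the last n items of a list, modeling tail() on plain Python data."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # rows[-0:] is the same as rows[0:], i.e. every row,
    # so n == 0 needs its own branch to yield an empty result.
    return rows[-n:] if n > 0 else []


rows = [1, 2, 3, 4, 5]
print(rows[-2:])           # [4, 5]
print(rows[-0:])           # [1, 2, 3, 4, 5], not an empty list
print(tail_rows(rows, 0))  # []
```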
copy(deep=False)
Make a copy of the frame, preserving column names, types, and key.
deep: If False (default), produces a shallow copy with copy-on-write semantics: data buffers are shared until modified. If True, produces a fully independent deep copy with all data physically written to new memory.
Returns a Frame.
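A short sketch of the copy-on-write behavior, assuming datatable is importable as dt. The frame contents are illustrative. Modifying a shallow copy should leave the original untouched, because the shared buffer is only duplicated at the moment of the write.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2, 3])

    shallow = DT.copy()        # shares data buffers with DT until modified
    deep = DT.copy(deep=True)  # data physically written to new memory

    # Writing to the shallow copy does not affect the original frame.
    shallow[0, "A"] = 100

    print(DT.to_list(), shallow.to_list(), deep.to_list())
```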
Export
to_csv(path=None, *, sep=",", quoting="minimal", append=False, header="auto", bom=False, hex=False, compression=None, verbose=False, method="auto")
Write the frame’s data to a CSV file. Uses multiple threads; thread count is controlled by dt.options.nthreads.
path: Output file path. If the file exists it is overwritten. If None (default), returns the CSV text as a string (or bytes if compression is enabled).
sep: Field separator, must be a single character.
quoting: Quoting style. "minimal" quotes only when necessary; "all" quotes every field; "nonnumeric" quotes all strings; "none" disables quoting entirely (may produce invalid CSV).
append: If True, opens the file in append mode instead of overwriting it.
header: Whether to write the header row. "auto" writes a header unless appending to an existing file.
bom: Insert a byte-order mark. Ignored when appending. Useful for Excel compatibility.
hex: Write floating-point values in hexadecimal format (C %a). Approximately 3× faster to write and read than decimal, at the cost of human readability.
compression: Output compression. "auto" infers from file extension. Only "gzip" is currently supported. Cannot be combined with append=True.
method: Disk-writing method. "mmap" may be faster on some operating systems; "write" is more portable.
Returns None when writing to a file, or a str (or bytes if compressed) when no path is given.
to_jay(path=None, method="auto")
Save the frame to a binary .jay file. Jay is a datatable-native format that supports memory-mapping.
path: Destination file path. If None, serializes the frame into memory and returns a bytes object.
method: Writing method. Has no effect when path is None.
Returns None when writing to a file, or bytes when path is None.
to_pandas()
Convert the frame to a pandas.DataFrame. Key columns become the pandas index.
Returns a pandas.DataFrame. Raises ImportError if pandas is not installed.
to_numpy(type=None, column=None, c_contiguous=False)
Convert the frame to a 2D numpy array.
When the frame has a single non-virtual, non-string column and no type override, the returned array shares memory with the frame (no copy). Otherwise data is copied. Frames with NA values return a numpy.ma.MaskedArray.
type: Cast all data to this type before converting.
column: Convert a single column (by index, negative indices supported). Returns a 1D array.
c_contiguous: If True, returns a row-major (C-contiguous) array. Default is column-major.
Returns numpy.ndarray or numpy.ma.MaskedArray. Raises ImportError if numpy is not installed.
to_dict()
Convert the frame to a dict of lists, keyed by column name. Column order is preserved.
Returns Dict[str, List].
to_list()
Convert the frame to a list of lists, by columns. Each inner list is one column.
Returns List[List] of length ncols.
to_tuples()
Convert the frame to a list of tuples, by rows. Each tuple contains one row of data.
Returns List[Tuple] of length nrows.
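The three plain-Python export shapes differ only in orientation. The sketch below models them with built-in containers rather than calling datatable, so the variable names and data are purely illustrative.

```python
# Dict of lists keyed by column name: the shape to_dict() is described as returning.
as_dict = {"A": [1, 2, 3], "B": ["x", "y", "z"]}

# List of lists, one inner list per column: the to_list() orientation.
as_list = list(as_dict.values())

# List of tuples, one tuple per row: the to_tuples() orientation.
# zip(*columns) transposes column-wise data into rows.
as_tuples = list(zip(*as_list))

print(as_list)    # [[1, 2, 3], ['x', 'y', 'z']]
print(as_tuples)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Here len(as_list) corresponds to ncols and len(as_tuples) to nrows.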
to_arrow()
Convert the frame to a pyarrow.Table. The conversion is multi-threaded and involves copying the data (except when data was originally imported from Arrow).
Returns pyarrow.Table. Raises ImportError if pyarrow is not installed.
Aggregation
All aggregation methods return a new single-row Frame with the same column names. Each method also has a method1() variant (for example, sum1() for sum()) that operates on a single-column frame and returns a scalar.
sum()
Sum of all values in each column. Integer and boolean columns return int64; float32 columns return float32; float64 columns return float64. Non-numeric columns return NA as float64.
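A small sketch contrasting the frame-returning and scalar-returning forms, assuming datatable is importable as dt. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2, 3], B=[0.5, 1.5, 2.0])

    # sum() returns a one-row Frame with the same column names.
    totals = DT.sum()

    # sum1() works on a single-column frame and returns a plain scalar.
    total_a = DT[:, "A"].sum1()

    print(totals.names, totals.to_list(), total_a)
```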
min() / max()
Minimum / maximum value per column. Returns a one-row Frame of the same types as the source columns.
mean()
Mean per column. All columns return float64. String and object columns return NA.
sd()
Standard deviation per column. Returns float64 for all columns.
countna()
Count of NA (missing) values per column. All result columns are int64.
nunique()
Number of unique values per column.
mode()
Most frequently occurring value per column.
kurt()
Excess kurtosis per column (Fisher’s definition, normal = 0).
skew()
Skewness per column.
Modification
cbind(*frames, force=False)
Append columns from one or more frames to this frame, in-place.
All frames must have the same number of rows, unless they have exactly one row (which is replicated) or force=True. Column name conflicts are resolved by automatic name mangling.
frames: One or more frames to append. None values are ignored.
force: If True, frames with mismatched row counts are allowed; shorter frames are padded with NAs (single-row frames are replicated instead).
Returns None (modifies in-place).
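A minimal sketch of cbind(), assuming datatable is importable as dt. It appends one frame with a matching row count and one single-row frame, which is expected to be replicated to the target's length. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2, 3])

    # B has the same number of rows; the single-row C is replicated.
    DT.cbind(dt.Frame(B=["x", "y", "z"]), dt.Frame(C=[0]))

    print(DT.names, DT.to_list())
```

Because the call modifies DT in place and returns None, the result should not be assigned back to a variable.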
rbind(*frames, force=False, bynames=True)
Append rows from one or more frames to this frame, in-place. Equivalent to list.extend() — rows are stacked vertically.
Column types are promoted as needed (bool → int → float). Incompatible types raise TypeError unless force=True.
frames: One or more frames to append. They should have the same column structure unless force=True.
force: If True, mismatched column sets are allowed (gaps filled with NA). Incompatible types are coerced to string.
bynames: If True (default), columns are matched by name. If False, columns are matched positionally.
Returns None (modifies in-place).
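The sketch below illustrates name-based matching and type promotion in rbind(), assuming datatable is importable as dt. The appended frame lists its columns in a different order and holds a float where the target holds integers. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2], B=["x", "y"])

    # Columns are matched by name (bynames=True), so their order in the
    # appended frame does not matter. Column A is promoted from int to float.
    DT.rbind(dt.Frame(B=["z"], A=[3.5]))

    print(DT.to_list())
```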
sort(*cols)
Sort the frame by one or more columns. Returns a new sorted frame; the original is not modified.
cols: Names or indices of columns to sort by. If no columns are given, all columns are used.
Returns a Frame.
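A brief sketch showing that sort() returns a new frame and leaves the original in its initial order, assuming datatable is importable as dt. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[3, 1, 2], B=["c", "a", "b"])

    # Sort by column A; rows move together across all columns.
    by_a = DT.sort("A")

    print(by_a.to_list())  # sorted copy
    print(DT.to_list())    # original order is preserved
```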
replace(replace_what, replace_with)
Search and replace values throughout the entire frame, in-place. Each value is replaced only in columns of a compatible type (None matches any column).
The operation never changes a column’s logical type, but may change its storage type if the replacement value requires a wider representation.
replace_what: Value(s) to find. If a dict, it maps search values to their replacements and replace_with must be omitted.
replace_with: Replacement value(s). Must be a single value if replace_what is a single value, or a list of the same length if replace_what is a list.
Returns None (modifies in-place).
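A sketch of the scalar and dict forms of replace(), assuming datatable is importable as dt. Only the integer column is affected, since the search values are integers; the string column is left as it was. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2, 3], B=["a", "b", "c"])

    # Scalar form: one search value, one replacement.
    DT.replace(1, -1)

    # Dict form: keys are search values, values are replacements;
    # replace_with is omitted.
    DT.replace({2: 20, 3: 30})

    print(DT.to_list())
```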
materialize(to_memory=False)
Force all virtual (lazily computed) columns to be physically written to memory.
Useful when you want to break internal references to a larger source frame so that the original can be garbage-collected, or when you want to ensure all delayed computations are resolved.
to_memory: If True, also copies memory-mapped columns (e.g., those opened from a Jay file on disk) into RAM.
Returns None (modifies in-place).
Utility
export_names()
Return a tuple of f-expressions for all columns, in order. Assigning these to variables lets you reference columns without the f. prefix.
Returns Tuple[FExpr, ...].
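A minimal sketch of export_names(), assuming datatable is importable as dt. The exported expressions are bound to local variables named after the columns and then used inside a selector. The data is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1, 2, 3], B=[4, 5, 6])

    # One f-expression per column, in column order.
    A, B = DT.export_names()

    # Equivalent to DT[:, dt.f.A + dt.f.B], without the f. prefix.
    total = DT[:, A + B]

    print(total.to_list())
```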
colindex(column)
Return the integer index (0-based) of a column.
column: Column name, index (negative counts from the end), or f-expression (f.A, f[3]).
Returns int. Raises KeyError for an unknown name, or IndexError for an out-of-range index.
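The three accepted argument forms can be sketched as follows, assuming datatable is importable as dt. The frame is illustrative.

```python
try:
    import datatable as dt
except ImportError:  # datatable not installed; the example is skipped
    dt = None

if dt is not None:
    DT = dt.Frame(A=[1], B=[2], C=[3])

    by_name = DT.colindex("B")     # look up by column name
    by_index = DT.colindex(-1)     # negative index counts from the end
    by_expr = DT.colindex(dt.f.A)  # look up by f-expression

    print(by_name, by_index, by_expr)
```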