The datatable.str module provides functions that operate on string (str32 / str64) columns inside datatable expressions. Except where noted, the functions accept FExpr arguments and return FExpr results, so they compose naturally with f selectors and other datatable operations.
Except for split_into_nhot, functions in dt.str operate lazily: they return an FExpr that is evaluated only when it is used inside a DT[rows, cols] expression.
Functions
dt.str.len(column)
Compute the length (number of characters) of each string value in column.
Parameters
column: A string column expression.
Returns
FExpr[int64]: an integer column containing the character count for each row. NA strings produce NA output.
dt.str.slice(col, start, stop, step=1)
Apply the slice [start:stop:step] to each string value in col. As a shorthand, you can apply the slice directly to a column expression: f.A[1:-1] is equivalent to dt.str.slice(f.A, 1, -1).
Parameters
col: The string column to slice.
start: Start index of the slice (inclusive). Negative indices count from the end; None means the beginning of the string.
stop: Stop index of the slice (exclusive). Negative indices count from the end; None means the end of the string.
step: Step size. Defaults to 1.
Returns
FExpr[str]: a string column with the sliced values.
dt.str.split_into_nhot(frame, sep=",", sort=False)
Split and n-hot encode a single-column string frame. Each value is split on sep, surrounding whitespace is trimmed from each piece, and each unique resulting label becomes a boolean column, named after the label, in the output frame.
Parameters
frame: A single-column frame with str32 or str64 stype.
sep: Single-character delimiter to split on.
sort: If True, output columns are sorted alphabetically by label. Otherwise, column order is not guaranteed, because the labels are collected in parallel.
Returns
Frame: one boolean column per unique label and one row per input row.
split_into_nhot operates on a full Frame (not a lazy FExpr). Pass the single-column frame directly to this function, not inside a DT[rows, cols] expression.