I/O Functions

fread

fread(
    anysource=None, *, file=None, text=None, cmd=None, url=None,
    columns=None, sep=None, dec=".", max_nrows=None, header=None,
    na_strings=None, verbose=False, fill=False, encoding=None,
    skip_to_string=None, skip_to_line=0, skip_blank_lines=False,
    strip_whitespace=True, quotechar='"', tempdir=None,
    nthreads=None, logger=None, multiple_sources="warn",
    memory_limit=None
)

Read data from a variety of input formats and produce a Frame as the result. Recognized formats include CSV, Jay, XLSX, and plain text. Data may also reside inside an archive such as .tar, .gz, .zip, .bz2, or .tgz.

Only one of anysource, file, text, cmd, or url may be specified at once.

Parameters

anysource

str | bytes | file | Pathlike | List

The primary input source. When unnamed, fread attempts to guess its type: file if a plain path, text if the string contains newlines, or url if the string starts with https://, s3://, or similar.
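The guessing rule described above can be sketched in plain Python. This is only an illustration of the description, not datatable's actual implementation: `guess_source_kind` is a hypothetical helper, and both the exact list of URL prefixes and the treatment of bytes are assumptions.

```python
import os

# Assumed prefix list; the text only names https:// and s3:// "or similar".
URL_PREFIXES = ("https://", "http://", "ftp://", "s3://")


def guess_source_kind(src):
    """Hypothetical helper: classify an unnamed source as file, text, url, or list."""
    if isinstance(src, (list, tuple)):
        return "list"
    if hasattr(src, "read") or isinstance(src, os.PathLike):
        return "file"                      # file-like object or path object
    if isinstance(src, bytes):
        return "text"                      # assumption: raw bytes are treated as data
    if isinstance(src, str):
        if "\n" in src:
            return "text"                  # newlines mean inline data
        if src.startswith(URL_PREFIXES):
            return "url"
        return "file"                      # otherwise treat it as a plain path
    raise TypeError(f"unsupported source type: {type(src).__name__}")
```

When the guess could be ambiguous, passing the source through the named parameter (file=, text=, or url=) avoids relying on the heuristic at all.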

file

str | file | Pathlike

A file source — either a path on disk or a Python file-like object (any object with a .read() method). File-like objects are read in single-threaded mode. To address a file inside an archive or a sheet inside a workbook, write the path as if the archive were a folder: "data.zip/train.csv".
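The "archive as a folder" convention can be illustrated with the standard library alone. The sketch below is an assumption about how such a path might be interpreted, limited to .zip archives; `split_archive_path` and `read_text` are hypothetical helpers, not datatable functions.

```python
import os
import tempfile
import zipfile


def split_archive_path(path, ext=".zip"):
    """Hypothetical helper: split 'data.zip/train.csv' into ('data.zip', 'train.csv')."""
    marker = ext + "/"
    idx = path.lower().find(marker)
    if idx == -1:
        return path, None                  # plain file, no archive member
    cut = idx + len(ext)
    return path[:cut], path[cut + 1:]


def read_text(path):
    """Read a plain file, or a zip member addressed as if the archive were a folder."""
    archive, member = split_archive_path(path)
    if member is None:
        with open(archive, encoding="utf-8") as fh:
            return fh.read()
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member).decode("utf-8")


# Build a small archive in a temporary directory and read one member from it.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("train.csv", "a,b\n1,2\n")
    content = read_text(archive_path + "/train.csv")
```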

text

str | bytes

Provide data directly as an in-memory blob rather than reading from a file.

cmd

str

A shell command whose stdout is read as text. Useful for preprocessing large files before ingestion.

url

str

URL of the input file. Data is downloaded to a temporary directory, read, then cleaned up. Public S3 bucket paths are also supported and are internally converted to HTTPS URLs. Proxy, password, or cookie managers configured on urllib.request are respected.

sep

str | None

Field separator character. When None (default), the separator is auto-detected. Otherwise it must be a single ASCII character; the characters ["'`0-9a-zA-Z] and any non-ASCII characters are not allowed. Set sep="\n" to read in single-column mode.
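The constraints on sep can be expressed as a small check. This is a sketch of the rule as documented, not the library's own validation code: `is_valid_sep` is a hypothetical helper, and the forbidden set is assumed to be the double quote, single quote, backtick, ASCII digits, and ASCII letters.

```python
import string

# Assumed forbidden set: quotes, backtick, digits, and ASCII letters.
FORBIDDEN_SEPS = set("\"'`") | set(string.digits) | set(string.ascii_letters)


def is_valid_sep(sep):
    """Hypothetical helper: return True if sep satisfies the documented constraints."""
    if sep is None:
        return True                        # None requests auto-detection
    if not isinstance(sep, str) or len(sep) != 1:
        return False                       # must be exactly one character
    if ord(sep) > 127:
        return False                       # non-ASCII is not allowed
    return sep not in FORBIDDEN_SEPS
```

Under these assumptions, ",", ";", "\t", "|", and "\n" all pass, while "a", "7", and "::" are rejected.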

dec

"." | ","

default:"."

Decimal point symbol used for floating-point numbers.

max_nrows

int

Maximum number of rows to read. Any negative value means no limit.

header

bool | None

Whether the first line of the file contains column names. When None (default), fread heuristically determines this from the file contents. If the header row has one fewer column than the data, the first data column is assumed to contain row names and is named "index".

na_strings

List[str]

Strings in the input that should be interpreted as missing (NA) values.

fill

bool

default:"False"

When True, lines with fewer fields than other lines are padded with NA values rather than raising an error.
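The effect of fill can be sketched on already-split rows. This is a simplified illustration, not the parser's real logic: `pad_rows` is a hypothetical helper, the expected width is taken to be the longest row, None stands in for NA, and ValueError stands in for whatever error the reader actually raises.

```python
def pad_rows(rows, fill=False):
    """Hypothetical helper: pad short rows with None when fill=True, else raise."""
    width = max((len(row) for row in rows), default=0)
    padded = []
    for lineno, row in enumerate(rows, start=1):
        if len(row) < width:
            if not fill:
                raise ValueError(
                    f"line {lineno} has {len(row)} fields, expected {width}"
                )
            row = row + [None] * (width - len(row))   # None plays the role of NA
        padded.append(row)
    return padded


rows = [["1", "2", "3"], ["4", "5"], ["6"]]
filled = pad_rows(rows, fill=True)
```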

encoding

str | None

Re-encode the input from this encoding into UTF-8 before parsing. Any codec registered with Python’s codec module is accepted.
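Re-encoding before parsing amounts to a decode followed by a UTF-8 encode. The sketch below shows that step in isolation with the standard library; `to_utf8` is a hypothetical helper and says nothing about how fread performs the conversion internally.

```python
import codecs


def to_utf8(raw, encoding):
    """Hypothetical helper: re-encode bytes from the given codec into UTF-8."""
    codecs.lookup(encoding)                # raises LookupError for an unknown codec
    return raw.decode(encoding).encode("utf-8")


latin1_bytes = "café,1\n".encode("latin-1")    # 'é' is the single byte 0xE9 here
utf8_bytes = to_utf8(latin1_bytes, "latin-1")  # 'é' becomes the two bytes 0xC3 0xA9
```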

skip_to_string

str | None

Skip all lines before the first line that contains this string, then start reading from that line. Cannot be combined with skip_to_line.
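The "skip until a line contains this string" rule can be sketched as follows. `skip_to_marker` is a hypothetical helper, and raising ValueError when the string never appears is an assumption; the text does not say what fread does in that case.

```python
def skip_to_marker(text, marker):
    """Hypothetical helper: drop every line before the first one containing marker."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if marker in line:
            return "".join(lines[i:])      # the matching line itself is kept
    raise ValueError(f"marker {marker!r} not found in input")


data = "skip this line\na,b,c,d\n1,2,3,4\n"
remaining = skip_to_marker(data, "a,b")
```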

skip_to_line

int

Number of lines to skip at the start of the file before parsing begins. Cannot be combined with skip_to_string.

skip_blank_lines

bool

default:"False"

When True, empty lines in the input are ignored. When False, empty lines raise an IOError (unless fill=True or single-column mode is active).

strip_whitespace

bool

default:"True"

Strip leading/trailing whitespace from unquoted string fields. Numeric fields always have whitespace stripped.

quotechar

str

default:"\""

The quote character used around fields in the CSV file.

nthreads

int | None

Number of threads to use; the value cannot exceed dt.options.nthreads. Zero or a negative number is interpreted relative to the maximum: for example, nthreads=-2 means two threads fewer than the maximum. Defaults to all available threads.
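The relative interpretation of nthreads is easiest to see as arithmetic. In the sketch below, `resolve_nthreads` is a hypothetical helper and `max_threads` stands in for dt.options.nthreads; clamping an oversized request to the maximum and never going below one thread are assumptions, since the text does not spell out either edge case.

```python
def resolve_nthreads(nthreads, max_threads):
    """Hypothetical helper: turn the nthreads argument into an actual thread count."""
    if nthreads is None:
        return max_threads                       # default: use every available thread
    if nthreads > 0:
        return min(nthreads, max_threads)        # assumption: clamp to the maximum
    return max(1, max_threads + nthreads)        # 0 or negative: relative to the maximum


# With a maximum of 8 threads:
counts = [resolve_nthreads(n, 8) for n in (None, 4, 0, -2, 16, -100)]
```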

verbose

bool

default:"False"

Print detailed internal progress information to stdout (or to logger if provided).

logger

object

Logger object to receive verbose progress information. Providing this parameter implicitly enables verbose mode.

multiple_sources

"warn" | "error" | "ignore"

default:"\"warn\""

Action when the input resolves to multiple sources. "warn" emits a warning and reads only the first. "ignore" silently reads only the first. "error" raises dt.exceptions.IOError. Use iread() to read all sources.
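The three policies can be sketched as a small dispatch. This is an illustration of the documented behavior only: `pick_single_source` is a hypothetical helper, and the built-in OSError stands in for dt.exceptions.IOError.

```python
import warnings


def pick_single_source(sources, multiple_sources="warn"):
    """Hypothetical helper: choose the source to read according to the policy."""
    if multiple_sources not in ("warn", "error", "ignore"):
        raise ValueError(f"invalid multiple_sources value: {multiple_sources!r}")
    if not sources:
        raise ValueError("no input sources")
    if len(sources) > 1:
        if multiple_sources == "error":
            # Stand-in for dt.exceptions.IOError
            raise OSError(f"input resolves to {len(sources)} sources")
        if multiple_sources == "warn":
            warnings.warn(
                f"input resolves to {len(sources)} sources; reading only the first"
            )
        # "ignore" falls through without any message
    return sources[0]


chosen = pick_single_source(["a.csv", "b.csv"], multiple_sources="ignore")
```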

memory_limit

int

Advisory memory limit in bytes. When fread detects it needs more RAM than this limit, it streams intermediate data to a temporary binary file on disk. The resulting Frame may be partially on-disk; materialise or save it as Jay for best performance.

tempdir

str | None

Directory for temporary files. Defaults to the system temp directory as determined by Python’s tempfile module.

Returns

frame

Frame

A single Frame object is always returned, regardless of whether one or multiple sources were provided.

Examples

from datatable import dt, fread

fread("iris.csv")
#     | sepal_length  sepal_width  petal_length  petal_width  species
#     |      float64      float64       float64      float64  str32
# --- + ------------  -----------  ------------  -----------  -------
#   0 |          5.1          3.5           1.4          0.2  setosa
#   1 |          4.9          3             1.4          0.2  setosa
# [150 rows x 5 columns]

Specifying a separator

from datatable import fread

data = """
1:2:3:4
5:6:7:8
9:10:11:12
"""
fread(data, sep=":")
#    |    C0     C1     C2     C3
#    | int32  int32  int32  int32
# -- + -----  -----  -----  -----
#  0 |     1      2      3      4
#  1 |     5      6      7      8
#  2 |     9     10     11     12
# [3 rows x 4 columns]

Custom NA strings and fill

from datatable import fread

data = """
ID|Charges|Payment_Method
634-VHG|28|Cheque
365-DQC|33.5|Credit card
264-PPR|631|--
845-AJO|42.3|
"""
fread(data, na_strings=["--", ""])
#    | ID       Charges  Payment_Method
#    | str32    float64  str32
# -- + -------  -------  --------------
#  0 | 634-VHG     28    Cheque
#  1 | 365-DQC     33.5  Credit card
#  2 | 264-PPR    631    NA
#  3 | 845-AJO     42.3  NA
# [4 rows x 3 columns]

Column selection and renaming

from datatable import dt, fread

data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12"

# Select a subset of columns
fread(data, columns={"a", "b"})

# Rename some columns
fread(data, columns={"a": "A", "b": "B"})

# Drop columns by assigning None
fread(data, columns={"b": None, "d": None})

# Force all columns to float32
fread(data, columns=dt.float32)

# Filter columns dynamically with a callable
fread("iris.csv", columns=lambda cols: [
    col.name == "species" or "length" in col.name
    for col in cols
], max_nrows=5)

Skipping lines and limiting rows

from datatable import fread

data = ("skip this line\n"
        "a,b,c,d\n"
        "1,2,3,4\n"
        "5,6,7,8\n"
        "9,10,11,12")

fread(data, skip_to_line=2)
fread(data, skip_to_string="a,b", max_nrows=2)

Shell command input

from datatable import fread

# Filter a TSV file for 2015 rows using awk before reading
fread(cmd="""cat netflix.tsv | awk 'NR==1; /^2015-/'""")

iread

iread(
    anysource=None, *, file=None, text=None, cmd=None, url=None,
    columns=None, sep=None, dec=".", max_nrows=None, header=None,
    na_strings=None, verbose=False, fill=False, encoding=None,
    skip_to_string=None, skip_to_line=None, skip_blank_lines=False,
    strip_whitespace=True, quotechar='"',
    tempdir=None, nthreads=None, logger=None, errors="warn",
    memory_limit=None
)

Like fread(), but accepts input that resolves to multiple sources and returns a lazy iterator of Frame objects, one per source. Use iread() when the input is a list of files, a glob pattern, a multi-file archive, or a multi-sheet XLSX workbook. All parsing parameters are the same as for fread(). The only additional parameter is errors; multiple_sources is not part of the iread() signature.

errors

"warn" | "raise" | "ignore" | "store"

default:"\"warn\""

Action to take when one of the input sources raises an error:

"warn" — convert the error to a warning, skip the source, and continue.
"raise" — raise the error immediately and stop iteration.
"ignore" — silently skip erroneous sources.
"store" — capture the exception and yield it as part of the iterator output, then continue with subsequent sources.

Returns

iterator

Iterator[Frame] | Iterator[Frame | Exception]

A lazy iterator that produces Frame objects, reading each source only when consumed. Each Frame carries a .source attribute describing where it was read from (file path, URL, archive entry, etc.). When errors="store", the iterator may yield either Frame objects or exception objects.
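The four errors policies, and the mixed output produced by "store", can be modelled with an ordinary generator. The sketch below does not use datatable: `iter_sources` and `fake_reader` are hypothetical stand-ins, and plain strings take the place of Frame objects.

```python
import warnings


def iter_sources(sources, reader, errors="warn"):
    """Hypothetical model of the errors policies for a multi-source reader."""
    if errors not in ("warn", "raise", "ignore", "store"):
        raise ValueError(f"invalid errors value: {errors!r}")
    for src in sources:
        try:
            yield reader(src)
        except Exception as exc:
            if errors == "raise":
                raise                      # stop iteration with the original error
            if errors == "store":
                yield exc                  # hand the exception to the caller
            elif errors == "warn":
                warnings.warn(f"skipping {src!r}: {exc}")
            # "ignore": skip the source silently


def fake_reader(name):
    """Stand-in reader: fails for one source, returns a placeholder otherwise."""
    if name == "missing.csv":
        raise FileNotFoundError(name)
    return f"<frame from {name}>"


names = ["a.csv", "missing.csv", "b.csv"]
stored = list(iter_sources(names, fake_reader, errors="store"))
ignored = list(iter_sources(names, fake_reader, errors="ignore"))
```

With "store", the caller receives three items and must check each one with isinstance before using it; with "ignore", only the two successful results come back.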

Example

from datatable import dt, iread

# Read all CSV files in an archive
for frame in iread("data.zip"):
    print(frame.source, frame.shape)

# Read all sheets in an Excel workbook
frames = list(iread("workbook.xlsx"))

# Read a list of files, storing errors instead of stopping
for result in iread(["a.csv", "missing.csv", "b.csv"], errors="store"):
    if isinstance(result, Exception):
        print("Error:", result)
    else:
        print(result.shape)

iread() is lazy — each Frame is read only after the previous one has been consumed. You can interrupt iteration early without reading all sources.
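The practical consequence of laziness is that sources past the stopping point are never touched. The sketch below demonstrates this with a plain generator rather than datatable itself: `lazy_frames` is a hypothetical stand-in for iread(), and `read_log` records which sources were actually read.

```python
from itertools import islice

read_log = []


def lazy_frames(sources):
    """Hypothetical stand-in for a lazy reader: records each source as it is read."""
    for src in sources:
        read_log.append(src)               # work happens only when an item is requested
        yield f"<frame from {src}>"


# Take only the first two results; the third source is never read.
first_two = list(islice(lazy_frames(["a.csv", "b.csv", "c.csv"]), 2))
```

After this runs, `read_log` contains only "a.csv" and "b.csv". The same pattern (islice, or a break inside a for loop) should apply to any lazy iterator.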
