fread
Reads data from an input source and produces a Frame as the result. Recognized formats include CSV, Jay, XLSX, and plain text. Data may also reside inside an archive such as .tar, .gz, .zip, .bz2, or .tgz.
Only one of anysource, file, text, cmd, or url may be specified at once.
Parameters
The primary input source. When unnamed, fread attempts to guess its type: file if a plain path, text if the string contains newlines, or url if the string starts with https://, s3://, or similar.
A file source: either a path on disk or a Python file-like object (any object with a .read() method). File-like objects are read in single-threaded mode. To address a file inside an archive or a sheet inside a workbook, write the path as if the archive were a folder: "data.zip/train.csv".
Provide data directly as an in-memory blob rather than reading from a file.
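The guessing rule described for an unnamed source can be illustrated with a small stand-alone sketch. This is not datatable's implementation: the function name guess_source_kind and the exact list of URL schemes are assumptions made for illustration (the text only names https:// and s3:// "or similar").

```python
import re

# Assumed set of URL schemes; only https:// and s3:// are named in the text.
_URL_PREFIX = re.compile(r"^(https?|s3|ftp|file)://")


def guess_source_kind(source):
    """Classify an unnamed source roughly as the description suggests."""
    if hasattr(source, "read"):
        return "file"   # file-like object with a .read() method
    if "\n" in source:
        return "text"   # embedded newlines: treat the string as raw data
    if _URL_PREFIX.match(source):
        return "url"
    return "file"       # otherwise assume a plain path on disk


for candidate in ["data/train.csv", "a,b\n1,2\n", "https://example.com/data.csv"]:
    print(repr(candidate), "->", guess_source_kind(candidate))
```

When the guess could be ambiguous, passing the source through the explicit file, text, or url parameter avoids relying on the heuristic.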
A shell command whose stdout is read as text. Useful for preprocessing large files before ingestion.
URL of the input file. Data is downloaded to a temporary directory, read, then cleaned up. Public S3 bucket paths are also supported and are internally converted to HTTPS URLs. Proxy, password, or cookie managers configured on urllib.request are respected.
Field separator character. When None (default), the separator is auto-detected. Must be a single ASCII character; the characters ["'`0-9a-zA-Z] and non-ASCII characters are not allowed. Set sep="\n" to read in single-column mode.
Decimal point symbol used for floating-point numbers.
Maximum number of rows to read. Any negative value means no limit.
Whether the first line of the file contains column names. When None (default), fread heuristically determines this from the file contents. If the header row has one fewer column than the data, the first data column is assumed to contain row names and is named "index".
Strings in the input that should be interpreted as missing (NA) values.
When True, lines with fewer fields than other lines are padded with NA values rather than raising an error.
Re-encode the input from this encoding into UTF-8 before parsing. Any codec registered with Python's codecs module is accepted.
Skip all lines before the first line that contains this string, then start reading from that line. Cannot be combined with skip_to_line.
Number of lines to skip at the start of the file before parsing begins. Cannot be combined with skip_to_string.
When True, empty lines in the input are ignored. When False, empty lines raise an IOError (unless fill=True or single-column mode is active).
Strip leading and trailing whitespace from unquoted string fields. Numeric fields always have whitespace stripped.
The quote character used around fields in the CSV file.
Number of threads to use. Cannot exceed dt.options.nthreads. A value of 0 or a negative number means that many threads fewer than the maximum. Defaults to all available threads.
Print detailed internal progress information to stdout (or to logger if provided).
Logger object to receive verbose progress information. Providing this parameter implicitly enables verbose mode.
Action when the input resolves to multiple sources. "warn" emits a warning and reads only the first. "ignore" silently reads only the first. "error" raises dt.exceptions.IOError. Use iread() to read all sources.
Advisory memory limit in bytes. When fread detects it needs more RAM than this limit, it streams intermediate data to a temporary binary file on disk. The resulting Frame may be partially on-disk; materialise or save it as Jay for best performance.
Directory for temporary files. Defaults to the system temp directory as determined by Python's tempfile module.
Returns
A single Frame object is always returned, regardless of whether one or multiple sources were provided.
Examples
Specifying a separator
Custom NA strings and fill
Column selection and renaming
Skipping lines and limiting rows
Shell command input
iread
Similar to fread(), but reads multiple sources at once and returns a lazy iterator of Frame objects. Use iread() when the input is a list of files, a glob pattern, a multi-file archive, or a multi-sheet XLSX workbook.
All parsing parameters are identical to fread(). The only additional parameter is errors.
Action to take when one of the input sources raises an error:
"warn": convert the error to a warning, skip the source, and continue.
"raise": raise the error immediately and stop iteration.
"ignore": silently skip erroneous sources.
"store": capture the exception and yield it as part of the iterator output, then continue with subsequent sources.
Returns
A lazy iterator that produces Frame objects, reading each source only when consumed. Each Frame carries a .source attribute describing where it was read from (file path, URL, archive entry, etc.).
When errors="store", the iterator may yield either Frame objects or exception objects.
Example
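The four errors modes, and the isinstance check that errors="store" requires from the consumer, can be sketched without datatable. The helper read_all and the loaders good and bad are hypothetical stand-ins for input sources; this illustrates the described behaviour and is not the library's implementation.

```python
import warnings


def read_all(loaders, errors="warn"):
    """Yield results from zero-argument loaders, applying one error policy."""
    for load in loaders:
        try:
            yield load()
        except Exception as exc:
            if errors == "raise":
                raise                      # stop iteration immediately
            if errors == "warn":
                warnings.warn(f"skipping source: {exc}")
            elif errors == "store":
                yield exc                  # hand the exception to the caller
            elif errors != "ignore":
                raise ValueError(f"unknown errors mode: {errors!r}")


def good():
    return "frame"


def bad():
    raise IOError("cannot read source")


results = list(read_all([good, bad, good], errors="store"))
for item in results:
    if isinstance(item, Exception):
        print("failed:", item)
    else:
        print("ok:", item)
```

With errors="store", the consumer is responsible for separating frames from exceptions, as the loop above does.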
iread() is lazy — each Frame is read only after the previous one has been consumed. You can interrupt iteration early without reading all sources.