Skip to main content
datatable provides fread() for fast, multi-threaded reading of tabular data from almost any source, and several to_* methods for exporting frames.
from datatable import dt, fread

Reading Data with fread()

fread() auto-detects format, delimiter, header, and column types. It supports CSV, TSV, Jay, XLSX, and plain text, including data embedded in archives.

From a file

fread('iris.csv')

From a string (inline text)

data = ('col1,col2,col3\n'
        'a,b,1\n'
        'a,b,2\n'
        'c,d,3')
fread(data)

From a URL

url = "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
fread(url)

From an archive

fread("data.zip/mtcars.csv")      # target a specific file inside the zip

From an Excel file

fread("excel.xlsx")               # first sheet
fread("excel.xlsx/Sheet1")        # specific sheet
xlrd must be installed to read .xls / .xlsx files. Use iread() to read multiple sheets at once — it returns an iterator of frames.

From the command line

fread(cmd="""cat netflix.tsv | awk 'NR==1; /^2015-/'""")

Key fread() Parameters

Delimiter (sep)

fread auto-detects the separator, but you can override it:
data = "1:2:3:4\n5:6:7:8\n9:10:11:12"
fread(data, sep=":")
The separator must be a single ASCII character; alphanumerics and quotes are not allowed.

Header (header)

fread(data, header=False)            # treat first row as data, not header
fread(data, header=False, columns=["A","B","C","D"])

Custom column names and types (columns)

fread(data, columns=["A", "B", "C", "D"])

Selecting a subset of columns

fread(data, columns={"a", "b"})

Null values (na_strings)

fread(data, na_strings=['--', '', 'N/A'])

Skip blank lines

fread(data, skip_blank_lines=True)

Skip to a line or string

fread(data, skip_to_line=2)           # skip the first line
fread(data, skip_to_string='first')   # start from the line containing "first"
These two parameters cannot be combined.

Limit rows

fread(data, max_nrows=1000)

Fill ragged rows

# Rows with fewer fields than others are padded with NA
fread(data, fill=True)

Thread count

fread('large.csv', nthreads=4)
# default: all available threads (dt.options.nthreads)

Reading Multiple Files with iread()

iread() returns an iterator of frames — useful for archives with multiple files, Excel workbooks with multiple sheets, or a list of paths:
from datatable import iread

for frame in iread("data.zip"):
    print(frame.shape)
iread() is the correct choice when a zip archive contains multiple CSV files or an Excel workbook has multiple sheets. fread() with a multi-source input will warn and only read the first source.

Writing Data

CSV — DT.to_csv()

DT.to_csv("output.csv")
Key parameters:
ParameterDefaultDescription
pathNoneFile path. If None, returns a string.
sep","Field separator (single character).
quoting"minimal"When to quote fields: "minimal", "all", "nonnumeric", "none".
appendFalseAppend to existing file instead of overwriting.
header"auto"Write column names. Auto-suppressed when appending.
compressionNone"gzip" or "auto" (infers from file extension).
bomFalseWrite UTF-8 byte-order mark (useful for Excel compatibility).

Jay binary format — DT.to_jay()

.jay is datatable’s native binary format. It preserves all type information and can be read back by fread() with zero parsing overhead.
DT.to_jay("data.jay")
Use .jay files for caching intermediate results or sharing frames between datatable processes. Read/write speeds are significantly faster than CSV because no parsing is needed.

pandas — DT.to_pandas()

pdf = DT.to_pandas()
# Returns a pandas.DataFrame
# If DT has key columns, those become the DataFrame index
pandas must be installed. If the frame has key columns set, they will become the index columns in the resulting DataFrame.

NumPy — DT.to_numpy()

arr = DT.to_numpy()
# Returns a numpy.ndarray of shape (nrows, ncols)

Dict — DT.to_dict()

d = DT.to_dict()
# Returns {column_name: [values, ...], ...}

Thousand Separator Detection

fread automatically detects commas used as thousand separators (assumes ,):
fread("""Name|Salary|Position
         James|256,000|evangelist
         Ragnar|1,000,000|conqueror
         Loki|250360|trickster""")
# Salary column is read as int32: 256000, 1000000, 250360

Build docs developers (and LLMs) love