Reading and Writing Data

datatable provides fread() for fast, multi-threaded reading of tabular data from almost any source, and several to_* methods for exporting frames.

from datatable import dt, fread

Reading Data with `fread()`

fread() auto-detects format, delimiter, header, and column types. It supports CSV, TSV, Jay, XLSX, and plain text, including data embedded in archives.

From a file

fread('iris.csv')

From a string (inline text)

data = ('col1,col2,col3\n'
        'a,b,1\n'
        'a,b,2\n'
        'c,d,3')
fread(data)

From a URL

url = "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
fread(url)

From an archive

fread("data.zip/mtcars.csv")      # target a specific file inside the zip

From an Excel file

fread("excel.xlsx")               # first sheet
fread("excel.xlsx/Sheet1")        # specific sheet

xlrd must be installed to read .xls / .xlsx files. Use iread() to read multiple sheets at once — it returns an iterator of frames.

From the command line

fread(cmd="""cat netflix.tsv | awk 'NR==1; /^2015-/'""")

Key `fread()` Parameters

Delimiter (`sep`)

fread auto-detects the separator, but you can override it:

data = "1:2:3:4\n5:6:7:8\n9:10:11:12"
fread(data, sep=":")

The separator must be a single ASCII character; alphanumerics and quotes are not allowed.

Header (`header`)

fread(data, header=False)            # treat first row as data, not header
fread(data, header=False, columns=["A","B","C","D"])

Custom column names and types (`columns`)

fread(data, columns=["A", "B", "C", "D"])

fread(data, columns={"a": "A", "b": "B"})

fread(data, columns={"a": dt.float32, "b": dt.str32})

# None means skip (exclude) that column
fread(data, columns=[dt.int32, dt.str32, None, dt.float32])

fread(data, columns=dt.float32)

def colfilter(columns):
    return [col.name == 'species' or "length" in col.name
            for col in columns]

fread('iris.csv', columns=colfilter, max_nrows=5)

Selecting a subset of columns

By name set
By slice
Drop by None
Keep one, drop rest
Boolean mask

fread(data, columns={"a", "b"})

fread(data, columns=slice(1, 3))   # columns at index 1 and 2

fread(data, columns={"B": None, "D": None})   # keep A and C

fread(data, columns={"a": "a", ...: None})    # keep only 'a'

fread(data, columns=[False, False, True, True])

Null values (`na_strings`)

fread(data, na_strings=['--', '', 'N/A'])

Skip blank lines

fread(data, skip_blank_lines=True)

fread(data, skip_to_line=2)           # skip the first line
fread(data, skip_to_string='first')   # start from the line containing "first"

These two parameters cannot be combined.

Limit rows

fread(data, max_nrows=1000)

Fill ragged rows

# Rows with fewer fields than others are padded with NA
fread(data, fill=True)

Thread count

fread('large.csv', nthreads=4)
# default: all available threads (dt.options.nthreads)

Reading Multiple Files with `iread()`

iread() returns an iterator of frames — useful for archives with multiple files, Excel workbooks with multiple sheets, or a list of paths:

from datatable import iread

for frame in iread("data.zip"):
    print(frame.shape)

iread() is the correct choice when a zip archive contains multiple CSV files or an Excel workbook has multiple sheets. fread() with a multi-source input will warn and only read the first source.

Writing Data

CSV — `DT.to_csv()`

Write to file
Return as string
Gzip compressed
Append to existing file
Custom separator

DT.to_csv("output.csv")

csv_text = DT.to_csv()   # path=None → returns str

DT.to_csv("output.csv.gz", compression="gzip")

DT.to_csv("log.csv", append=True)
# header is suppressed automatically when appending to an existing file

DT.to_csv("output.tsv", sep="\t")

Key parameters:

Parameter	Default	Description
`path`	`None`	File path. If `None`, returns a string.
`sep`	`","`	Field separator (single character).
`quoting`	`"minimal"`	When to quote fields: `"minimal"`, `"all"`, `"nonnumeric"`, `"none"`.
`append`	`False`	Append to existing file instead of overwriting.
`header`	`"auto"`	Write column names. Auto-suppressed when appending.
`compression`	`None`	`"gzip"` or `"auto"` (infers from file extension).
`bom`	`False`	Write UTF-8 byte-order mark (useful for Excel compatibility).

Jay binary format — `DT.to_jay()`

.jay is datatable’s native binary format. It preserves all type information and can be read back by fread() with zero parsing overhead.

Write to file
Return as bytes
Read back

DT.to_jay("data.jay")

payload = DT.to_jay()   # path=None → returns bytes

DT2 = fread("data.jay")

Use .jay files for caching intermediate results or sharing frames between datatable processes. Read/write speeds are significantly faster than CSV because no parsing is needed.

pandas — `DT.to_pandas()`

pdf = DT.to_pandas()
# Returns a pandas.DataFrame
# If DT has key columns, those become the DataFrame index

pandas must be installed. If the frame has key columns set, they will become the index columns in the resulting DataFrame.

NumPy — `DT.to_numpy()`

arr = DT.to_numpy()
# Returns a numpy.ndarray of shape (nrows, ncols)

Dict — `DT.to_dict()`

d = DT.to_dict()
# Returns {column_name: [values, ...], ...}

Thousand Separator Detection

fread automatically detects commas used as thousand separators (assumes ,):

fread("""Name|Salary|Position
         James|256,000|evangelist
         Ragnar|1,000,000|conqueror
         Loki|250360|trickster""")
# Salary column is read as int32: 256000, 1000000, 250360

Get Started

Core Concepts

Working with Data

Machine Learning

Migration & Comparisons

Reading and Writing Data

Reading Data with `fread()`

From a file

From a string (inline text)

From a URL

From an archive

From an Excel file

From the command line

Key `fread()` Parameters

Delimiter (`sep`)

Header (`header`)

Custom column names and types (`columns`)

Selecting a subset of columns

Null values (`na_strings`)

Skip blank lines

Limit rows

Fill ragged rows

Thread count

Reading Multiple Files with `iread()`

Writing Data

CSV — `DT.to_csv()`

Jay binary format — `DT.to_jay()`

pandas — `DT.to_pandas()`

NumPy — `DT.to_numpy()`

Dict — `DT.to_dict()`

Thousand Separator Detection

Build docs developers (and LLMs) love

Get Started

Core Concepts

Working with Data

Machine Learning

Migration & Comparisons

​Reading Data with fread()

​From a file

​From a string (inline text)

​From a URL

​From an archive

​From an Excel file

​From the command line

​Key fread() Parameters

​Delimiter (sep)

​Header (header)

​Custom column names and types (columns)

​Selecting a subset of columns

​Null values (na_strings)

​Skip blank lines

​Skip to a line or string

​Limit rows

​Fill ragged rows

​Thread count

​Reading Multiple Files with iread()

​Writing Data

​CSV — DT.to_csv()

​Jay binary format — DT.to_jay()

​pandas — DT.to_pandas()

​NumPy — DT.to_numpy()

​Dict — DT.to_dict()

​Thousand Separator Detection

Build docs developers (and LLMs) love

Reading Data with `fread()`

From a file

From a string (inline text)

From a URL

From an archive

From an Excel file

From the command line

Key `fread()` Parameters

Delimiter (`sep`)

Header (`header`)

Custom column names and types (`columns`)

Selecting a subset of columns

Null values (`na_strings`)

Skip blank lines

Skip to a line or string

Limit rows

Fill ragged rows

Thread count

Reading Multiple Files with `iread()`

Writing Data

CSV — `DT.to_csv()`

Jay binary format — `DT.to_jay()`

pandas — `DT.to_pandas()`

NumPy — `DT.to_numpy()`

Dict — `DT.to_dict()`

Thousand Separator Detection