Key capabilities
Column-oriented storage
Data is stored column-by-column, making column-wise operations faster and more memory-efficient than row-oriented layouts.
Native C++ implementation
All data types — including strings — are handled natively in C++, matching or exceeding the performance of pandas and numpy for numeric types and surpassing them for strings.
Expressive query syntax
The
DT[i, j, ...] square-bracket notation lets you select, filter, transform, group, sort, and join data in a single concise expression, inspired by R’s data.table.Multi-threaded processing
Time-consuming operations automatically use all available CPU cores for maximum throughput.
Fast I/O
fread() reads CSV, text, Excel, and other formats with automatic detection of separators, headers, and column types. It supports files, URLs, shell output, archives, and glob patterns.Memory-mapped datasets
Data stored on disk in the binary
.jay format can be memory-mapped and worked on without loading everything into RAM, enabling out-of-memory workflows transparently.Design goals
datatable is designed to meet the following requirements:- All types support null values with minimal overhead.
- Date-time and categorical types are supported natively. Object type is available but its use is discouraged.
- Minimal data copying — copy-on-write semantics are used for shared data, and rowindex views are used in filtering, sorting, grouping, and joining to avoid unnecessary copies.
- Interoperability with pandas, numpy, pyarrow, and plain Python: convert to and from other frameworks with a single method call.
Get started
Installation
Install datatable with pip on macOS, Linux, or Windows, or build from source.
Quick start
Load data, run your first queries, and export results in a few minutes.