Overview
Filtering in Delta Sharing operates as hints rather than guarantees:- Server behavior: Attempts to filter files using predicates, but may return additional files
- Client responsibility: Must always apply predicates to returned data
- Failure handling: If the server fails to parse or evaluate a predicate, it may skip it and return all files
Filtering is particularly effective for partitioned tables where predicates match partition columns, enabling efficient file pruning.
SQL Expressions for Filtering
SQL expressions provide a familiar way to express filtering predicates. They are sent as an array of strings in the request body.Request Format
Array of SQL boolean expressions. Expressions are AND-ed together (conjunctive).
Hint indicating how many rows the client plans to read. Server may use this with file statistics to return fewer files.
Deprecation Notice:
predicateHints is being deprecated in favor of jsonPredicateHints once all implementations migrate to JSON-based filtering.Supported Operators
The following comparison operators are supported:| Operator | Description | Example |
|---|---|---|
= | Equality | col = 123 |
<> | Inequality | col <> 'foo' |
> | Greater than | col > 'foo' |
< | Less than | 'foo' < col |
>= | Greater or equal | col >= 123 |
<= | Less or equal | 123 <= col |
IS NULL | Null check | col IS NULL |
IS NOT NULL | Not null check | col IS NOT NULL |
Expression Examples
JSON Predicates for Filtering
JSON predicates provide a structured, type-safe format for expressing filtering logic. This is the preferred method for partition column predicates.Request Format
Serialized JSON string representing a predicate tree. Parse to get the predicate object structure.
Predicate Structure
Each predicate is a JSON object with these fields:| Field | Type | Description |
|---|---|---|
op | string | Operation name (see Supported Operations) |
children | array | Child operations (empty for leaf operations) |
name | string | Column name (for column operations only) |
value | string | Literal value (for literal operations only) |
valueType | string | Data type (for column and literal operations) |
Supported Operations
Leaf Operations
Leaf Operations
columnRepresents a column reference.literalRepresents a constant value.
Unary Operations
Unary Operations
isNullChecks if a column is null.notLogical NOT operation.
Binary Operations
Binary Operations
equalEquality check (lessThanLess than check (lessThanOrEqualLess than or equal check (
=).<).<=).greaterThanGreater than check (>).greaterThanOrEqualGreater than or equal check (>=).N-ary Operations
N-ary Operations
andLogical AND operation (requires at least 2 children).orLogical OR operation (requires at least 2 children).
Supported Value Types
These value types are supported invalueType fields:
| Type | Description | Example Value |
|---|---|---|
bool | Boolean | "true" or "false" |
int | Integer | "42" |
long | Long integer | "1234567890" |
string | String | "hello" |
date | Date | "2021-04-29" (yyyy-mm-dd) |
float | Float | "3.14" |
double | Double | "3.14159" |
timestamp | Timestamp | "2022-01-01T00:00:00Z" (ISO 8601) |
All value types are case-insensitive when processed by the server.
Complete Examples
Example 1: Simple Equality
Find records where hire date equals a specific date.Example 2: Compound AND Predicate
Find records where hire date equals a date AND id is less than 25.Example 3: NOT NULL Check
Find records where id is not null.Example 4: Date Range with OR
Find records from Q1 2021 or Q4 2021.Combining with Limit Hints
When bothpredicateHints/jsonPredicateHints and limitHint are present:
- Server applies predicate filtering first
- Server then uses limit hint with file statistics to return fewer files
- Client must still apply both predicates and limit to the returned data
This combination is particularly effective for queries like
SELECT * FROM table WHERE status='active' LIMIT 1000, where the server can use both hints for optimal file selection.Best Practices
Use JSON Predicates
Prefer JSON predicates over SQL expressions for type safety and easier server implementation.
Filter on Partitions
Predicates on partition columns are most effective, enabling efficient file pruning.
Always Filter Client-Side
Never rely solely on server filtering. Always apply predicates to returned data.
Use Limit Hints
Combine filtering with limit hints to minimize data transfer for top-N queries.
Migration Path
Next Steps
REST APIs
Learn about Query Table API
Response Format
Understand file statistics