What is the Delta Sharing Reference Server?
The Delta Sharing Reference Server is a reference implementation of the Delta Sharing Protocol that enables you to share Delta Lake and Apache Parquet tables stored on cloud object storage (S3, Azure Blob Storage, ADLS Gen2, GCS, Cloudflare R2) with consumers. This lightweight server provides a REST API that clients use to discover and access shared data without copying it. Rather than streaming table data itself, the server generates pre-signed URLs that let clients read data files directly from your cloud storage, which keeps data transfer efficient and server overhead low.
When to Use the Reference Server
The reference server is ideal for:
Development & Testing
Testing Delta Sharing connectors and integrations during development
Small-Scale Sharing
Sharing datasets with a limited number of trusted recipients
Proof of Concepts
Quickly demonstrating Delta Sharing capabilities
Self-Hosted Solutions
Organizations that need full control over their sharing infrastructure
When to Use Managed Alternatives
For production workloads with enterprise requirements, consider managed Delta Sharing services:
Databricks Delta Sharing
Databricks provides a fully managed Delta Sharing service with:
- Enterprise security: Built-in authentication, authorization, and audit logging
- Scalability: Handles high-volume data sharing across thousands of recipients
- Unity Catalog integration: Centralized governance and access control
- No infrastructure management: Fully managed service with high availability
- Fine-grained permissions: Row-level and column-level security
- Monitoring and analytics: Track usage and access patterns
Other Vendors
Several vendors offer Delta Sharing capabilities. Check the community connectors page for a list of available providers.
Architecture Overview
The reference server operates as a stateless REST API server:
Client Request
A Delta Sharing client (Python, Spark, etc.) sends a request to the server for table metadata or data
Server Processing
The server authenticates the request, reads Delta Lake metadata from cloud storage, and can use any filter hints sent by the client to skip data files that cannot match
Pre-signed URLs
The server generates pre-signed URLs for the data files and returns them to the client
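The three steps above can be sketched from the client side using only the Python standard library. This is a minimal illustration, not the official `delta-sharing` client. It makes three assumptions. First, the profile file is JSON with `endpoint` and `bearerToken` fields. Second, the protocol's `POST .../shares/{share}/schemas/{schema}/tables/{table}/query` endpoint accepts optional `predicateHints` and `limitHint` fields. Third, the response is newline-delimited JSON in which each `file` line carries a pre-signed URL. The helper names `load_profile`, `build_query_request`, and `presigned_file_urls` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def load_profile(profile_json: str) -> dict:
    """Parse a Delta Sharing profile (assumed fields: endpoint, bearerToken)."""
    profile = json.loads(profile_json)
    return {"endpoint": profile["endpoint"].rstrip("/"),
            "token": profile["bearerToken"]}


def build_query_request(profile, share, schema, table,
                        predicate_hints=None, limit_hint=None):
    """Build (but do not send) the authenticated POST request for a table query."""
    path = "/shares/{}/schemas/{}/tables/{}/query".format(
        *(urllib.parse.quote(part, safe="") for part in (share, schema, table)))
    body = {}
    if predicate_hints:
        # Hints are best-effort: the server may skip files, the client still filters.
        body["predicateHints"] = list(predicate_hints)
    if limit_hint is not None:
        body["limitHint"] = limit_hint
    return urllib.request.Request(
        profile["endpoint"] + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {profile['token']}",
                 "Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def presigned_file_urls(ndjson_response: str) -> list:
    """Collect pre-signed data-file URLs from a newline-delimited JSON response."""
    urls = []
    for line in ndjson_response.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "file" in action:  # protocol and metaData lines carry no data files
            urls.append(action["file"]["url"])
    return urls
```

To send the request, you would pass it to `urllib.request.urlopen` and feed the decoded body to `presigned_file_urls`. The client then downloads each URL directly from cloud storage, without going through the sharing server.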
Key Features
Multi-Cloud Support
Share tables stored on AWS S3, Azure Blob Storage, Azure Data Lake Storage Gen2, Google Cloud Storage, and Cloudflare R2
Delta Lake & Parquet
Share both Delta Lake tables (with full history and metadata) and Apache Parquet tables
Change Data Feed (CDF)
Enable recipients to query incremental changes to shared tables when CDF is enabled
Predicate Pushdown
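Clients can send filter hints so the server can skip data files that cannot match, minimizing data transfer (filtering is best-effort, so clients still apply the filter themselves)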
Bearer Token Authentication
Simple token-based authentication for securing API access
Docker Support
Pre-built Docker images available for easy deployment
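To make the Change Data Feed feature more concrete, here is a minimal Python sketch of how a client might request and sort incremental changes. It assumes the protocol's `GET .../tables/{table}/changes` endpoint with `startingVersion` and `endingVersion` query parameters. It also assumes a newline-delimited JSON response whose `add`, `cdf`, and `remove` lines each carry a `url` and a `version`. The helpers `changes_url` and `group_change_files` are hypothetical. Authentication would use the same bearer-token header as a normal table query.

```python
import json
import urllib.parse


def changes_url(endpoint, share, schema, table, starting_version, ending_version=None):
    """Build the GET URL for reading a shared table's change data feed."""
    path = "/shares/{}/schemas/{}/tables/{}/changes".format(
        *(urllib.parse.quote(part, safe="") for part in (share, schema, table)))
    params = {"startingVersion": starting_version}
    if ending_version is not None:
        params["endingVersion"] = ending_version
    return endpoint.rstrip("/") + path + "?" + urllib.parse.urlencode(params)


def group_change_files(ndjson_response):
    """Group (version, pre-signed URL) pairs in a changes response by action type."""
    grouped = {"add": [], "cdf": [], "remove": []}
    for line in ndjson_response.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        for kind in grouped:
            if kind in action:  # protocol and metaData lines match no kind
                grouped[kind].append((action[kind]["version"], action[kind]["url"]))
    return grouped
```

Grouping by action type is a design choice. Files under `cdf` typically hold explicit change rows, while `add` and `remove` entries describe whole files that were inserted or deleted in a given version.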
Limitations
Understanding the reference server’s limitations is crucial for determining if it’s the right choice for your use case.
- Limited Authentication: Supports only a pre-configured bearer token out of the box
- No User Management: No built-in user management or fine-grained permissions
- No Audit Logging: Only basic server logging, which is typically insufficient for compliance requirements
- Manual Configuration: Tables must be manually configured in YAML files
- No High Availability: Single-server deployment with no clustering support
- Security: Requires additional infrastructure (proxy, JWT auth) for production security
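Because every table must be listed by hand, a small script can help keep the YAML consistent as the number of shares grows. The Python sketch below renders a server config from `(share, schema, table, location)` tuples. The key names (`version`, `shares`, `schemas`, `tables`, `location`, `host`, `port`, `endpoint`, `authorization.bearerToken`) follow the reference server's sample config template as we understand it. Treat them as assumptions and check them against the template shipped with your server version. The function `render_server_config` is hypothetical.

```python
import json
from collections import defaultdict


def render_server_config(tables, bearer_token, host="localhost", port=8080,
                         endpoint="/delta-sharing"):
    """Render a reference-server YAML config from (share, schema, table, location) tuples.

    Hypothetical helper; key names are assumed from the sample config template.
    """
    # Group tables as share -> schema -> [(table, location)], preserving input order.
    tree = defaultdict(lambda: defaultdict(list))
    for share, schema, table, location in tables:
        tree[share][schema].append((table, location))

    # For typical ASCII names and paths, a JSON string literal is also a
    # valid YAML double-quoted scalar, so json.dumps handles quoting/escaping.
    q = json.dumps
    lines = ["version: 1", "shares:"]
    for share, schemas in tree.items():
        lines += [f"- name: {q(share)}", "  schemas:"]
        for schema, entries in schemas.items():
            lines += [f"  - name: {q(schema)}", "    tables:"]
            for table, location in entries:
                lines += [f"    - name: {q(table)}", f"      location: {q(location)}"]
    lines += [
        f"host: {q(host)}",
        f"port: {int(port)}",
        f"endpoint: {q(endpoint)}",
        "authorization:",
        f"  bearerToken: {q(bearer_token)}",
    ]
    return "\n".join(lines) + "\n"
```

The sketch omits optional settings from the sample template, such as per-table `id` values and the pre-signed URL timeout. Keep the bearer token out of version control, for example by reading it from an environment variable before calling the function.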
Next Steps
Install the Server
Download and set up the reference server
Configure the Server
Learn how to configure shares, schemas, and tables
Cloud Storage Setup
Configure authentication for your cloud storage provider
Secure Your Server
Set up authentication and authorization