This guide will get you reading shared data in minutes using the Delta Sharing Python connector.

Prerequisites

  • Python 3.8+ (Python 3.6+ for delta-sharing versions < 1.1)
  • pip (latest version recommended)
  • Linux users: glibc version >= 2.31 (required for automatic installation of the delta-kernel-rust-sharing-wrapper dependency)
If you’re using Databricks Runtime, follow the Databricks Libraries documentation to install on your clusters.

Step 1: Install Delta Sharing

Install via pip

Install the Delta Sharing Python connector:
pip install delta-sharing
If installation fails due to delta-kernel-rust-sharing-wrapper:
  1. Check Python version: python --version (must be 3.8+)
  2. Upgrade pip: pip install --upgrade pip
  3. Check glibc version (Linux): ldd --version (must be 2.31+)
  4. Install Rust if needed: Follow the Rust installation guide
Alternatively, use an older version without Rust dependency:
pip install delta-sharing==1.0.5
Verify Installation

Verify the installation:
import delta_sharing
print(delta_sharing.__version__)

Step 2: Get a Profile File

A profile file is a JSON file containing credentials to access a Delta Sharing Server.
Download the open example profile to try Delta Sharing immediately:
curl -O https://databricks-datasets-oregon.s3-us-west-2.amazonaws.com/delta-sharing/share/open-datasets.share
This profile provides access to public COVID-19 and other sample datasets.
Profile files are JSON with the following structure:
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.delta.io/delta-sharing/",
  "bearerToken": "your-bearer-token-here"
}
  • endpoint: The Delta Sharing Server URL
  • bearerToken: Authentication token for accessing shared data
  • shareCredentialsVersion: Protocol version (currently 1)
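If you generate profile files programmatically (for example, when provisioning recipients), a minimal sketch using only the standard library might look like the following; the endpoint and token values are placeholders, not real credentials:

```python
import json

# Placeholder values -- substitute your server's endpoint and the
# bearer token issued to you by the data provider.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.delta.io/delta-sharing/",
    "bearerToken": "your-bearer-token-here",
}

# Write the profile as JSON; the .share extension is conventional.
with open("my-profile.share", "w") as f:
    json.dump(profile, f, indent=2)
```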

Step 3: Load Data with Python

Now you can read shared tables using pandas or Apache Spark.
import delta_sharing

# Point to the profile file
profile_file = "open-datasets.share"

# Create a SharingClient
client = delta_sharing.SharingClient(profile_file)

# List all shared tables
tables = client.list_all_tables()
print(tables)

# Create a table URL
# Format: <profile-path>#<share>.<schema>.<table>
table_url = profile_file + "#delta_sharing.default.owid-covid-data"

# Load first 10 rows as pandas DataFrame
df = delta_sharing.load_as_pandas(table_url, limit=10)
print(df)
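Since the `#`-separated table URL is just a string, a small helper can keep the pieces readable; `build_table_url` is an illustrative name here, not part of the delta-sharing API:

```python
def build_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Assemble a Delta Sharing table URL: <profile-path>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"

url = build_table_url("open-datasets.share", "delta_sharing", "default", "owid-covid-data")
# url == "open-datasets.share#delta_sharing.default.owid-covid-data"
```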

Exploring Available Data

Use the SharingClient to discover what data is available:
import delta_sharing

client = delta_sharing.SharingClient("open-datasets.share")

# List all shares
shares = client.list_shares()
for share in shares:
    print(f"Share: {share.name}")

# List schemas in a share
schemas = client.list_schemas("delta_sharing")
for schema in schemas:
    print(f"Schema: {schema.name}")

# List tables in a schema
tables = client.list_tables("delta_sharing", "default")
for table in tables:
    print(f"Table: {table.name}")

# List all tables across all schemas
all_tables = client.list_all_tables()
for table in all_tables:
    print(f"{table.share}.{table.schema}.{table.name}")

Advanced Features

Change Data Feed (CDF)

If the provider has enabled Change Data Feed on the shared table (cdfEnabled=true), you can query the changes between table versions:
import delta_sharing

table_url = "profile.share#share.schema.table"

# Load changes between versions
changes = delta_sharing.load_table_changes_as_pandas(
    table_url,
    starting_version=0,
    ending_version=5
)

print(changes)

Streaming with Spark

Delta Sharing tables can be used as streaming sources:
val tablePath = "profile.share#share.schema.table"

val stream = spark.readStream
  .format("deltaSharing")
  .option("startingVersion", "1")
  .option("skipChangeCommits", "true")
  .load(tablePath)

stream.writeStream
  .format("console")
  .start()
  .awaitTermination()
Trigger.AvailableNow is not supported for Delta Sharing streaming: it requires Spark 3.3.0+, while the Delta Sharing connector is built on Spark 3.1.1.

Profile File Paths

The profile path passed to the connector can be a relative or absolute local filesystem path:
profile_file = "/path/to/profile.share"
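Before handing a path to SharingClient, it can help to resolve and sanity-check it up front. The `resolve_profile` function below is a hypothetical helper, not part of the connector, shown here as a sketch:

```python
import json
import os

def resolve_profile(path: str) -> str:
    """Expand, resolve, and validate a profile path (hypothetical helper)."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full):
        raise FileNotFoundError(f"profile file not found: {full}")
    # Parse the JSON and check the fields described in Step 2 are present.
    with open(full) as f:
        profile = json.load(f)
    missing = {"shareCredentialsVersion", "endpoint", "bearerToken"} - profile.keys()
    if missing:
        raise ValueError(f"profile is missing required fields: {sorted(missing)}")
    return full
```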

Complete Example

Here’s a complete example analyzing COVID-19 data:
import delta_sharing
import pandas as pd

# Download and use the example profile
profile_file = "open-datasets.share"

# Create client
client = delta_sharing.SharingClient(profile_file)

# List available tables
print("Available tables:")
for table in client.list_all_tables():
    print(f"  - {table.share}.{table.schema}.{table.name}")

# Load COVID-19 data
table_url = profile_file + "#delta_sharing.default.owid-covid-data"
df = delta_sharing.load_as_pandas(table_url)

print(f"\nLoaded {len(df)} rows")
print(f"Columns: {', '.join(df.columns)}")

# Analyze USA data
usa = df[df["iso_code"] == "USA"].copy()
usa["date"] = pd.to_datetime(usa["date"])
usa = usa.sort_values("date")

print("\nUSA COVID-19 Statistics (Latest):")
latest = usa.iloc[-1]
print(f"  Date: {latest['date']}")
print(f"  Total Cases: {latest['total_cases']:,.0f}")
print(f"  Total Deaths: {latest['total_deaths']:,.0f}")
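From here, ordinary pandas operations apply to the loaded frame. For instance, a rolling average of daily new cases (this sketch assumes the OWID `new_cases` column, which is present in this dataset):

```python
import pandas as pd

def rolling_new_cases(df: pd.DataFrame, window: int = 7) -> pd.Series:
    """Rolling mean (7-day by default) of daily new cases, indexed by date."""
    daily = df.sort_values("date").set_index("date")["new_cases"]
    return daily.rolling(window, min_periods=1).mean()
```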

Next Steps

Now that you’ve successfully loaded shared data, explore more:

  • Python API Reference: explore the full Python connector API
  • Spark Connector: learn about the Apache Spark connector
  • Set Up a Server: share your own Delta Lake tables
  • Protocol Details: deep dive into the Delta Sharing Protocol
Join the Delta Lake Slack community for help and to connect with other Delta Sharing users!
