This guide will get you reading shared data in minutes using the Delta Sharing Python connector.
Prerequisites
Python 3.8+ (Python 3.6+ for delta-sharing versions < 1.1)
pip (latest version recommended)
Linux users: glibc version >= 2.31 for automatic installation
Step 1: Install Delta Sharing
Install via pip
Install the Delta Sharing Python connector: pip install delta-sharing
Troubleshooting Installation
If installation fails due to delta-kernel-rust-sharing-wrapper:
Check Python version: python --version (must be 3.8+)
Upgrade pip: pip install --upgrade pip
Check glibc version (Linux): ldd --version (must be 2.31+)
Install Rust if needed: Follow the Rust installation guide
Alternatively, install an older version without the Rust dependency: pip install delta-sharing==1.0.5
Verify Installation
Verify the installation:
import delta_sharing
print(delta_sharing.__version__)
Step 2: Get a Profile File
A profile file is a JSON file containing credentials to access a Delta Sharing Server.
Use Example Server
Download the open example profile to try Delta Sharing immediately: curl -O https://databricks-datasets-oregon.s3-us-west-2.amazonaws.com/delta-sharing/share/open-datasets.share
This profile provides access to public COVID-19 and other sample datasets.
From Your Provider
If your data provider has shared data with you, they will provide a .share profile file. Save it to your local filesystem or cloud storage.
Run Your Own Server
Set up the Delta Sharing Reference Server to share your own data. See the Delta Sharing Server documentation for setup instructions.
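Whichever route you take, the profile file itself follows the credential format defined by the Delta Sharing protocol. A minimal sketch of its shape (the endpoint and token below are placeholders, not real credentials):

```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<your-bearer-token>",
  "expirationTime": "2030-01-01T00:00:00.0Z"
}
```

expirationTime is optional; shareCredentialsVersion, endpoint, and bearerToken are the core fields the connector reads.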
Step 3: Load Data with Python
Now you can read shared tables using pandas or Apache Spark.
Pandas - Basic
import delta_sharing

# Point to the profile file
profile_file = "open-datasets.share"

# Create a SharingClient
client = delta_sharing.SharingClient(profile_file)

# List all shared tables
tables = client.list_all_tables()
print(tables)

# Create a table URL
# Format: <profile-path>#<share>.<schema>.<table>
table_url = profile_file + "#delta_sharing.default.owid-covid-data"

# Load first 10 rows as pandas DataFrame
df = delta_sharing.load_as_pandas(table_url, limit=10)
print(df)
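The table URL is simply the profile path and a dot-separated table coordinate joined by `#`. A small hypothetical helper (not part of the delta-sharing API) makes the format concrete:

```python
def split_table_url(table_url: str):
    """Split '<profile-path>#<share>.<schema>.<table>' into its parts.

    Illustrative helper only; not part of the delta-sharing package.
    """
    # The profile path may itself contain '#'-free URLs; split on the last '#'
    profile, fragment = table_url.rsplit("#", 1)
    # Share and schema names cannot contain '.', so two splits suffice
    share, schema, table = fragment.split(".", 2)
    return profile, share, schema, table

print(split_table_url("open-datasets.share#delta_sharing.default.owid-covid-data"))
# → ('open-datasets.share', 'delta_sharing', 'default', 'owid-covid-data')
```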
Exploring Available Data
Use the SharingClient to discover what data is available:
import delta_sharing

client = delta_sharing.SharingClient("open-datasets.share")

# List all shares
shares = client.list_shares()
for share in shares:
    print(f"Share: {share.name}")

# List schemas in a share (list_schemas takes a Share object)
schemas = client.list_schemas(shares[0])
for schema in schemas:
    print(f"Schema: {schema.name}")

# List tables in a schema (list_tables takes a Schema object)
tables = client.list_tables(schemas[0])
for table in tables:
    print(f"Table: {table.name}")

# List all tables across all schemas
all_tables = client.list_all_tables()
for table in all_tables:
    print(f"{table.share}.{table.schema}.{table.name}")
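For a large share, the flat list_all_tables() output is easier to scan when grouped by share and schema. A minimal sketch, using plain tuples in place of the real Table objects (which expose .share, .schema, and .name attributes); the table names here are examples:

```python
from collections import defaultdict

# Stand-in for client.list_all_tables(), flattened to (share, schema, name) tuples
all_tables = [
    ("delta_sharing", "default", "owid-covid-data"),
    ("delta_sharing", "default", "boston-housing"),
]

# Group table names under their (share, schema) pair
by_schema = defaultdict(list)
for share, schema, name in all_tables:
    by_schema[(share, schema)].append(name)

for (share, schema), names in sorted(by_schema.items()):
    print(f"{share}.{schema}: {', '.join(names)}")
```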
Advanced Features
Change Data Feed (CDF)
If the shared table has change data feed enabled (cdfEnabled=true), you can query table changes:
import delta_sharing

table_url = "profile.share#share.schema.table"

# Load changes between versions
changes = delta_sharing.load_table_changes_as_pandas(
    table_url,
    starting_version=0,
    ending_version=5,
)
print(changes)
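As a sketch of working with the result: the sample rows below are fabricated for illustration, but the _change_type and _commit_version metadata columns match what Delta's change data feed emits alongside the table's own columns.

```python
import pandas as pd

# Fabricated CDF-style result; updates appear as preimage/postimage row pairs
changes = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "_change_type": ["insert", "update_preimage", "update_postimage", "delete"],
    "_commit_version": [1, 2, 2, 3],
})

# Keep only rows as they look after each change (drop preimages and deletes)
current = changes[changes["_change_type"].isin(["insert", "update_postimage"])]
print(current["id"].tolist())  # → [1, 2]
```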
Streaming with Spark
Delta Sharing tables can be used as streaming sources:
val tablePath = "profile.share#share.schema.table"

val stream = spark.readStream
  .format("deltaSharing")
  .option("startingVersion", "1")
  .option("skipChangeCommits", "true")
  .load(tablePath)

stream.writeStream
  .format("console")
  .start()
  .awaitTermination()
Trigger.AvailableNow is not supported in Delta Sharing streaming as it requires Spark 3.3.0+, while Delta Sharing uses Spark 3.1.1.
Profile File Paths
Profile files can be stored in various locations:
Local File System
Cloud Storage (Python)
Cloud Storage (Spark)
Databricks
profile_file = "/path/to/profile.share"
The Python connector supports any URL via fsspec:

# S3
profile_file = "s3a://my-bucket/config/profile.share"

# Azure Blob Storage
profile_file = "abfs://container@account.dfs.core.windows.net/profile.share"

# Google Cloud Storage
profile_file = "gs://my-bucket/config/profile.share"
The Spark connector supports Hadoop FileSystem URLs:

# S3
profile_file = "s3a://my-bucket/config/profile.share"

# HDFS
profile_file = "hdfs://namenode:8020/user/config/profile.share"
On Databricks, use DBFS paths: profile_file = "/dbfs/mnt/config/profile.share"
Complete Example
Here’s a complete example analyzing COVID-19 data:
import delta_sharing
import pandas as pd

# Download and use the example profile
profile_file = "open-datasets.share"

# Create client
client = delta_sharing.SharingClient(profile_file)

# List available tables
print("Available tables:")
for table in client.list_all_tables():
    print(f"  - {table.share}.{table.schema}.{table.name}")

# Load COVID-19 data
table_url = profile_file + "#delta_sharing.default.owid-covid-data"
df = delta_sharing.load_as_pandas(table_url)
print(f"\nLoaded {len(df)} rows")
print(f"Columns: {', '.join(df.columns)}")

# Analyze USA data
usa = df[df["iso_code"] == "USA"].copy()
usa["date"] = pd.to_datetime(usa["date"])
usa = usa.sort_values("date")

print("\nUSA COVID-19 Statistics (Latest):")
latest = usa.iloc[-1]
print(f"  Date: {latest['date']}")
print(f"  Total Cases: {latest['total_cases']:,.0f}")
print(f"  Total Deaths: {latest['total_deaths']:,.0f}")
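From here, the same DataFrame lends itself to time-series analysis. A minimal sketch of a 7-day rolling average, using a fabricated slice in place of the real table (the date and new_cases column names do exist in the OWID COVID-19 schema):

```python
import pandas as pd

# Fabricated stand-in for the USA slice of owid-covid-data
usa = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=7),
    "new_cases": [100, 120, 90, 110, 130, 95, 105],
})

# Smooth daily noise with a 7-day rolling mean
usa["new_cases_7d_avg"] = usa["new_cases"].rolling(7).mean()
print(round(usa["new_cases_7d_avg"].iloc[-1], 2))  # → 107.14
```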
Next Steps
Now that you’ve successfully loaded shared data, explore more:
Python API Reference: Explore the full Python connector API
Spark Connector: Learn about the Apache Spark connector
Set Up a Server: Share your own Delta Lake tables
Protocol Details: Deep dive into the Delta Sharing Protocol