The SharingClient allows you to explore available shares, schemas, and tables:
    import delta_sharing

    # Point to your profile file
    profile_file = "/path/to/profile.share"

    # Create a SharingClient
    client = delta_sharing.SharingClient(profile_file)

    # List all shared tables
    tables = client.list_all_tables()
    for table in tables:
        print(f"{table.share}.{table.schema}.{table.name}")
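The other examples in this section pass a table_url to the loading functions. A table URL is the profile file path, a # character, and the fully qualified table name (share.schema.table). The following is a minimal sketch of building one from the values printed above. The helper name build_table_url and the share, schema, and table names are hypothetical and used for illustration only.

```python
def build_table_url(profile_file: str, share: str, schema: str, table: str) -> str:
    """Join a profile path and a qualified table name into a table URL.

    Hypothetical helper, not part of the delta_sharing package.
    """
    return f"{profile_file}#{share}.{schema}.{table}"


# Placeholder names; substitute values returned by list_all_tables()
table_url = build_table_url(
    "/path/to/profile.share", "my_share", "my_schema", "my_table"
)
print(table_url)
```

With a table object from list_all_tables(), the same helper could be called as build_table_url(profile_file, table.share, table.schema, table.name).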
For large tables, use the limit parameter to fetch only a sample:
    # Fetch only 10 rows to explore the table structure
    sample_df = delta_sharing.load_as_pandas(table_url, limit=10)
    print(sample_df)
The limit parameter is useful for exploration but does not guarantee which rows are returned. For production use cases, load the full table or use appropriate filtering.
For better performance with supported tables, use Delta format:
    # Explicitly use Delta format for reading
    df = delta_sharing.load_as_pandas(
        table_url,
        use_delta_format=True
    )
Delta format provides more efficient data transfer and better predicate pushdown. The connector automatically chooses the best format if use_delta_format is not specified.
To use load_as_spark, you must be running in a PySpark environment with the Apache Spark Connector for Delta Sharing installed. See the Apache Spark Connector documentation for setup instructions.
    import delta_sharing

    # Load table changes from version 0 to version 5
    changes_df = delta_sharing.load_table_changes_as_pandas(
        table_url,
        starting_version=0,
        ending_version=5
    )
    print(changes_df.head())
The resulting DataFrame includes these columns:
All original table columns
_change_type: Type of change (insert, update_preimage, update_postimage, delete)
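As a rough illustration of working with the _change_type column, the sketch below tallies how many rows fall under each change type. It uses plain dictionaries as stand-ins for DataFrame rows (for example, the result of changes_df.to_dict("records")). The helper name count_change_types and the sample rows are hypothetical, not real table data.

```python
from collections import Counter


def count_change_types(rows):
    """Count rows per _change_type value.

    Hypothetical helper; rows is an iterable of dicts that each
    carry a "_change_type" key.
    """
    return Counter(row["_change_type"] for row in rows)


# Made-up sample rows standing in for changes_df.to_dict("records")
sample_rows = [
    {"id": 1, "_change_type": "insert"},
    {"id": 2, "_change_type": "insert"},
    {"id": 1, "_change_type": "update_preimage"},
    {"id": 1, "_change_type": "update_postimage"},
    {"id": 2, "_change_type": "delete"},
]

counts = count_change_types(sample_rows)
for change_type, n in sorted(counts.items()):
    print(f"{change_type}: {n}")
```

An update appears as a pair of rows (update_preimage and update_postimage), so the two counts normally match.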
    # Get changes between two timestamps
    changes_df = delta_sharing.load_table_changes_as_pandas(
        table_url,
        starting_timestamp="2024-01-01T00:00:00Z",
        ending_timestamp="2024-01-31T23:59:59Z"
    )
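The timestamps above are UTC strings of the form YYYY-MM-DDTHH:MM:SSZ. If your code works with datetime objects, a small formatter along these lines can produce strings in that shape. This is a sketch only: the helper name to_utc_timestamp is hypothetical, and it assumes timezone-aware datetimes.

```python
from datetime import datetime, timezone


def to_utc_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC.

    Hypothetical helper, not part of the delta_sharing package.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = to_utc_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))
end = to_utc_timestamp(datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc))
print(start, end)
```

Rejecting naive datetimes avoids silently interpreting local time as UTC, which would shift the requested change window.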
    # Use batch conversion for large change sets
    changes_df = delta_sharing.load_table_changes_as_pandas(
        table_url,
        starting_version=0,
        ending_version=100,
        convert_in_batches=True,
        use_delta_format=True
    )
    import delta_sharing

    # Get current table version
    version = delta_sharing.get_table_version(table_url)
    print(f"Current version: {version}")

    # Get version at specific timestamp
    version_at_time = delta_sharing.get_table_version(
        table_url,
        starting_timestamp="2024-01-15T10:00:00Z"
    )

    # Get table metadata
    metadata = delta_sharing.get_table_metadata(table_url)
    print(f"Table ID: {metadata.id}")
    print(f"Schema: {metadata.schema_string}")

    # Get table protocol
    protocol = delta_sharing.get_table_protocol(table_url)
    print(f"Min reader version: {protocol.min_reader_version}")