Overview
The cloud module provides a unified interface for synchronizing data between your local machine and various cloud storage services. It supports Azure Blob Storage, HuggingFace datasets, and SFTP servers.
Main Functions
init
Whether to print verbose output
Cloud settings dictionary loaded from cloud.json
Initialized CloudService object, or None if cloud is disabled
load_cloud_settings
Path to the cloud settings JSON file
The cloud settings object
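As a rough illustration of what loading a settings file involves, the sketch below reads a JSON file and returns its contents. The function name `load_settings_sketch` is hypothetical and is not the module's `load_cloud_settings`. The schema of cloud.json is not documented here, so the sketch only checks that the top level is a JSON object.

```python
import json
from pathlib import Path


def load_settings_sketch(path):
    """Read a JSON settings file and return its contents as a dict.

    Hypothetical helper for illustration only; the real
    load_cloud_settings may validate or transform the data differently.
    """
    with Path(path).open("r", encoding="utf-8") as handle:
        settings = json.load(handle)
    # A settings file is expected to be a JSON object, not a list or scalar.
    if not isinstance(settings, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return settings
```

A caller would pass the path to its own cloud.json and read individual settings from the returned dictionary.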
CloudService Base Class
The CloudService class provides a unified interface for all cloud storage providers.
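To make the idea of a unified interface concrete, here is a minimal sketch of an abstract base class with one toy in-memory provider. The class names `StorageSketch` and `InMemoryStorage`, the parameter names, and the argument order are all assumptions made for illustration; they are not taken from the module's actual signatures.

```python
from abc import ABC, abstractmethod


class StorageSketch(ABC):
    """Hypothetical stand-in for a provider-agnostic storage interface."""

    @abstractmethod
    def list_files(self, remote_path):
        """Return the files found under remote_path."""

    @abstractmethod
    def download_file(self, remote_file, local_path):
        """Copy remote_file to local_path on this machine."""

    @abstractmethod
    def upload_file(self, remote_path, local_path):
        """Copy the file at local_path to remote_path."""


class InMemoryStorage(StorageSketch):
    """Toy provider that keeps file contents in a dict instead of a cloud."""

    def __init__(self):
        self.blobs = {}

    def list_files(self, remote_path):
        # Treat remote_path as a directory prefix.
        prefix = remote_path.rstrip("/") + "/"
        return sorted(name for name in self.blobs if name.startswith(prefix))

    def download_file(self, remote_file, local_path):
        with open(local_path, "wb") as handle:
            handle.write(self.blobs[remote_file])

    def upload_file(self, remote_path, local_path):
        with open(local_path, "rb") as handle:
            self.blobs[remote_path] = handle.read()
```

Code written against the base class can then work with any provider that implements the same three methods.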
list_files
Path on the remote service you want to query
A listing of all files contained within the queried path
download_file
The file to download
The path on your local computer where you want to save the remote file
upload_file
The remote path on the cloud storage service where you want to upload your file
The local path to the file you want to upload
sync_files
Unique identifier for a locality/project
Path on your local computer
Path on the remote cloud storage service
Prints all operations that would run, but does not execute them
Whether to print verbose output
List of paths that should NOT be synchronized. Anything within a matching directory will be skipped
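The dry-run and ignore-path behaviour described above can be sketched as follows. This is a simplified, upload-only illustration under stated assumptions: the names `is_ignored` and `plan_uploads`, their parameters, and the `upload` callback are hypothetical, and the real `sync_files` may compare local and remote state in both directions.

```python
from pathlib import PurePosixPath


def is_ignored(relative_path, ignore_paths):
    """Return True if relative_path equals, or lies inside, an ignored path."""
    candidate = PurePosixPath(relative_path)
    for ignored in ignore_paths:
        ignored_path = PurePosixPath(ignored)
        if candidate == ignored_path or ignored_path in candidate.parents:
            return True
    return False


def plan_uploads(local_files, remote_root, ignore_paths=(), dry_run=False, upload=None):
    """Build the list of (local, remote) uploads, skipping ignored paths.

    With dry_run=True the operations are printed but never executed.
    Otherwise each one is passed to the hypothetical upload(remote, local)
    callback, if one was given.
    """
    operations = []
    for relative_path in sorted(local_files):
        if is_ignored(relative_path, ignore_paths):
            continue
        remote_path = str(PurePosixPath(remote_root) / relative_path)
        operations.append((relative_path, remote_path))
        if dry_run:
            print(f"[dry run] would upload {relative_path} -> {remote_path}")
        elif upload is not None:
            upload(remote_path, relative_path)
    return operations
```

Matching on whole path components, rather than on string prefixes, keeps an ignored directory such as `cache` from also excluding an unrelated file like `cachet.csv`.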
Azure Storage
AzureService
Azure-specific CloudService for working with Azure Blob Storage.
Authentication credentials for Azure
The name of your Azure container
Access permission (“read_only” or “read_write”)
get_creds_from_env_azure
The credentials for Azure stored in the environment variable AZURE_STORAGE_CONNECTION_STRING
HuggingFace Storage
HuggingFaceService
HuggingFace-specific CloudService for working with HuggingFace datasets.
Authentication credentials for HuggingFace
Repository identifier
Access permission (“read_only” or “read_write”)
Revision identifier (branch name)
get_creds_from_env_huggingface
The credentials for HuggingFace stored in the environment variable HF_TOKEN
SFTP Storage
SFTPService
SFTP-specific CloudService for working with SFTP servers.
Your SFTP credentials
Access permission (“read_only” or “read_write”)
Base path on the remote SFTP server
How long to wait, in seconds, before treating a connection attempt as timed out
How many times to retry a failed connection before giving up
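The interaction between the timeout and the retry count can be sketched with a small helper. The name `connect_with_retries`, its parameters, and the `connect` callable are hypothetical. The sketch also assumes that the retry count means extra attempts after the first one, which the parameter description does not confirm.

```python
import time


def connect_with_retries(connect, retries=3, timeout=10.0, delay=0.0):
    """Call connect(timeout) until it succeeds or the retries are used up.

    Assumes `retries` counts attempts made after the first failure, so the
    total number of attempts is 1 + retries. `delay` is an optional pause
    in seconds between attempts.
    """
    last_error = None
    for attempt in range(1 + retries):
        try:
            return connect(timeout)
        except OSError as error:  # TimeoutError and ConnectionError are OSError subclasses
            last_error = error
            if attempt < retries and delay > 0:
                time.sleep(delay)
    raise ConnectionError(f"gave up after {retries} retries") from last_error
```

Chaining the final exception to the last underlying error keeps the original failure visible in the traceback.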
get_creds_from_env_sftp
The credentials for SFTP from environment variables:
SFTP_HOSTNAME - Server hostname
SFTP_PORT - Port number (default: 22)
SFTP_USERNAME - Username
SFTP_PASSWORD - Password (optional if using key)
SFTP_KEY_FILENAME - Path to SSH key file (optional if using password)
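A minimal sketch of how these variables might be gathered is shown below. The function name `sftp_creds_from_env`, the keys of the returned dictionary, and the choice to treat the hostname and username as required are assumptions. The actual `get_creds_from_env_sftp` may return a different structure.

```python
import os


def sftp_creds_from_env(environ=None):
    """Collect SFTP credentials from environment variables.

    Hypothetical helper: the returned dictionary keys are illustrative.
    Pass a plain dict as `environ` to avoid touching the real environment.
    """
    env = os.environ if environ is None else environ
    hostname = env.get("SFTP_HOSTNAME")
    username = env.get("SFTP_USERNAME")
    if not hostname or not username:
        raise ValueError("SFTP_HOSTNAME and SFTP_USERNAME must be set")
    password = env.get("SFTP_PASSWORD")
    key_filename = env.get("SFTP_KEY_FILENAME")
    # Either a password or a key file is needed, since each is optional
    # only when the other is provided.
    if not password and not key_filename:
        raise ValueError("set SFTP_PASSWORD or SFTP_KEY_FILENAME")
    return {
        "hostname": hostname,
        "port": int(env.get("SFTP_PORT", "22")),  # default port is 22
        "username": username,
        "password": password,
        "key_filename": key_filename,
    }
```

Accepting an optional mapping makes the helper easy to exercise in tests without modifying the process environment.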