
Overview

The cloud module provides a unified interface for synchronizing data between your local machine and various cloud storage services. It supports Azure Blob Storage, HuggingFace datasets, and SFTP servers.

Main Functions

init

init(verbose: bool, cloud_settings: dict) -> CloudService | None
Creates a CloudService object based on user settings.
verbose
bool
Whether to print verbose output
cloud_settings
dict
Cloud settings dictionary loaded from cloud.json
CloudService
CloudService | None
Initialized CloudService object, or None if cloud is disabled

load_cloud_settings

load_cloud_settings(filepath: str = "cloud.json") -> dict
Load cloud settings file from disk.
filepath
str
default:"cloud.json"
Path to the cloud settings JSON file
settings
dict
The cloud settings object
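
As a rough illustration of what loading a settings file amounts to, the sketch below parses a JSON file into a dict using only the standard library. It is not the library's implementation: the helper name `read_settings` is hypothetical, and the `"type"` key in the sample file is a placeholder, since the cloud.json schema is not described on this page.

```python
import json
import tempfile
from pathlib import Path


def read_settings(filepath: str = "cloud.json") -> dict:
    """Hypothetical stand-in: parse a JSON settings file into a dict."""
    with open(filepath, "r", encoding="utf-8") as f:
        settings = json.load(f)
    if not isinstance(settings, dict):
        raise ValueError(f"{filepath} must contain a JSON object")
    return settings


# Write a throwaway settings file so the sketch runs anywhere.
# The "type" key is a placeholder, not a documented field.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cloud.json"
    path.write_text(json.dumps({"type": "example"}), encoding="utf-8")
    loaded = read_settings(str(path))
```

Rejecting a top-level value that is not a JSON object is a design choice of this sketch; it keeps the return type consistent with the documented `dict`.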

CloudService Base Class

The CloudService class provides a unified interface for all cloud storage providers.

list_files

list_files(remote_path: str) -> list[CloudFile]
List all files at the given path on the cloud storage service.
remote_path
str
Path on the remote service you want to query
files
list[CloudFile]
A listing of all files contained within the queried path

download_file

download_file(remote_file: CloudFile, local_file_path: str)
Download a remote file from the cloud storage service.
remote_file
CloudFile
The file to download
local_file_path
str
The path on your local computer where you want to save the remote file
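
list_files and download_file combine naturally into a "download everything under a path" loop. The sketch below relies only on the two signatures documented above. `download_all` and `FakeService` are hypothetical names, and the fake uses plain strings in place of CloudFile objects because CloudFile's attributes are not described here; the caller supplies `local_name` to turn a remote entry into a local filename.

```python
import os
from typing import Any, Callable, List, Protocol


class ListsAndDownloads(Protocol):
    """The two documented CloudService methods this sketch relies on."""

    def list_files(self, remote_path: str) -> list: ...

    def download_file(self, remote_file: Any, local_file_path: str) -> None: ...


def download_all(
    service: ListsAndDownloads,
    remote_path: str,
    local_dir: str,
    local_name: Callable[[Any], str],
) -> List[str]:
    """Download every entry listed under remote_path; return the local paths."""
    saved = []
    for remote_file in service.list_files(remote_path):
        target = os.path.join(local_dir, local_name(remote_file))
        service.download_file(remote_file, target)
        saved.append(target)
    return saved


class FakeService:
    """In-memory stand-in so the sketch runs without cloud access."""

    def __init__(self, files: List[str]):
        self.files = files
        self.downloaded = []  # records (remote_file, local_file_path) pairs

    def list_files(self, remote_path: str) -> list:
        return [f for f in self.files if f.startswith(remote_path)]

    def download_file(self, remote_file: Any, local_file_path: str) -> None:
        self.downloaded.append((remote_file, local_file_path))


fake = FakeService(["data/a.csv", "data/b.csv", "other/c.csv"])
saved = download_all(fake, "data/", "in", lambda f: f.split("/")[-1])
```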

upload_file

upload_file(remote_file_path: str, local_file_path: str)
Upload a local file to the cloud storage service.
remote_file_path
str
The remote path on the cloud storage service where you want to upload your file
local_file_path
str
The local path to the file you want to upload

sync_files

sync_files(
    locality: str,
    local_folder: str,
    remote_folder: str,
    dry_run: bool = False,
    verbose: bool = False,
    ignore_paths: list[str] = None
)
Synchronize files between your local computer and the cloud storage service.
locality
str
Unique identifier for a locality/project
local_folder
str
Path on your local computer
remote_folder
str
Path on the remote cloud storage service
dry_run
bool
default:"False"
If True, prints every operation that would run without executing any of them
verbose
bool
default:"False"
Whether to print verbose output
ignore_paths
list[str]
List of paths that should NOT be synchronized. Anything inside a matching directory is skipped.
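
To make the ignore_paths and dry_run behavior concrete, here is a minimal sketch of one way such filtering could work. The matching rule (an exact match, or any ancestor directory matching) is an assumption based on the description above, not the library's confirmed logic, and `is_ignored` and `plan_transfers` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def is_ignored(path: str, ignore_paths: Optional[List[str]]) -> bool:
    """True if path equals an ignored path or lies inside an ignored directory."""
    candidate = PurePosixPath(path)
    for ignored in ignore_paths or []:
        ignored_path = PurePosixPath(ignored)  # trailing slashes are normalized away
        if candidate == ignored_path or ignored_path in candidate.parents:
            return True
    return False


def plan_transfers(
    paths: List[str],
    ignore_paths: Optional[List[str]] = None,
    dry_run: bool = False,
) -> List[str]:
    """Return the paths that survive filtering; in a dry run, only print them."""
    to_transfer = [p for p in paths if not is_ignored(p, ignore_paths)]
    for p in to_transfer:
        if dry_run:
            print(f"[dry run] would transfer {p}")
        # A real implementation would perform the transfer here when dry_run is False.
    return to_transfer


plan = plan_transfers(
    ["out/results.csv", "out/cache/a.bin"],
    ignore_paths=["out/cache"],
    dry_run=True,
)
```

Comparing whole path components, not string prefixes, keeps `out/cached/a.bin` from being skipped just because `out/cache` is ignored.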

Azure Storage

AzureService

Azure-specific CloudService for working with Azure Blob Storage.
AzureService(
    credentials: AzureCredentials,
    container_name: str,
    access: CloudAccess
)
credentials
AzureCredentials
Authentication credentials for Azure
container_name
str
The name of your Azure container
access
CloudAccess
Access permission ("read_only" or "read_write")

get_creds_from_env_azure

get_creds_from_env_azure() -> AzureCredentials
Reads and returns Azure credentials from environment settings.
credentials
AzureCredentials
Azure credentials read from the AZURE_STORAGE_CONNECTION_STRING environment variable

HuggingFace Storage

HuggingFaceService

HuggingFace-specific CloudService for working with HuggingFace datasets.
HuggingFaceService(
    credentials: HuggingFaceCredentials,
    repo_id: str,
    access: CloudAccess,
    revision: str = "main"
)
credentials
HuggingFaceCredentials
Authentication credentials for HuggingFace
repo_id
str
Repository identifier
access
CloudAccess
Access permission ("read_only" or "read_write")
revision
str
default:"main"
Revision identifier (branch name)

get_creds_from_env_huggingface

get_creds_from_env_huggingface() -> HuggingFaceCredentials
Reads and returns HuggingFace credentials from environment settings.
credentials
HuggingFaceCredentials
HuggingFace credentials read from the HF_TOKEN environment variable

SFTP Storage

SFTPService

SFTP-specific CloudService for working with SFTP servers.
SFTPService(
    credentials: SFTPCredentials,
    access: CloudAccess,
    base_path: str = ".",
    timeout: int = 30,
    retries: int = 3
)
credentials
SFTPCredentials
Your SFTP credentials
access
CloudAccess
Access permission ("read_only" or "read_write")
base_path
str
default:"."
Base path on the remote SFTP server
timeout
int
default:"30"
Maximum time, in seconds, to wait before treating an operation as timed out
retries
int
default:"3"
How many times to retry a failed connection before giving up
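
The retries parameter describes a bounded retry loop. The sketch below shows the general pattern, not SFTPService's actual implementation: `with_retries` is a hypothetical helper, and it assumes that retries=3 means one initial attempt plus up to three further attempts, which the description above does not confirm.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(connect: Callable[[], T], retries: int = 3, delay: float = 0.0) -> T:
    """Call connect(); on OSError, retry up to `retries` more times, then re-raise."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error = None
    for attempt in range(retries + 1):
        try:
            return connect()
        except OSError as err:  # covers ConnectionError and TimeoutError
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # pause before the next attempt
    raise last_error


# A connection stub that fails twice, then succeeds.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server unavailable")
    return "connected"


result = with_retries(flaky_connect, retries=3)
```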

get_creds_from_env_sftp

get_creds_from_env_sftp() -> SFTPCredentials
Reads and returns SFTP credentials from environment settings.
credentials
SFTPCredentials
SFTP credentials assembled from the following environment variables:
  • SFTP_HOSTNAME - Server hostname
  • SFTP_PORT - Port number (default: 22)
  • SFTP_USERNAME - Username
  • SFTP_PASSWORD - Password (optional if using key)
  • SFTP_KEY_FILENAME - Path to SSH key file (optional if using password)
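
The variables listed above can be read with the standard library alone, as in the following minimal sketch. `SftpEnvSettings` and `read_sftp_env` are hypothetical names, not the library's SFTPCredentials or get_creds_from_env_sftp. Requiring at least one of SFTP_PASSWORD or SFTP_KEY_FILENAME is an assumption drawn from the "optional if using" notes.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SftpEnvSettings:
    """Hypothetical container; not the library's SFTPCredentials class."""

    hostname: str
    port: int
    username: str
    password: Optional[str]
    key_filename: Optional[str]


def read_sftp_env(env: Mapping[str, str] = os.environ) -> SftpEnvSettings:
    """Collect the SFTP_* variables, defaulting the port to 22."""
    password = env.get("SFTP_PASSWORD") or None  # treat empty strings as unset
    key_filename = env.get("SFTP_KEY_FILENAME") or None
    if password is None and key_filename is None:
        # Assumption: at least one authentication method must be configured.
        raise ValueError("Set SFTP_PASSWORD or SFTP_KEY_FILENAME")
    return SftpEnvSettings(
        hostname=env["SFTP_HOSTNAME"],  # raises KeyError if missing
        port=int(env.get("SFTP_PORT", "22")),
        username=env["SFTP_USERNAME"],
        password=password,
        key_filename=key_filename,
    )


# Pass a plain dict instead of os.environ to keep the example self-contained.
settings = read_sftp_env(
    {
        "SFTP_HOSTNAME": "sftp.example.com",
        "SFTP_USERNAME": "alice",
        "SFTP_KEY_FILENAME": "/home/alice/.ssh/id_ed25519",
    }
)
```

Accepting the environment as a parameter keeps the function easy to test without mutating the process environment.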

Example Usage

from openavmkit.cloud.cloud import init, load_cloud_settings

# Load cloud settings from cloud.json
settings = load_cloud_settings("cloud.json")

# Initialize cloud service
cloud = init(verbose=True, cloud_settings=settings)

# Sync files
if cloud:
    cloud.sync_files(
        locality="cook-il",
        local_folder="out/results",
        remote_folder="results/cook-il",
        verbose=True
    )
