
Overview

This quickstart shows you how to run a complete federated learning workflow directly from Google Colab—no local setup required. You’ll train a diabetes prediction model across two data owners using the PIMA Indians Diabetes dataset, all while keeping private data secure.
You can complete this tutorial with three Google accounts (one for each party), or invite two friends for a real collaborative experience.

What you’ll build

A federated learning system with three participants:
  • Data Owner 1 (DO1) - Holds partition 0 of the diabetes dataset
  • Data Owner 2 (DO2) - Holds partition 1 of the diabetes dataset
  • Data Scientist (DS) - Coordinates training and aggregates model updates
Raw data never leaves each data owner’s Colab environment—only model updates are shared.

Prerequisites

  • Three Google accounts (or two friends with Google accounts)
  • Access to Google Colab
  • 15-20 minutes
That’s it! No Python installation, no complex setup.

Step 1: Set up data owners

Data Owner 1 setup

Open a new Colab notebook and run:
# Install syft-flwr
!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"

# Login as Data Owner 1
import syft_client as sc
import syft_flwr

do_email = input("Enter Data Owner 1's email: ")
do_client = sc.login_do(email=do_email)

Register the dataset

from pathlib import Path
from huggingface_hub import snapshot_download

# Download dataset from HuggingFace
DATASET_DIR = Path("./dataset/").expanduser().absolute()

if not DATASET_DIR.exists():
    snapshot_download(
        repo_id="khoaguin/pima-indians-diabetes-database-partitions",
        repo_type="dataset",
        local_dir=DATASET_DIR,
    )

# Create Syft dataset with mock and private paths
partition_number = 0  # DO1 uses partition 0
DATASET_PATH = DATASET_DIR / f"pima-indians-diabetes-database-{partition_number}"

do_client.create_dataset(
    name="pima-indians-diabetes-database",
    mock_path=DATASET_PATH / "mock",
    private_path=DATASET_PATH / "private",
    summary=f"PIMA Indians Diabetes dataset - Partition {partition_number}",
    readme_path=DATASET_PATH / "README.md",
    tags=["healthcare", "diabetes"],
    sync=True,
)

# Verify dataset creation
do_client.datasets.get_all()
The mock_path contains synthetic/sample data for code development. The private_path contains real data that never leaves this environment.
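To illustrate the mock/private split, here is a minimal sketch of how code can be written once against either path (assuming the partition is stored as CSV files; the helper name is illustrative and not part of the syft-flwr API):

```python
from pathlib import Path

import pandas as pd


def load_partition(dataset_dir: Path, split: str = "mock") -> pd.DataFrame:
    """Load a dataset split ("mock" or "private") as one DataFrame.

    Code is developed against the mock split, which mirrors the schema of
    the private split with synthetic rows, so the same code later runs
    unchanged on the real data inside the data owner's environment.
    """
    csv_files = sorted((dataset_dir / split).glob("*.csv"))
    return pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)
```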

Data Owner 2 setup

Repeat the same steps in a new Colab notebook, but change the partition number:
partition_number = 1  # DO2 uses partition 1
Everything else stays identical. Now you have two data owners, each holding a different slice of the dataset.

Step 2: Set up data scientist

In a third Colab notebook, set up the Data Scientist role.

Login and add peers

!uv pip install -q "git+https://github.com/OpenMined/syft-flwr.git@main"

import syft_client as sc
import syft_flwr

ds_email = input("Enter Data Scientist's email: ")
ds_client = sc.login_ds(email=ds_email)

# Add both data owners as peers
do1_email = input("Enter Data Owner 1's email: ")
ds_client.add_peer(do1_email)

do2_email = input("Enter Data Owner 2's email: ")
ds_client.add_peer(do2_email)

# Verify peers were added
ds_client.peers

Explore available datasets

# Check DO1's datasets
do1_datasets = ds_client.datasets.get_all(datasite=do1_email)
do1_datasets[0].describe()

# Check DO2's datasets
do2_datasets = ds_client.datasets.get_all(datasite=do2_email)
do2_datasets[0].describe()

# Get mock dataset URLs for testing
mock_dataset_urls = [do1_datasets[0].mock_url, do2_datasets[0].mock_url]
mock_dataset_urls
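Before writing training code against these partitions, it can help to confirm that both mock datasets expose the same schema. A small sanity check (assuming each mock source is readable by pandas as CSV; this helper is illustrative, not part of the syft-flwr API):

```python
import pandas as pd


def schemas_match(mock_sources: list) -> bool:
    """Return True if every mock partition has identical column names."""
    columns = [list(pd.read_csv(src).columns) for src in mock_sources]
    return all(cols == columns[0] for cols in columns)
```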

Step 3: Prepare the FL project

Clone the Flower project

The FL project is built with Flower, which defines the model architecture, training logic, and client-server communication; Syft-Flwr adds job submission and governance on top.
from pathlib import Path

!mkdir -p /content/fl-diabetes-prediction
!curl -sL https://github.com/khoaguin/fl-diabetes-prediction/archive/refs/heads/main.tar.gz | tar -xz --strip-components=1 -C /content/fl-diabetes-prediction

SYFT_FLWR_PROJECT_PATH = Path("/content/fl-diabetes-prediction")
print(f"Project at: {SYFT_FLWR_PROJECT_PATH}")

Bootstrap the project

Configure the project with participating datasites and generate the main.py entry point:
import syft_flwr

!rm -rf {SYFT_FLWR_PROJECT_PATH / "main.py"}

do_emails = [peer.email for peer in ds_client.peers]
syft_flwr.bootstrap(
    SYFT_FLWR_PROJECT_PATH,
    aggregator=ds_email,
    datasites=do_emails
)
print("Bootstrapped project successfully!")
bootstrap() auto-detects the transport layer: in Colab it uses P2P syncing over Google Drive, while a local setup uses SyftBox.

Submit jobs to data owners

!rm -rf {SYFT_FLWR_PROJECT_PATH / "fl_diabetes_prediction" / "__pycache__"}

job_name = "fl-diabetes-training"

# Submit to DO1
ds_client.submit_python_job(
    user=do1_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

# Submit to DO2
ds_client.submit_python_job(
    user=do2_email,
    code_path=str(SYFT_FLWR_PROJECT_PATH),
    job_name=job_name,
)

# Verify job submission
ds_client.jobs

Step 4: Data owners approve and run jobs

Back in each Data Owner’s notebook (DO1 and DO2):
# Check for incoming jobs
do_client.jobs

# Review and approve the job
do_client.jobs[0].approve()

# Process approved jobs (runs client-side training)
do_client.process_approved_jobs()

# Check job status
do_client.jobs
Data owners can inspect the submitted code before approving. This is a critical governance feature.

Step 5: Run federated training

Back in the Data Scientist notebook, install dependencies and run the aggregator:
# Install Flower and ML dependencies
!uv pip install \
    "flwr-datasets>=0.5.0" \
    "imblearn>=0.0" \
    "loguru>=0.7.3" \
    "pandas>=2.3.0" \
    "ipywidgets>=8.1.7" \
    "scikit-learn==1.7.1" \
    "torch>=2.8.0" \
    "ray==2.31.0"
# Start the aggregation server
ds_email = ds_client.email
syftbox_folder = f"/content/SyftBox_{ds_email}"

!SYFTBOX_EMAIL="{ds_email}" SYFTBOX_FOLDER="{syftbox_folder}" \
    uv run {str(SYFT_FLWR_PROJECT_PATH / "main.py")}

# Check final job status
ds_client.jobs
The aggregator coordinates training rounds, receiving model updates from each data owner and combining them using Federated Averaging (FedAvg).
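FedAvg itself is simple: each round, the server averages the clients' parameter arrays, weighted by how many training examples each client contributed. A minimal NumPy sketch of the idea, independent of the Flower internals:

```python
import numpy as np


def fedavg(client_weights, num_examples):
    """Average per-client parameter lists, weighted by local dataset size.

    client_weights: one list of ndarrays (layer parameters) per client.
    num_examples: number of training examples each client used.
    """
    total = sum(num_examples)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, num_examples))
        for i in range(len(client_weights[0]))
    ]


# Two clients, one parameter tensor each; client A contributed 3x the examples
a = [np.array([1.0, 1.0])]
b = [np.array([5.0, 5.0])]
avg = fedavg([a, b], num_examples=[3, 1])  # -> [array([2., 2.])]
```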

Step 6: Clean up

When you’re done, clean up resources in each notebook (call delete_syftbox() on do_client in the data owner notebooks and on ds_client in the data scientist notebook):
ds_client.delete_syftbox()

What just happened?

You successfully trained a diabetes prediction model using federated learning:
  1. Two data owners each held a private partition of the dataset
  2. A data scientist coordinated training without seeing raw data
  3. Model updates were aggregated using the Flower framework
  4. Privacy was preserved—raw data never left the data owner’s environment
This is the core promise of federated learning: collaborative machine learning without sharing sensitive data.

Understanding the code

The Flower project follows a standard structure:

Client app

src/syft_flwr/fl_diabetes_prediction/client_app.py
from flwr.client import ClientApp, NumPyClient
from flwr.common import Context

# set_weights, get_weights, train, and evaluate are helper functions
# defined in the project's task module.

class FlowerClient(NumPyClient):
    def __init__(self, net, trainloader, testloader):
        self.net = net
        self.trainloader = trainloader
        self.testloader = testloader

    def fit(self, parameters, config):
        set_weights(self.net, parameters)
        train(self.net, self.trainloader)
        return get_weights(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        set_weights(self.net, parameters)
        # Calls the module-level evaluate() helper, not this method
        loss, accuracy = evaluate(self.net, self.testloader)
        return loss, len(self.testloader), {"accuracy": accuracy}

def client_fn(context: Context):
    # Builds the model and loads this datasite's partition (elided here),
    # then wraps them in a FlowerClient
    return FlowerClient(net, trainloader, testloader).to_client()

app = ClientApp(client_fn=client_fn)

Server app

src/syft_flwr/fl_diabetes_prediction/server_app.py
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from syft_flwr.strategy import FedAvgWithModelSaving

# output_dir, params (the initial model parameters), and weighted_average
# are defined elsewhere in this module.

def server_fn(context: Context):
    strategy = FedAvgWithModelSaving(
        save_path=output_dir / "weights",
        fraction_fit=1.0,
        min_available_clients=1,
        initial_parameters=params,
        evaluate_metrics_aggregation_fn=weighted_average,
    )

    config = ServerConfig(num_rounds=5)
    return ServerAppComponents(config=config, strategy=strategy)

app = ServerApp(server_fn=server_fn)
Syft-Flwr automatically handles:
  • Loading datasets from private paths via load_syftbox_dataset()
  • Routing model updates through file sync
  • Managing job approval workflows
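The weighted_average function passed to evaluate_metrics_aggregation_fn in the server app follows Flower's standard pattern for metrics aggregation: it receives one (num_examples, metrics) pair per client and returns an example-weighted mean. A typical implementation looks like this (a sketch; the exact version lives in the project's source):

```python
def weighted_average(metrics):
    """Aggregate client accuracies, weighted by each client's example count.

    `metrics` is a list of (num_examples, {"accuracy": ...}) tuples,
    one per client, as supplied by the Flower strategy.
    """
    total_examples = sum(n for n, _ in metrics)
    weighted_acc = sum(n * m["accuracy"] for n, m in metrics)
    return {"accuracy": weighted_acc / total_examples}
```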

Next steps

  • Installation - Set up Syft-Flwr for local development
  • Development guide - Learn how to build custom FL projects
  • API reference - Explore the complete API
  • Examples - Explore more FL examples
