Overview
The diabetes prediction example trains a deep neural network to predict diabetes onset based on medical measurements. The data is distributed across multiple data owners (hospitals, clinics) who want to collaborate on model training while keeping their patient data private.
Key Features
- Federated Learning: Decentralized training across multiple clients using Flower framework
- Privacy-Preserving: Data remains with data owners; only model updates are shared
- Imbalanced Data Handling: Uses SMOTE (Synthetic Minority Over-sampling Technique) for class balancing
- Advanced Neural Architecture: Deep neural network with batch normalization and dropout
- Multiple Deployment Modes: Local simulation, Google Colab, and SyftBox distributed deployment
Architecture
Model Structure
The neural network consists of:
- Input Layer: 6 features (after preprocessing)
- Hidden Layers:
- Layer 1: 32 units with BatchNorm, LeakyReLU, and Dropout (0.2)
- Layer 2: 24 units with BatchNorm, LeakyReLU, and Dropout (0.25)
- Layer 3: 16 units with BatchNorm and LeakyReLU
- Output Layer: Single unit with Sigmoid activation (binary classification)
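The architecture above can be sketched in PyTorch. The layer sizes and dropout rates come from the list; the exact module ordering within each layer is an assumption:

```python
import torch
import torch.nn as nn

class DiabetesNet(nn.Module):
    """Sketch of the described architecture. Sizes and dropout rates are
    from the text; the Linear -> BatchNorm -> LeakyReLU -> Dropout
    ordering is an assumption."""

    def __init__(self, n_features: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.BatchNorm1d(32), nn.LeakyReLU(), nn.Dropout(0.2),
            nn.Linear(32, 24), nn.BatchNorm1d(24), nn.LeakyReLU(), nn.Dropout(0.25),
            nn.Linear(24, 16), nn.BatchNorm1d(16), nn.LeakyReLU(),
            nn.Linear(16, 1), nn.Sigmoid(),  # probability of diabetes onset
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```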
Dataset
Source: PIMA Indians Diabetes Database
Features:
- Pregnancies
- Glucose
- Blood Pressure
- BMI (Body Mass Index)
- Diabetes Pedigree Function
- Age
Preprocessing:
- Removed `SkinThickness` and `Insulin` features
- Imputed zero values with mean/median
- Applied SMOTE for class balancing
- Standardized features using StandardScaler
Setup
Clone the Project
Install Dependencies
Assuming you have Python and uv installed, the example depends on:
- `flwr-datasets>=0.5.0` - Federated dataset utilities
- `torch>=2.8.0` - Deep learning framework
- `scikit-learn==1.6.1` - Machine learning utilities
- `imblearn` - Imbalanced data handling (SMOTE)
- `syft_flwr` - SyftBox integration
Running the Example
Local Simulation
Run federated learning locally with simulated clients. This will:
- Simulate 2 supernodes (clients) locally
- Run 2 federated learning rounds
- Save model weights to the `./weights/` directory
These defaults are configured in `pyproject.toml`.
Jupyter Notebooks
For interactive exploration, use the included notebooks.
Local Setup
The `local/` directory contains notebooks for running on a local SyftBox network:
- Start with `local/do1.ipynb` (Data Owner 1)
- Then run `local/do2.ipynb` (Data Owner 2)
- Finally open `local/ds.ipynb` (Data Scientist)
Distributed Setup
The `distributed/` directory contains the same workflow for real distributed deployment, where each party runs on a different machine using the SyftBox client.
Client Implementation
The Flower client handles local training and evaluation.
Server Strategy
The server uses the `FedAvgWithModelSaving` strategy.
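In Flower, strategies like this are built by subclassing `FedAvg`. As an illustration of what such a strategy does each round, here is a framework-free sketch of weighted federated averaging plus weight saving; the function name, output directory, and file naming scheme are assumptions, not the project's actual code:

```python
import numpy as np
from pathlib import Path

def fedavg_and_save(client_updates, round_num, out_dir="weights"):
    """Sketch of the core of a FedAvgWithModelSaving-style strategy:
    average the clients' weight lists, weighted by their local example
    counts, and persist the aggregate each round.

    client_updates: list of (weights, num_examples) pairs, where weights
    is a list of numpy arrays (one per model layer)."""
    total_examples = sum(n for _, n in client_updates)
    n_layers = len(client_updates[0][0])
    aggregated = [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(n_layers)
    ]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.savez(out / f"round_{round_num}.npz", *aggregated)  # checkpoint this round
    return aggregated
```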
Fault Tolerance
The system handles client failures during federated learning.
Default Configuration (50% failure tolerance)
- Total Clients: 2
- Minimum Required: 1
- Failure Tolerance: Can continue with 1 out of 2 clients
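The 50% tolerance rule above can be sketched as a simple check; the exact test the project performs is an assumption:

```python
import math

def can_proceed(available_clients: int, total_clients: int,
                failure_tolerance: float = 0.5) -> bool:
    """Sketch of the failure-tolerance rule described above: a round
    proceeds when at least ceil(total * (1 - tolerance)) clients
    respond. With 2 clients and 50% tolerance, 1 client suffices."""
    min_required = math.ceil(total_clients * (1 - failure_tolerance))
    return available_clients >= min_required
```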
Configuration Parameters
Training Details
- Optimizer: Adam (lr=0.001, weight_decay=0.0005)
- Loss Function: Binary Cross-Entropy (BCELoss)
- Batch Size: 10 (training), full dataset (testing)
- Local Epochs: 1 per round (configurable)
- Device Support: CUDA, MPS (Apple Silicon), XPU, or CPU
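Putting those details together, one local epoch might look like the sketch below. The helper names are illustrative; the device fallback order follows the list above, and the XPU branch assumes a PyTorch build that exposes `torch.xpu`:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def pick_device() -> torch.device:
    """Device fallback order from the text: CUDA, MPS, XPU, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

def train_one_epoch(model: nn.Module, loader: DataLoader,
                    device: torch.device) -> float:
    """One local epoch with the listed hyperparameters:
    Adam (lr=0.001, weight_decay=0.0005) and BCELoss."""
    opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
    loss_fn = nn.BCELoss()
    model.to(device).train()
    running = 0.0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        running += loss.item() * len(X)
    return running / len(loader.dataset)  # mean training loss
```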
Project Structure
Deployment Options
Local Simulation
Run everything on your local machine for development and testing.
Google Colab
Zero-setup federated learning using only Google Colab notebooks.
SyftBox Network
Deploy across real distributed nodes using the SyftBox client.
Next Steps
Try Federated Analytics
Learn how to compute statistics on distributed data.
Explore FedRAG
Build privacy-preserving question answering systems.