Scale your services horizontally by running multiple container replicas across your cluster. Uncloud automatically distributes containers across machines and load balances traffic between them.
Quick start
Scale a service to 5 replicas:
Scaling methods
You can scale services in two ways:
Using uc service scale
Scale an existing service:
- Inspects the current service state
- Plans the changes (adding or removing containers)
- Asks for confirmation when scaling down
- Executes the deployment with rolling updates
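The planning and confirmation steps above can be sketched in a few lines. This is only an illustration of the described behavior, not Uncloud's implementation: `plan_scale` and its return shape are hypothetical names chosen for the example, and the minimum of 1 replica reflects the scale-to-zero limitation described later in this guide.

```python
def plan_scale(current: int, desired: int) -> dict:
    """Return a hypothetical plan for moving from `current` to `desired` replicas."""
    if desired < 1:
        # This guide states that scaling to 0 replicas is not supported.
        raise ValueError("desired replicas must be at least 1")
    delta = desired - current
    return {
        "add": max(delta, 0),       # containers to start when scaling up
        "remove": max(-delta, 0),   # containers to stop when scaling down
        # Confirmation is only requested when containers would be removed.
        "needs_confirmation": delta < 0,
    }


print(plan_scale(3, 5))
print(plan_scale(5, 2))
```

Scaling up produces a plan that only adds containers and needs no confirmation, while scaling down produces a plan that removes containers and is flagged for confirmation.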
Using Compose files
Set replicas in your compose.yaml:
deploy.replicas syntax:
How scaling works
Horizontal scaling
Uncloud distributes containers across available machines:
- 1 machine, 5 replicas: All 5 containers run on the single machine
- 3 machines, 5 replicas: Containers spread evenly (2, 2, 1 distribution)
- 5 machines, 5 replicas: One container per machine
Round-robin distribution
Uncloud uses a round-robin approach to spread containers:
- Container 1 → machine-1
- Container 2 → machine-2
- Container 3 → machine-3
- Container 4 → machine-1
- Container 5 → machine-2
- Container 6 → machine-3
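The round-robin mapping above can be reproduced with simple modular arithmetic. The sketch below is a simplified model for reasoning about placement, not Uncloud's scheduler: the function names are hypothetical, and it assumes all machines are eligible and start out empty.

```python
from collections import Counter


def assign_round_robin(machines, replicas):
    """Pair each container number (starting at 1) with a machine, cycling in order."""
    if not machines:
        raise ValueError("at least one machine is required")
    return [(i + 1, machines[i % len(machines)]) for i in range(replicas)]


def replicas_per_machine(machines, replicas):
    """Count how many containers land on each machine."""
    counts = Counter(m for _, m in assign_round_robin(machines, replicas))
    return {m: counts.get(m, 0) for m in machines}


machines = ["machine-1", "machine-2", "machine-3"]
for container, machine in assign_round_robin(machines, 6):
    print(f"Container {container} -> {machine}")
print(replicas_per_machine(machines, 5))
```

With 3 machines and 5 replicas the per-machine counts come out as 2, 2, and 1, which matches the distribution listed under horizontal scaling.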
Scaling up
When you increase replicas:
Scaling down
When you decrease replicas:
Confirm scale down
Uncloud shows the plan and asks for confirmation (to prevent accidental data loss).
Load balancing
Uncloud automatically load balances traffic across replicas.
HTTP/HTTPS traffic
Caddy reverse proxy distributes requests to all healthy containers:
- Round-robin load balancing
- Automatic health checking (passive and active)
- Failed requests retry on other upstreams
- Unhealthy containers automatically removed
Internal service discovery
Services communicate via DNS names that resolve to all healthy container IPs:
For example, for a service named api with 3 replicas, the api hostname returns all 3 container IPs. Your HTTP client (like curl, axios, fetch) handles load distribution.
Placement constraints
Control which machines can run your service containers.
Using x-machines extension
Restrict service to specific machines:
Pin to single machine
For stateful services that can’t run across machines:
Geographic distribution
Scaling strategies
Auto-scaling (manual)
Uncloud doesn’t have built-in auto-scaling yet. Scale manually based on metrics:
Scheduled scaling
Use cron jobs for time-based scaling:
Blue-green deployments
Deploy new version alongside old version:
Scaling with volumes
Shared volumes (read-only)
Multiple replicas can safely read from the same volume:
Shared volumes (read-write)
Be careful with concurrent writes:
Per-replica volumes
For data that shouldn’t be shared, use external volumes per machine:
Global services
Run exactly one container on every machine:
- Add a machine → new container starts
- Remove a machine → container is removed
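The one-container-per-machine rule amounts to a set comparison between the machines in the cluster and the machines already running the service. The following sketch illustrates that idea only; `reconcile_global` is a hypothetical helper, not part of Uncloud.

```python
def reconcile_global(machines, running_on):
    """Compare cluster machines with machines already running the global service.

    Returns (to_start, to_remove): machines that need a new container and
    machines whose container should be removed because they left the cluster.
    """
    machines = set(machines)
    running_on = set(running_on)
    to_start = sorted(machines - running_on)
    to_remove = sorted(running_on - machines)
    return to_start, to_remove


# machine-3 was added to the cluster and machine-0 was removed from it.
print(reconcile_global(
    machines=["machine-1", "machine-2", "machine-3"],
    running_on=["machine-0", "machine-1", "machine-2"],
))
```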
Global service use cases
- Monitoring agents: Collect metrics from each machine
- Log collectors: Forward logs from each machine
- Local caches: Provide caching on every machine
- Network tools: DNS resolvers, proxies
Scaling best practices
Start small, scale up
Begin with fewer replicas and scale up based on actual load:
Monitor performance and increase as needed.
Distribute across machines
Use multiple machines for high availability:
If one machine fails, others continue serving traffic.
Configure health checks
Health checks ensure only working containers receive traffic:
Plan for failure
Run enough replicas to handle machine failures:
- 3 machines → at least 4-6 replicas
- One machine fails → remaining replicas handle the load
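A quick way to check the guideline above is to compute how many replicas survive the loss of the busiest machine. This sketch assumes the even round-robin spread described earlier in this guide; `replicas_after_failure` is a hypothetical helper, and whether the surviving replicas can actually carry your load still depends on your traffic.

```python
import math


def replicas_after_failure(machines: int, replicas: int) -> int:
    """Replicas left running if the most heavily loaded machine fails.

    Assumes replicas are spread evenly, so the busiest machine
    holds ceil(replicas / machines) containers.
    """
    if machines < 2:
        raise ValueError("need at least 2 machines to survive a machine failure")
    worst_case_loss = math.ceil(replicas / machines)
    return replicas - worst_case_loss


for replicas in (3, 4, 6):
    print(replicas, "replicas on 3 machines ->", replicas_after_failure(3, replicas), "left")
```

With 3 machines, 4 replicas leave 2 running after a failure and 6 replicas leave 4.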
Monitor resource usage
Check CPU and memory before scaling:
If containers are underutilized, you might not need more replicas.
Scaling limitations
Cannot scale to zero
Uncloud doesn’t support scaling to 0 replicas:
Global services cannot be scaled
Global services always run one replica per machine:
Volumes and scaling
Shared volumes with multiple replicas:
- Read-only: Safe to scale freely
- Read-write: Ensure application handles concurrent access
- Database volumes: Don’t scale beyond 1 replica unless using clustering
Real-world examples
Scale web application
Geographic distribution
Background workers
Monitoring scaled services
List all containers
Inspect service
- Service ID and name
- Number of replicas
- Container locations
- Health status
- Endpoints
Check logs
View Caddy upstreams
Next steps
Rolling Updates
Update scaled services with zero downtime
Health Checks
Ensure only healthy containers receive traffic
Docker Compose
Define multi-service applications
Deploying Services
Learn about service deployment basics
