Layered constraint model
Scheduling constraints come from three layers, and pod scheduling constraints must fall within NodePool constraints:

- Cloud provider — defines all instance types, architectures, zones, and purchase types available
- NodePool — the cluster administrator adds constraints via `spec.template.spec.requirements`
- Pod — workload authors add specifications via `nodeSelector`, `affinity`, and `topologySpreadConstraints`
Resource requests
Pods declare resource requests and limits in their spec. Karpenter uses only `requests` when selecting instance types; `limits` are enforced at runtime and play no role in provisioning.
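For illustration, a minimal pod sketch (image and values are placeholders) showing which fields Karpenter reads when sizing a node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inflate
spec:
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
      resources:
        requests:        # Karpenter selects instance types from these values
          cpu: "1"
          memory: 2Gi
        limits:          # enforced by the kubelet at runtime; ignored for provisioning
          cpu: "2"
          memory: 2Gi
```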
Accelerator and GPU resources
Karpenter supports the following accelerator resources:

- `nvidia.com/gpu`
- `amd.com/gpu`
- `aws.amazon.com/neuron`
- `aws.amazon.com/neuroncore`
- `habana.ai/gaudi`
You must deploy the appropriate device plugin DaemonSet for accelerator resources. Without it, Karpenter will not see those nodes as initialized.
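A sketch of a pod requesting one GPU (the image is a placeholder); extended resources such as `nvidia.com/gpu` are declared under `limits`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: "1"  # visible only once the NVIDIA device plugin DaemonSet is running
```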
Node selectors and labels
With `nodeSelector` you can request a node matching specific key-value pairs. This works with both well-known labels and custom labels you define on NodePools.
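For example, a pod can pin itself to a zone and capacity type with a sketch like this (zone value is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a  # well-known Kubernetes label
    karpenter.sh/capacity-type: spot         # Karpenter label
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
```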
The `Exists` operator matches on the presence of a label key rather than a specific value. Since `nodeSelector` only supports exact key-value matches, express `Exists` requirements through `nodeAffinity`.
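A pod-spec fragment sketching an `Exists` requirement via `nodeAffinity` (the label key `company.com/team` is a hypothetical example):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: company.com/team  # hypothetical custom label key
              operator: Exists       # match any node carrying this label
```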
Well-known labels
The following labels can be used in NodePool requirements or pod scheduling constraints.

| Label | Example | Description |
|---|---|---|
| `topology.kubernetes.io/zone` | us-east-2a | Availability zone |
| `node.kubernetes.io/instance-type` | g4dn.8xlarge | EC2 instance type |
| `kubernetes.io/os` | linux | Operating system (linux or windows) |
| `kubernetes.io/arch` | amd64 | CPU architecture (amd64 or arm64) |
| `karpenter.sh/capacity-type` | spot | Capacity type: spot, on-demand, or reserved |
| `karpenter.sh/nodepool` | default | NodePool that provisioned the node |
| `karpenter.k8s.aws/ec2nodeclass` | default | EC2NodeClass used to provision the node |
| `karpenter.k8s.aws/instance-category` | g | Instance category (string before generation number) |
| `karpenter.k8s.aws/instance-family` | g4dn | Instance family |
| `karpenter.k8s.aws/instance-size` | 8xlarge | Instance size |
| `karpenter.k8s.aws/instance-cpu` | 32 | Number of vCPUs |
| `karpenter.k8s.aws/instance-memory` | 131072 | Memory in MiB |
| `karpenter.k8s.aws/instance-gpu-name` | t4 | GPU name |
| `karpenter.k8s.aws/instance-gpu-count` | 1 | Number of GPUs |
| `karpenter.k8s.aws/instance-gpu-memory` | 16384 | GPU memory in MiB |
| `karpenter.k8s.aws/instance-local-nvme` | 900 | Local NVMe storage in GiB |
| `karpenter.k8s.aws/instance-network-bandwidth` | 131072 | Baseline network bandwidth in Mbps |
| `karpenter.k8s.aws/instance-hypervisor` | nitro | Hypervisor type |
| `karpenter.k8s.aws/instance-generation` | 4 | Generation number |
| `karpenter.k8s.aws/instance-tenancy` | default | Tenancy: default or dedicated |
| `topology.k8s.aws/zone-id` | use1-az1 | Globally consistent zone ID |
In addition to the standard `In`, `NotIn`, `Exists`, and `DoesNotExist` operators, numeric labels such as `karpenter.k8s.aws/instance-cpu` or `karpenter.k8s.aws/instance-memory` can be compared with `Gt` (greater than) and `Lt` (less than).
Node affinity
Use `nodeAffinity` for more complex constraints than `nodeSelector` supports:

- `requiredDuringSchedulingIgnoredDuringExecution` — hard requirement; the pod won't schedule if unmet
- `preferredDuringSchedulingIgnoredDuringExecution` — soft preference; the pod may still schedule if unmet

Multiple `nodeSelectorTerms` act as OR conditions: Karpenter evaluates them in order and uses the first that works. If all fail, it backs off and retries.
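A pod-spec fragment sketching ORed `nodeSelectorTerms` (zone values are placeholders):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # Terms are ORed; Karpenter tries them in order and uses the first that works
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-west-2a"]
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-west-2b"]
```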
Preferred affinities can cause more nodes to be created than expected, because Karpenter prefers to create new nodes to satisfy preferences. Use required affinities when strict placement is needed.
Taints and tolerations
Taints prevent pods from scheduling on a node unless the pod tolerates the taint. A NodePool can, for example, taint its nodes with a GPU taint so that only pods carrying a matching toleration land on them.
Topology spread constraints
Use `topologySpreadConstraints` to distribute pods across failure domains and limit blast radius.
Supported topology keys include:

- `topology.kubernetes.io/zone`
- `kubernetes.io/hostname`
- `karpenter.sh/capacity-type`
NodePools do not automatically balance or rebalance nodes across availability zones. Achieve AZ balance by defining zonal topology spread constraints on pods.
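A pod-spec fragment sketching zonal and capacity-type spread (the `app: inflate` selector is a placeholder):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # hard requirement: drives AZ balance
    labelSelector:
      matchLabels:
        app: inflate
  - maxSkew: 1
    topologyKey: karpenter.sh/capacity-type
    whenUnsatisfiable: ScheduleAnyway  # soft preference
    labelSelector:
      matchLabels:
        app: inflate
```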
Pod affinity and anti-affinity
Use `podAffinity` and `podAntiAffinity` to control scheduling relative to other pods. For example, affinity can require placement in a zone where a `system=backend` pod is running, while anti-affinity can prevent more than one pod with `app=inflate` per node.
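The example above can be sketched as a pod-spec fragment (label values taken from the text):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            system: backend
        topologyKey: topology.kubernetes.io/zone  # co-locate in the backend pod's zone
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: inflate
        topologyKey: kubernetes.io/hostname       # at most one inflate pod per node
```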
Weighted NodePools
Assign a `.spec.weight` to NodePools to control priority. Karpenter attempts to schedule using the highest-weight NodePool first.
Reserved capacity prioritization
To prioritize Savings Plan or Reserved Instance capacity, give a NodePool constrained to that capacity a higher weight than your general-purpose NodePools.
Fallback NodePool
Assign a higher weight to a NodePool with specific constraints to make it the cluster default. Karpenter does not guarantee it will always choose the highest-priority NodePool: if a pod can't be scheduled with the highest-priority NodePool, a lower-priority one may be used for that batch.
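A sketch of a weighted pair (names are placeholders; required NodePool fields such as `nodeClassRef` are omitted for brevity):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved-first
spec:
  weight: 100          # higher weight is tried first
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["reserved"]
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: fallback
spec:
  weight: 10           # used when the higher-weight pool can't satisfy the pods
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```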
Advanced scheduling techniques
Scheduling by node resources
Use well-known labels to require specific hardware capabilities: for example, any amount of local NVMe storage, at least 100 GiB of NVMe storage, or at least 50 Gbps of network bandwidth.
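A pod-spec fragment sketching such requirements with numeric operators; the threshold values are illustrative and hinge on the label units (`instance-local-nvme` in GiB, `instance-network-bandwidth` in Mbps):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.k8s.aws/instance-local-nvme
              operator: Gt          # strictly greater; "99" means at least 100 GiB
              values: ["99"]
            - key: karpenter.k8s.aws/instance-network-bandwidth
              operator: Gt          # "49999" Mbps means at least 50 Gbps
              values: ["49999"]
```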
Workload segregation with Exists operator
Isolate pods on different nodes without creating a unique NodePool per team: give the NodePool an `Exists` requirement on a custom label key, and have each team's deployment select its own label value. Karpenter applies the label value dynamically to launched nodes based on pod requirements.
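A sketch of the pattern (the key `company.com/team` and value `team-a` are hypothetical):

```yaml
# In the NodePool's spec.template.spec.requirements:
#   - key: company.com/team    # hypothetical custom label key
#     operator: Exists
#
# Each team's pods then select their own value; Karpenter stamps that
# value onto the nodes it launches for them:
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  nodeSelector:
    company.com/team: team-a
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
```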
On-demand/spot ratio split
Create virtual topology domains to achieve a desired spot-to-on-demand ratio: with four domain values available only on a spot NodePool and one value available only on an on-demand NodePool, spreading pods evenly across the domains yields a 4:1 ratio. The workload then adds a topology spread constraint over the virtual domain key.
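A sketch of the two NodePools; the `capacity-spread` label key is a hypothetical virtual domain, and required fields such as `nodeClassRef` are omitted:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: capacity-spread          # hypothetical virtual domain key
          operator: In
          values: ["2", "3", "4", "5"]  # four domains on spot
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: capacity-spread
          operator: In
          values: ["1"]                 # one domain on on-demand
```

The workload pairs this with a `topologySpreadConstraints` entry using `topologyKey: capacity-spread`, `maxSkew: 1`, and a `labelSelector` matching its own pods.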
Persistent volume topology
Karpenter automatically detects storage scheduling requirements and includes them in node launch decisions.
The EBS CSI driver uses `topology.ebs.csi.aws.com/zone` instead of `topology.kubernetes.io/zone`; Karpenter translates between these labels internally. When configuring a StorageClass for the EBS CSI driver, use `topology.ebs.csi.aws.com/zone`.
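A StorageClass sketch for the EBS CSI driver (name, volume type, and zones are placeholders); `WaitForFirstConsumer` delays binding until the pod schedules, so the zone can factor into provisioning:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer  # bind after pod placement so the zone is known
parameters:
  type: gp3
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone  # the CSI driver's zone label
        values: ["us-west-2a", "us-west-2b"]
```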