Layered constraint model
Scheduling constraints come from three layers, and pod scheduling constraints must fall within NodePool constraints:

- Cloud provider — defines all instance types, architectures, zones, and purchase types available
- NodePool — the cluster administrator adds constraints via `spec.template.spec.requirements`
- Pod — workload authors add specifications via `nodeSelector`, `affinity`, and `topologySpreadConstraints`
Resource requests
Pods declare resource requests and limits in their spec. Karpenter uses only `requests` when selecting instance types; `limits` are enforced at runtime and play no role in provisioning.
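For illustration, a minimal pod sketch (image and values are placeholders) showing which fields Karpenter reads when sizing a node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inflate
spec:
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
      resources:
        requests:        # Karpenter selects instance types from these values
          cpu: "1"
          memory: 2Gi
        limits:          # enforced by the kubelet at runtime; ignored for provisioning
          cpu: "2"
          memory: 2Gi
```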
Accelerator and GPU resources
Karpenter supports the following accelerator resources:

- `nvidia.com/gpu`
- `amd.com/gpu`
- `aws.amazon.com/neuron`
- `aws.amazon.com/neuroncore`
- `habana.ai/gaudi`
You must deploy the appropriate device plugin DaemonSet for accelerator resources. Without it, Karpenter will not see those nodes as initialized.
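A sketch of a pod requesting one GPU (the image is a placeholder); extended resources such as `nvidia.com/gpu` are declared under `limits`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: "1"  # visible only once the NVIDIA device plugin DaemonSet is running
```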
Node selectors and labels
With `nodeSelector` you can request a node matching specific key-value pairs. This works with both well-known labels and custom labels you define on NodePools.
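For example, a pod can pin itself to a zone and capacity type with a sketch like this (zone value is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-west-2a  # well-known Kubernetes label
    karpenter.sh/capacity-type: spot         # Karpenter label
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
```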
The `Exists` operator matches on the presence of a label key rather than a specific value. Since `nodeSelector` only supports exact key-value matches, express `Exists` requirements through `nodeAffinity`.
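A pod-spec fragment sketching an `Exists` requirement via `nodeAffinity` (the label key `company.com/team` is a hypothetical example):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: company.com/team  # hypothetical custom label key
              operator: Exists       # match any node carrying this label
```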
Well-known labels
The following labels can be used in NodePool requirements or pod scheduling constraints.

| Label | Example | Description |
|---|---|---|
| `topology.kubernetes.io/zone` | us-east-2a | Availability zone |
| `node.kubernetes.io/instance-type` | g4dn.8xlarge | EC2 instance type |
| `kubernetes.io/os` | linux | Operating system (linux or windows) |
| `kubernetes.io/arch` | amd64 | CPU architecture (amd64 or arm64) |
| `karpenter.sh/capacity-type` | spot | Capacity type: spot, on-demand, or reserved |
| `karpenter.sh/nodepool` | default | NodePool that provisioned the node |
| `karpenter.k8s.aws/ec2nodeclass` | default | EC2NodeClass used to provision the node |
| `karpenter.k8s.aws/instance-category` | g | Instance category (string before generation number) |
| `karpenter.k8s.aws/instance-family` | g4dn | Instance family |
| `karpenter.k8s.aws/instance-size` | 8xlarge | Instance size |
| `karpenter.k8s.aws/instance-cpu` | 32 | Number of vCPUs |
| `karpenter.k8s.aws/instance-memory` | 131072 | Memory in MiB |
| `karpenter.k8s.aws/instance-gpu-name` | t4 | GPU name |
| `karpenter.k8s.aws/instance-gpu-count` | 1 | Number of GPUs |
| `karpenter.k8s.aws/instance-gpu-memory` | 16384 | GPU memory in MiB |
| `karpenter.k8s.aws/instance-local-nvme` | 900 | Local NVMe storage in GiB |
| `karpenter.k8s.aws/instance-network-bandwidth` | 131072 | Baseline network bandwidth in Mbps |
| `karpenter.k8s.aws/instance-hypervisor` | nitro | Hypervisor type |
| `karpenter.k8s.aws/instance-generation` | 4 | Generation number |
| `karpenter.k8s.aws/instance-tenancy` | default | Tenancy: default or dedicated |
| `topology.k8s.aws/zone-id` | use1-az1 | Globally consistent zone ID |
In addition to the standard `In`, `NotIn`, `Exists`, and `DoesNotExist` operators, numeric labels such as `karpenter.k8s.aws/instance-cpu` or `karpenter.k8s.aws/instance-memory` can be compared with `Gt` (greater than) and `Lt` (less than).
Node affinity
Use `nodeAffinity` for more complex constraints than `nodeSelector` supports:

- `requiredDuringSchedulingIgnoredDuringExecution` — hard requirement; the pod won't schedule if unmet
- `preferredDuringSchedulingIgnoredDuringExecution` — soft preference; the pod may still schedule if unmet

Multiple `nodeSelectorTerms` act as OR conditions: Karpenter evaluates them in order and uses the first that works. If all fail, it backs off and retries.
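A pod-spec fragment sketching ORed `nodeSelectorTerms` (zone values are placeholders):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        # Terms are ORed; Karpenter tries them in order and uses the first that works
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-west-2a"]
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-west-2b"]
```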
Preferred affinities can cause more nodes to be created than expected, because Karpenter prefers to create new nodes to satisfy preferences. Use required affinities when strict placement is needed.
Taints and tolerations
Taints prevent pods from scheduling on a node unless the pod tolerates the taint. A NodePool can, for example, taint its nodes with a GPU taint so that only pods carrying a matching toleration land on them.
Topology spread constraints
Use `topologySpreadConstraints` to distribute pods across failure domains and limit blast radius.
Supported topology keys include:

- `topology.kubernetes.io/zone`
- `kubernetes.io/hostname`
- `karpenter.sh/capacity-type`
NodePools do not automatically balance or rebalance nodes across availability zones. Achieve AZ balance by defining zonal topology spread constraints on pods.
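A pod-spec fragment sketching zonal and capacity-type spread (the `app: inflate` selector is a placeholder):

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # hard requirement: drives AZ balance
    labelSelector:
      matchLabels:
        app: inflate
  - maxSkew: 1
    topologyKey: karpenter.sh/capacity-type
    whenUnsatisfiable: ScheduleAnyway  # soft preference
    labelSelector:
      matchLabels:
        app: inflate
```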
Pod affinity and anti-affinity
Use `podAffinity` and `podAntiAffinity` to control scheduling relative to other pods. For example, affinity can require placement in a zone where a `system=backend` pod is running, while anti-affinity can prevent more than one pod with `app=inflate` per node.
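The example above can be sketched as a pod-spec fragment (label values taken from the text):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            system: backend
        topologyKey: topology.kubernetes.io/zone  # co-locate in the backend pod's zone
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: inflate
        topologyKey: kubernetes.io/hostname       # at most one inflate pod per node
```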
Weighted NodePools
Assign a `.spec.weight` to NodePools to control priority. Karpenter attempts to schedule using the highest-weight NodePool first.
Reserved capacity prioritization
To prioritize Savings Plan or Reserved Instance capacity, give a NodePool constrained to that capacity a higher weight than your general-purpose NodePools.
Fallback NodePool
Assign a higher weight to a NodePool with specific constraints to make it the cluster default. Karpenter does not guarantee it will always choose the highest-priority NodePool: if a pod can't be scheduled with the highest-priority NodePool, a lower-priority one may be used for that batch.
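A sketch of a weighted pair (names are placeholders; required NodePool fields such as `nodeClassRef` are omitted for brevity):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: reserved-first
spec:
  weight: 100          # higher weight is tried first
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["reserved"]
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: fallback
spec:
  weight: 10           # used when the higher-weight pool can't satisfy the pods
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```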
Advanced scheduling techniques
Scheduling by node resources
Use well-known labels to require specific hardware capabilities: for example, any amount of local NVMe storage, at least 100 GiB of NVMe storage, or at least 50 Gbps of network bandwidth.
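A pod-spec fragment sketching such requirements with numeric operators; the threshold values are illustrative and hinge on the label units (`instance-local-nvme` in GiB, `instance-network-bandwidth` in Mbps):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.k8s.aws/instance-local-nvme
              operator: Gt          # strictly greater; "99" means at least 100 GiB
              values: ["99"]
            - key: karpenter.k8s.aws/instance-network-bandwidth
              operator: Gt          # "49999" Mbps means at least 50 Gbps
              values: ["49999"]
```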
Workload segregation with Exists operator
Isolate pods on different nodes without creating a unique NodePool per team: give the NodePool an `Exists` requirement on a custom label key, and have each team's deployment select its own label value. Karpenter applies the label value dynamically to launched nodes based on pod requirements.
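A sketch of the pattern (the key `company.com/team` and value `team-a` are hypothetical):

```yaml
# In the NodePool's spec.template.spec.requirements:
#   - key: company.com/team    # hypothetical custom label key
#     operator: Exists
#
# Each team's pods then select their own value; Karpenter stamps that
# value onto the nodes it launches for them:
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
spec:
  nodeSelector:
    company.com/team: team-a
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
```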
On-demand/spot ratio split
Create virtual topology domains to achieve a desired spot-to-on-demand ratio: with four domain values available only on a spot NodePool and one value available only on an on-demand NodePool, spreading pods evenly across the domains yields a 4:1 ratio. The workload then adds a topology spread constraint over the virtual domain key.
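A sketch of the two NodePools; the `capacity-spread` label key is a hypothetical virtual domain, and required fields such as `nodeClassRef` are omitted:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: capacity-spread          # hypothetical virtual domain key
          operator: In
          values: ["2", "3", "4", "5"]  # four domains on spot
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: capacity-spread
          operator: In
          values: ["1"]                 # one domain on on-demand
```

The workload pairs this with a `topologySpreadConstraints` entry using `topologyKey: capacity-spread`, `maxSkew: 1`, and a `labelSelector` matching its own pods.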
Persistent volume topology
Karpenter automatically detects storage scheduling requirements and includes them in node launch decisions.
The EBS CSI driver uses `topology.ebs.csi.aws.com/zone` instead of `topology.kubernetes.io/zone`; Karpenter translates between these labels internally. When configuring a StorageClass for the EBS CSI driver, use `topology.ebs.csi.aws.com/zone`.
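A StorageClass sketch for the EBS CSI driver (name, volume type, and zones are placeholders); `WaitForFirstConsumer` delays binding until the pod schedules, so the zone can factor into provisioning:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer  # bind after pod placement so the zone is known
parameters:
  type: gp3
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone  # the CSI driver's zone label
        values: ["us-west-2a", "us-west-2b"]
```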