
Concept

By default, Kubernetes places pods on nodes using the default scheduler (kube-scheduler). It filters out nodes that cannot run a pod and picks the best-scoring node among the rest. In some cases you may want your own scheduling algorithm or custom conditions for placing pods on nodes. Kubernetes lets you write and deploy your own scheduler as an additional scheduler alongside the default one. You can then direct specific pods to your custom scheduler while other pods continue to use the default. In kubeadm-based clusters, the default scheduler runs as a static Pod defined in /etc/kubernetes/manifests/kube-scheduler.yaml on the control plane node. Every scheduler is identified by the schedulerName in its configuration, for example:
scheduler-config.yaml
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler # must be unique in the cluster
```
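The routing rule described above can be modeled in a few lines. This is a minimal Python sketch, not kube-scheduler code. The `Pod` class and the `effective_scheduler` and `pending_pods_for` helpers are hypothetical. The sketch assumes the API server has already defaulted an omitted `spec.schedulerName` to `default-scheduler`.

```python
from dataclasses import dataclass

DEFAULT_SCHEDULER = "default-scheduler"


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Pod object."""
    name: str
    scheduler_name: str = ""  # spec.schedulerName; empty means "not set"
    node_name: str = ""       # spec.nodeName; non-empty once the pod is bound


def effective_scheduler(pod: Pod) -> str:
    # The API server fills in default-scheduler when schedulerName is omitted.
    return pod.scheduler_name or DEFAULT_SCHEDULER


def pending_pods_for(scheduler: str, pods: list[Pod]) -> list[Pod]:
    # Hypothetical helper: a scheduler only picks up unbound pods that name it.
    return [p for p in pods if not p.node_name and effective_scheduler(p) == scheduler]


pods = [
    Pod("web"),                                            # no schedulerName: default
    Pod("sample-pod", scheduler_name="my-new-scheduler"),
    Pod("db", node_name="node-1"),                         # already bound
]
print([p.name for p in pending_pods_for("default-scheduler", pods)])  # ['web']
print([p.name for p in pending_pods_for("my-new-scheduler", pods)])   # ['sample-pod']
```

Pods that name a scheduler nobody runs are simply never picked up and stay Pending. This is why the scheduler name in the pod spec must match a running scheduler exactly.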

Steps to set up and use multiple schedulers

1. Create a new scheduler configuration file

/etc/kubernetes/my-new-scheduler.yaml
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-new-scheduler
clientConnection:
  # kubeconfig the scheduler uses to reach the API server
  kubeconfig: /etc/kubernetes/scheduler.conf
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system
  resourceName: lock-object-my-scheduler
```
  • leaderElect: when several replicas of the same scheduler run (for example, one per control plane node in a high-availability setup), leader election ensures only one replica actively schedules at a time.
  • resourceName: the name of the lock object (a Lease in resourceNamespace) used for leader election. Give each scheduler its own lock name so it does not compete with the default scheduler for leadership.
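Here is a simplified Python model of lease-based leader election, to show what leaderElect and resourceName provide. It is a sketch under stated assumptions, not the client-go implementation. The `Lease` class and the `try_acquire_or_renew` function are hypothetical, and time is passed in explicitly. The 15-second duration mirrors kube-scheduler's default leaseDuration. Replicas that share a lock name compete for one lease, so only one of them schedules at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory stand-in for the lock object named by resourceName."""
    name: str
    holder: Optional[str] = None
    renewed_at: float = 0.0
    duration: float = 15.0  # seconds a leader may go without renewing


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` is the leader after this attempt at time `now`."""
    expired = lease.holder is None or now - lease.renewed_at > lease.duration
    if lease.holder == candidate or expired:
        # The current leader renews, or anyone takes over an expired lease.
        lease.holder = candidate
        lease.renewed_at = now
        return True
    return False  # another replica holds a live lease: stay on standby


lock = Lease("lock-object-my-scheduler")
print(try_acquire_or_renew(lock, "replica-a", now=0.0))   # True: a becomes leader
print(try_acquire_or_renew(lock, "replica-b", now=5.0))   # False: b stays on standby
print(try_acquire_or_renew(lock, "replica-a", now=10.0))  # True: a renews
print(try_acquire_or_renew(lock, "replica-b", now=30.0))  # True: a stopped renewing, b takes over
```

The real implementation also has a renew deadline and retry period, which this sketch omits.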
2. Deploy the additional scheduler

You may deploy the scheduler as a Pod or as a Deployment.
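my-new-scheduler.yaml
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-new-scheduler
  namespace: kube-system
spec:
  # scheduler.conf and my-new-scheduler.yaml live on the control plane node,
  # so pin the Pod there (kubeadm label and taint; adjust for your cluster).
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: my-new-scheduler
      # Match your cluster's Kubernetes version; the v1 config API needs v1.25 or later.
      image: registry.k8s.io/kube-scheduler:v1.30.0
      command:
        - kube-scheduler
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --config=/etc/kubernetes/my-new-scheduler.yaml
      volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubernetes
          readOnly: true
  volumes:
    - name: kubeconfig
      hostPath:
        path: /etc/kubernetes
```
```bash
kubectl apply -f my-new-scheduler.yaml
```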
3. Verify the new scheduler is running

```bash
kubectl get pods -n kube-system
# output
NAME                 READY   STATUS    RESTARTS   AGE
...
my-new-scheduler     1/1     Running   0          2m
...
```
4. Create a pod that uses the new scheduler

Use the schedulerName field in the pod spec to direct a pod to your custom scheduler.
pod.yaml
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
    - name: sample-pod
      image: ubuntu
      command: ["sleep", "3600"] # keep the container running after it starts
  schedulerName: my-new-scheduler
```
Verify that the pod was scheduled by the correct scheduler. The SOURCE column of the Scheduled event shows which scheduler placed the pod (sample output from a kind cluster):
```bash
kubectl get events -o wide
# output
LAST SEEN   TYPE     REASON      OBJECT           SOURCE                  MESSAGE
10s         Normal   Scheduled   pod/sample-pod   my-new-scheduler, ...   Successfully assigned default/sample-pod to kind-cluster-control-plane
```

Scheduler priority and plugins

Pods waiting in the scheduling queue are sorted by priority. To set a pod's priority, create a PriorityClass and reference it by name in the pod spec with priorityClassName.
priority-class.yaml
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000 # higher value = higher priority
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
```
sample-pod.yaml
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  priorityClassName: high-priority
  containers:
    - name: sample
      image: ubuntu
```
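The following sketch shows how a priorityClassName turns into a number. It is a hedged Python model of what the Priority admission controller does when a pod is created, not its actual code. `PRIORITY_CLASSES` and `resolve_priority` are hypothetical names. A named class must exist. A pod that names no class gets the value of the class marked `globalDefault: true`, or 0 if there is none.

```python
from typing import Optional

# Hypothetical snapshot of PriorityClass objects: name -> (value, globalDefault).
PRIORITY_CLASSES = {
    "high-priority": (1_000_000, False),
}


def resolve_priority(class_name: Optional[str], classes: dict = PRIORITY_CLASSES) -> int:
    """Hypothetical model of the integer priority a pod ends up with."""
    if class_name:
        if class_name not in classes:
            # A pod naming a PriorityClass that does not exist is rejected.
            raise ValueError(f"no PriorityClass named {class_name!r}")
        return classes[class_name][0]
    # No class named: use the globalDefault class if there is one, else 0.
    for value, is_global_default in classes.values():
        if is_global_default:
            return value
    return 0


print(resolve_priority("high-priority"))  # 1000000
print(resolve_priority(None))             # 0
```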

Scheduling phases

Every pod goes through four phases when being scheduled:
1. Scheduling queue

Pods with higher priority values are placed at the front of the queue. Among pods with equal priority, the one added earlier goes first.
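2. Filtering

The scheduler filters out nodes that cannot run the pod. Examples include nodes whose remaining allocatable CPU or memory cannot cover the pod's resource requests, and nodes that do not match the pod's node selector or node affinity rules.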
3. Scoring

The scheduler scores each remaining node with its score plugins, for example preferring nodes that would have more resources left after placing the pod. It then selects the node with the highest total score.
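4. Binding

The scheduler binds the pod to the selected node by creating a Binding, which sets the pod's spec.nodeName. The kubelet on that node then starts the pod.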
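The four phases can be strung together in a toy Python model. Treat it as a sketch under simplifying assumptions:

- `Node`, `Pod`, and `schedule` are hypothetical.
- Resources are plain integers (millicores and MiB).
- The score is simply the CPU left over after placement.
- Ties among top-scoring nodes are broken at random, as kube-scheduler does.

The real scheduler runs configurable filter and score plugins and binds through the API server.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Optional

# Toy, hypothetical model of the four phases; not kube-scheduler code.


@dataclass
class Node:
    name: str
    free_cpu: int  # millicores not yet requested by other pods
    free_mem: int  # MiB not yet requested by other pods
    unschedulable: bool = False


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # requested millicores
    mem: int  # requested MiB


def schedule(pods: list[Pod], nodes: list[Node],
             rng: Optional[random.Random] = None) -> dict[str, Optional[str]]:
    rng = rng or random.Random(0)
    # 1. Scheduling queue: highest priority first, arrival order breaks ties.
    queue = [(-pod.priority, i, pod) for i, pod in enumerate(pods)]
    heapq.heapify(queue)
    placements: dict[str, Optional[str]] = {}
    while queue:
        _, _, pod = heapq.heappop(queue)
        # 2. Filtering: keep only nodes that can run the pod.
        feasible = [n for n in nodes
                    if not n.unschedulable and n.free_cpu >= pod.cpu and n.free_mem >= pod.mem]
        if not feasible:
            placements[pod.name] = None  # the pod would stay Pending
            continue
        # 3. Scoring: toy score, CPU left over after placement (more is better).
        scores = {n.name: n.free_cpu - pod.cpu for n in feasible}
        best = max(scores.values())
        chosen = rng.choice([n for n in feasible if scores[n.name] == best])
        # 4. Binding: record the decision and reserve the requested resources.
        chosen.free_cpu -= pod.cpu
        chosen.free_mem -= pod.mem
        placements[pod.name] = chosen.name
    return placements


nodes = [Node("node-1", 2000, 4096), Node("node-2", 1000, 2048),
         Node("node-3", 4000, 8192, unschedulable=True)]
pods = [Pod("batch", 0, 500, 512), Pod("web", 1_000_000, 1500, 1024),
        Pod("huge", 0, 8000, 1024)]
print(schedule(pods, nodes))
# {'web': 'node-1', 'batch': 'node-2', 'huge': None}
```

The high-priority `web` pod is scheduled first even though it arrived second. That is why `batch` later finds more room on node-2.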

Scheduler plugins

Each scheduling phase has its own plugins:
  • PrioritySort (scheduling queue): sorts pods by their priority value.
  • NodeResourcesFit (filtering): keeps only nodes with enough resources to run the pod.
  • NodeName (filtering): if the pod spec sets nodeName, keeps only the node with that name.
  • NodeUnschedulable (filtering): filters out nodes marked as unschedulable.
  • NodeResourcesFit (scoring): scores nodes based on their available resources. A single plugin can run in multiple phases.
  • ImageLocality (scoring): scores nodes higher if they already have the pod's container images. Because it only scores and never filters, a pod is still placed when no node has the image cached.
  • DefaultBinder (binding): binds the pod to the selected node.
You can extend the scheduler with your own plugins by registering them at the scheduling framework's extension points, such as queueSort, filter, score, and bind.
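The Python sketch below models how plugins hook into extension points, and how one plugin can serve several of them. It is an illustrative model under simplifying assumptions, not the scheduler framework's Go API. The classes are hypothetical Python stand-ins named after the default plugins, and the `PROFILE` registry is invented. The sketch sums raw scores. Real score plugins return normalized 0 to 100 scores that are multiplied by per-plugin weights.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical Python stand-ins named after default plugins; not the Go framework API.


@dataclass
class Node:
    name: str
    free_cpu: int
    unschedulable: bool = False
    images: set[str] = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu: int
    image: str
    node_name: str = ""  # spec.nodeName; the NodeName filter only admits that node


class NodeResourcesFit:
    """Registered at two extension points: filter and score."""
    def filter(self, pod: Pod, node: Node) -> bool:
        return node.free_cpu >= pod.cpu

    def score(self, pod: Pod, node: Node) -> int:
        return node.free_cpu - pod.cpu  # more CPU left over scores higher


class NodeName:
    def filter(self, pod: Pod, node: Node) -> bool:
        return not pod.node_name or pod.node_name == node.name


class NodeUnschedulable:
    def filter(self, pod: Pod, node: Node) -> bool:
        return not node.unschedulable


class ImageLocality:
    """Score only: never removes a node, just prefers nodes that have the image."""
    def score(self, pod: Pod, node: Node) -> int:
        return 1000 if pod.image in node.images else 0


fit = NodeResourcesFit()  # one instance shared by both extension points
PROFILE = {
    "filter": [fit, NodeName(), NodeUnschedulable()],
    "score": [fit, ImageLocality()],
}


def pick_node(pod: Pod, nodes: list[Node], profile: dict = PROFILE) -> Optional[str]:
    # A node must pass every filter plugin; survivors are ranked by summed scores.
    feasible = [n for n in nodes if all(p.filter(pod, n) for p in profile["filter"])]
    if not feasible:
        return None
    return max(feasible, key=lambda n: sum(p.score(pod, n) for p in profile["score"])).name


nodes = [Node("node-1", 2000, images={"nginx"}), Node("node-2", 2500),
         Node("node-3", 4000, unschedulable=True)]
print(pick_node(Pod("web", 500, "nginx"), nodes))    # node-1: cached image outweighs CPU
print(pick_node(Pod("cache", 500, "redis"), nodes))  # node-2: image nowhere, still placed
print(pick_node(Pod("big", 3000, "nginx"), nodes))   # None: no node passes the filters
```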

Scheduling profiles

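Running schedulers as separate processes can lead to race conditions. Each process keeps its own view of node resources, so two schedulers can place pods on the same node at the same time without seeing each other's decisions. Scheduling profiles avoid this by running multiple schedulers within a single kube-scheduler process. Each profile has its own schedulerName and plugin configuration, and pods select a profile through spec.schedulerName.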
scheduler-config.yaml
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
  - schedulerName: my-new-scheduler-1
    plugins:
      score:
        disabled:
          - name: TaintToleration
        enabled:
          # placeholder names: custom plugins must be compiled into the scheduler binary
          - name: CustomPlugin1
          - name: CustomPlugin2
          - name: CustomPlugin3
  - schedulerName: no-scoring-scheduler
    plugins:
      # '*' disables every plugin at that extension point
      preScore:
        disabled:
          - name: '*'
      score:
        disabled:
          - name: '*'
```
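A deliberately simplified Python model illustrates why profiles avoid the race. It ignores timing and models only what each scheduler can see. `Node`, `place`, and the two scenario functions are hypothetical. Two separate processes each keep a private copy of node state, so both can reserve the same 1000m of CPU. Profiles inside one process share a single cache, so the second placement sees the first. In a real cluster, the kubelet would typically reject whichever overcommitted pod does not fit.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model: only what each scheduler's cache can see, no timing.


@dataclass
class Node:
    name: str
    free_cpu: int  # unrequested millicores, as seen by one scheduler's cache


def place(pod_cpu: int, cache: dict[str, Node]) -> Optional[str]:
    """Reserve CPU on the first node in `cache` that still has room."""
    for node in cache.values():
        if node.free_cpu >= pod_cpu:
            node.free_cpu -= pod_cpu
            return node.name
    return None


def separate_processes(pod_cpu: int) -> tuple[Optional[str], Optional[str]]:
    # Each process builds its own cache, so neither sees the other's reservation.
    cache_a = {"node-1": Node("node-1", 1000)}
    cache_b = {"node-1": Node("node-1", 1000)}
    return place(pod_cpu, cache_a), place(pod_cpu, cache_b)


def one_process_with_profiles(pod_cpu: int) -> tuple[Optional[str], Optional[str]]:
    # Profiles select different plugins but share one cache in one process.
    shared = {"node-1": Node("node-1", 1000)}
    profiles = {"default-scheduler": shared, "my-new-scheduler-1": shared}
    return (place(pod_cpu, profiles["default-scheduler"]),
            place(pod_cpu, profiles["my-new-scheduler-1"]))


print(separate_processes(700))         # ('node-1', 'node-1'): 1400m promised on a 1000m node
print(one_process_with_profiles(700))  # ('node-1', None): the second pod waits instead
```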
