Multiple Schedulers

Concept

By default, Kubernetes uses the default scheduler (kube-scheduler, registered under the name default-scheduler) to place pods: it filters out nodes that cannot run a pod and picks the best-scoring node among the rest. In some cases you may want your own scheduling algorithm or custom conditions for placing pods on nodes. Kubernetes lets you write and deploy your own scheduler that runs alongside the default one. You can then direct specific pods to your custom scheduler while all other pods continue to use the default. On kubeadm-based clusters, the default scheduler runs as a static pod whose manifest is at /etc/kubernetes/manifests/kube-scheduler.yaml on the control plane node.

scheduler-config.yaml

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler # must be unique in the cluster

Steps to set up and use multiple schedulers

Create a new scheduler configuration file

/etc/kubernetes/my-new-scheduler.yaml

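apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# kube-scheduler documents the --kubeconfig flag as ignored when --config is set,
# so the kubeconfig path is also given here.
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: my-new-scheduler # pods select this scheduler by this name
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system
  resourceName: lock-object-my-scheduler # must differ from the default scheduler's lock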

leaderElect — Ensures only one instance of the scheduler is active at a time. Required for high-availability setups with multiple master nodes.
resourceName — The name of the resource object used for leader election. Prevents conflicts between multiple scheduler instances.
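
To make the role of the lock object concrete, here is a minimal in-memory sketch of lease-based leader election. It is not the mechanism kube-scheduler actually uses internally; the `Lease` and `LeaseStore` classes, the instance names, and the 15-second lease duration are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    """Simplified stand-in for the lock object named by resourceName."""
    holder: Optional[str] = None
    renewed_at: float = 0.0


class LeaseStore:
    """In-memory stand-in for the API server (hypothetical, for illustration)."""

    def __init__(self, lease_duration: float = 15.0) -> None:
        self.lease_duration = lease_duration  # assumed value
        self.leases: Dict[str, Lease] = {}

    def try_acquire(self, resource_name: str, candidate: str, now: float) -> bool:
        """Return True if candidate holds the lease after this call."""
        lease = self.leases.setdefault(resource_name, Lease())
        expired = now - lease.renewed_at > self.lease_duration
        if lease.holder in (None, candidate) or expired:
            lease.holder = candidate
            lease.renewed_at = now
            return True
        return False


store = LeaseStore()
lock = "lock-object-my-scheduler"
a_leads = store.try_acquire(lock, "scheduler-a", now=0.0)        # first caller wins
b_leads = store.try_acquire(lock, "scheduler-b", now=5.0)        # lease still held
b_takes_over = store.try_acquire(lock, "scheduler-b", now=30.0)  # lease has expired
print(a_leads, b_leads, b_takes_over)
```

Because both instances contend for the same named lock, only one of them schedules pods at any moment. A scheduler that used a different resourceName would not take part in this election at all.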

Deploy the additional scheduler

You may deploy the scheduler as a Pod or as a Deployment.

Deploy as a Pod

my-new-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: my-new-scheduler
  namespace: kube-system
spec:
  containers:
    - name: my-new-scheduler
      # Use the tag that matches your cluster; the v1 config API needs v1.25 or later.
      image: registry.k8s.io/kube-scheduler:v1.25.0
      command:
        - kube-scheduler
        # Deprecated flag, documented as ignored when --config is set;
        # set clientConnection.kubeconfig in the config file as well.
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --config=/etc/kubernetes/my-new-scheduler.yaml
      volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubernetes
          readOnly: true
  volumes:
    - name: kubeconfig
      hostPath:
        path: /etc/kubernetes

kubectl apply -f my-new-scheduler.yaml

Deploy as a Deployment

1. Package the scheduler binary

git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
make

2. Build a container image

Dockerfile

FROM busybox
ADD ./_output/local/bin/linux/amd64/kube-scheduler /usr/local/bin/kube-scheduler

docker build -t gcr.io/my-gcp-project/my-kube-scheduler:1.0 .
# "gcloud docker" is deprecated: run "gcloud auth configure-docker" once, then push with docker
docker push gcr.io/my-gcp-project/my-kube-scheduler:1.0

3. Create the Deployment manifest

my-new-scheduler.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-kube-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-scheduler-as-volume-scheduler
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-scheduler-extension-apiserver-authentication-reader
  namespace: kube-system
roleRef:
  kind: Role
  name: extension-apiserver-authentication-reader
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: my-scheduler
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1 # v1beta2 was removed in Kubernetes v1.28
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: my-scheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      component: scheduler
      tier: control-plane
  replicas: 1
  template:
    metadata:
      labels:
        component: scheduler
        tier: control-plane
        version: second
    spec:
      serviceAccountName: my-scheduler
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml
        image: gcr.io/my-gcp-project/my-kube-scheduler:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
          initialDelaySeconds: 15
        name: kube-second-scheduler
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
        resources:
          requests:
            cpu: '0.1'
        securityContext:
          privileged: false
        volumeMounts:
          - name: config-volume
            mountPath: /etc/kubernetes/my-scheduler
      hostNetwork: false
      hostPID: false
      volumes:
        - name: config-volume
          configMap:
            name: my-scheduler-config

kubectl apply -f my-new-scheduler.yaml

Verify the new scheduler is running

kubectl get pods -n kube-system
# output
NAME                                           READY     STATUS    RESTARTS   AGE
....
my-scheduler-lnf4s-4744f                       1/1       Running   0          2m
...

Create a pod with the new scheduler

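Use the schedulerName field in the pod spec to direct a pod to your custom scheduler. The value must match a schedulerName from that scheduler's configuration: my-new-scheduler for the Pod-based setup above, or my-scheduler for the Deployment-based setup. If no running scheduler has that name, the pod stays Pending.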

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  containers:
    - name: sample-pod
      image: ubuntu
  schedulerName: my-new-scheduler
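
The effect of schedulerName can be sketched as a simple name match. This is an illustration, not scheduler source code: the helper functions and the second pod named `web` are hypothetical. The only facts it relies on are that a pod without schedulerName is handled by default-scheduler and that each scheduler picks up only the pods that name it.

```python
from typing import Dict, List

DEFAULT_SCHEDULER_NAME = "default-scheduler"


def responsible_scheduler(pod_spec: dict) -> str:
    """Return the scheduler name a pod asks for, falling back to the default."""
    return pod_spec.get("schedulerName") or DEFAULT_SCHEDULER_NAME


def pods_for(scheduler_name: str, pods: Dict[str, dict]) -> List[str]:
    """List the pods a scheduler with this name would pick up."""
    return sorted(
        name for name, spec in pods.items()
        if responsible_scheduler(spec) == scheduler_name
    )


pods = {
    "sample-pod": {"schedulerName": "my-new-scheduler"},
    "web": {},  # hypothetical pod with no schedulerName set
}
print(pods_for("my-new-scheduler", pods))
print(pods_for("default-scheduler", pods))
```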

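Verify the pod was scheduled by the correct scheduler. The SOURCE column of the Scheduled event names the scheduler that placed the pod. The sample output below was captured with a pod named ubuntu and a scheduler named custom-scheduler, so the names in your cluster will differ: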

kubectl get events -o wide
# output
LAST SEEN   TYPE     REASON      OBJECT       SOURCE                                               MESSAGE
10s         Normal   Scheduled   pod/ubuntu   custom-scheduler, custom-scheduler-kind-cluster-...  Successfully assigned default/ubuntu to kind-cluster-control-plane

Scheduler priority and plugins

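Pods are sorted in the scheduling queue by priority. To set a priority, create a PriorityClass and reference it by name in the pod spec's priorityClassName field. Pods without a priorityClassName get priority 0 unless a PriorityClass is marked as globalDefault.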

priority-class.yaml

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000 # higher value = higher priority
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

sample-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  priorityClassName: high-priority
  containers:
    - name: sample
      image: ubuntu

Scheduling phases

Every pod goes through four phases when being scheduled:

Scheduling queue

Pods with higher priority values are placed at the beginning of the queue.
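
The ordering can be sketched with a small priority queue. This is a simplified model under stated assumptions: the `SchedulingQueue` class is hypothetical, ties are broken by arrival order, and the pods `batch-job` and `cache` are invented for the example.

```python
import heapq
import itertools
from typing import List, Tuple


class SchedulingQueue:
    """Pops the highest-priority pod first; ties go to the pod queued earlier."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def add(self, pod_name: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), pod_name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = SchedulingQueue()
queue.add("batch-job")                       # no PriorityClass: priority 0
queue.add("sample-pod", priority=1_000_000)  # uses the high-priority class
queue.add("cache", priority=0)               # hypothetical pod, queued last
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

Although sample-pod was added second, it leaves the queue first because its priority value is higher.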

Filtering

The scheduler removes nodes that cannot run the pod: it checks each node against the pod's node selector and affinity rules and verifies that the node has enough free CPU and memory for the pod's requests.
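
A minimal sketch of the filtering step, assuming only a node selector and CPU/memory requests (affinity rules are left out to keep it short). The `Node` and `PodRequest` types and the three nodes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    free_memory_mib: int
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class PodRequest:
    cpu_millicores: int
    memory_mib: int
    node_selector: Dict[str, str] = field(default_factory=dict)


def fits(node: Node, pod: PodRequest) -> bool:
    """A node is feasible only if it passes every check."""
    selector_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    resources_ok = (node.free_cpu_millicores >= pod.cpu_millicores
                    and node.free_memory_mib >= pod.memory_mib)
    return selector_ok and resources_ok


def filter_nodes(nodes: List[Node], pod: PodRequest) -> List[Node]:
    return [node for node in nodes if fits(node, pod)]


nodes = [
    Node("node-a", 500, 1024, {"disk": "ssd"}),   # too little CPU and memory
    Node("node-b", 4000, 8192, {"disk": "hdd"}),  # label does not match
    Node("node-c", 4000, 8192, {"disk": "ssd"}),
]
pod = PodRequest(cpu_millicores=1000, memory_mib=2048, node_selector={"disk": "ssd"})
feasible = [node.name for node in filter_nodes(nodes, pod)]
print(feasible)
```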

Scoring

Scoring plugins rate each remaining node, for example by how much CPU and memory would stay free after placing the pod. The node with the highest total score is selected.
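
As a worked example of scoring by available resources, the sketch below rates each node by the share of CPU and memory that would remain free after placement, similar in spirit to a least-allocated strategy. The formula, the `Capacity` tuple layout, and the node figures are assumptions for illustration, not the scheduler's exact arithmetic.

```python
from typing import Dict, Tuple

# (free CPU in millicores, total CPU, free memory in MiB, total memory)
Capacity = Tuple[int, int, int, int]


def score_node(capacity: Capacity, cpu_request: int, memory_request: int) -> float:
    """Score 0-100: average share of CPU and memory left free after placement."""
    free_cpu, total_cpu, free_mem, total_mem = capacity
    cpu_left = (free_cpu - cpu_request) / total_cpu
    mem_left = (free_mem - memory_request) / total_mem
    return 100 * (cpu_left + mem_left) / 2


def pick_node(nodes: Dict[str, Capacity], cpu_request: int, memory_request: int) -> str:
    """Return the node with the highest score."""
    scores = {name: score_node(cap, cpu_request, memory_request)
              for name, cap in nodes.items()}
    return max(scores, key=scores.get)


# Hypothetical nodes that already passed filtering.
nodes = {
    "node-c": (2000, 4000, 4096, 8192),
    "node-d": (3000, 4000, 6144, 8192),
}
best = pick_node(nodes, cpu_request=1000, memory_request=2048)
print(best)
```

Here node-d wins because half of its CPU and memory would still be free after placement, compared with a quarter on node-c.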

Binding

The pod is bound to the highest-scoring node.

Scheduler plugins

Each scheduling phase has its own plugins:

Scheduling queue plugins

PrioritySort — Sorts pods based on their priority value.

Filtering plugins

NodeResourcesFit — Identifies nodes with enough resources to run the pod.
NodeName — If the pod spec sets nodeName, filters out every node whose name does not match it.
NodeUnschedulable — Filters out nodes marked as unschedulable.

Scoring plugins

NodeResourcesFit — Scores nodes based on available resources. A single plugin can be used in multiple phases.
ImageLocality — Scores nodes higher if they already have the container image cached. If no node has the image cached, the pod is still placed on an available node.

Binding plugins

DefaultBinder — Binds the pod to the selected node.

You can write your own plugins to extend scheduler functionality using extension points.
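
The idea of plugins attached to extension points can be sketched with plain functions. Real plugins are written in Go against the scheduling framework; the Python below only mirrors the shape of the idea, and the function signatures, the `schedule` helper, and the node data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Node = Dict[str, object]
Pod = Dict[str, object]
FilterPlugin = Callable[[Pod, Node], bool]
ScorePlugin = Callable[[Pod, Node], int]


def node_unschedulable(pod: Pod, node: Node) -> bool:
    """Filter: reject nodes marked unschedulable."""
    return not node.get("unschedulable", False)


def node_name(pod: Pod, node: Node) -> bool:
    """Filter: if the pod names a node, only that node passes."""
    wanted = pod.get("nodeName")
    return wanted is None or wanted == node["name"]


def image_locality(pod: Pod, node: Node) -> int:
    """Score: prefer nodes that already hold the pod's image."""
    return 1 if pod["image"] in node.get("images", []) else 0


def schedule(pod: Pod, nodes: List[Node],
             filters: List[FilterPlugin], scorers: List[ScorePlugin]) -> Optional[str]:
    """Run filter plugins, then score plugins, and return the chosen node name."""
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # no feasible node: the pod stays pending
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    return str(best["name"])


nodes = [
    {"name": "node-a", "unschedulable": True, "images": ["ubuntu"]},
    {"name": "node-b", "images": []},
    {"name": "node-c", "images": ["ubuntu"]},
]
pod = {"name": "sample-pod", "image": "ubuntu"}
chosen = schedule(pod, nodes, [node_unschedulable, node_name], [image_locality])
print(chosen)
```

Adding a plugin means appending one more function to the list for its extension point, which is the design the plugin lists above reflect.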

Scheduling profiles

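Running schedulers as separate processes can lead to race conditions: each scheduler decides without knowing what the others are about to place, so two of them can assign pods to the same remaining capacity on a node. Scheduling profiles avoid this by running multiple schedulers, each with its own schedulerName and plugin configuration, inside a single kube-scheduler process. In the example below, CustomPlugin1 to CustomPlugin3 are placeholders for plugins that you have compiled into the scheduler binary.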

scheduler-config.yaml

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
  - schedulerName: my-new-scheduler-1
    plugins:
      score:
        disabled:
          - name: TaintToleration
        enabled:
          - name: CustomPlugin1
          - name: CustomPlugin2
          - name: CustomPlugin3
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
        - name: '*'
      score:
        disabled:
        - name: '*'
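
How the disabled and enabled lists combine for one extension point can be sketched as follows. This is a simplified model, not the scheduler's merge logic: the `resolve_plugins` helper is hypothetical, and the list of default score plugins is an assumed subset chosen for illustration.

```python
from typing import Dict, List


def resolve_plugins(defaults: List[str], disabled: List[str], enabled: List[str]) -> List[str]:
    """Apply one extension point's disabled/enabled lists to the default plugins."""
    if "*" in disabled:
        active: List[str] = []  # '*' switches off every default plugin
    else:
        active = [p for p in defaults if p not in disabled]
    for plugin in enabled:      # enabled plugins are appended in order
        if plugin not in active:
            active.append(plugin)
    return active


# Assumed subset of default score plugins, for illustration only.
default_score = ["TaintToleration", "NodeResourcesFit", "ImageLocality"]

profiles: Dict[str, List[str]] = {
    "default-scheduler": resolve_plugins(default_score, [], []),
    "my-new-scheduler-1": resolve_plugins(
        default_score, ["TaintToleration"],
        ["CustomPlugin1", "CustomPlugin2", "CustomPlugin3"]),
    "no-scoring-scheduler": resolve_plugins(default_score, ["*"], []),
}
for name, plugins in profiles.items():
    print(name, plugins)
```

Each profile ends up with its own plugin set while sharing one scheduler process, which is what lets pods choose between them through schedulerName.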
