Concept
By default, Kubernetes uses the default scheduler (kube-scheduler) to distribute pods across nodes. In some cases you may want to use your own scheduling algorithm or apply custom conditions for placing pods on nodes. Kubernetes allows you to write and deploy your own scheduler as an additional scheduler alongside the default one. You can then direct specific pods to use your custom scheduler by setting schedulerName in the pod spec, while other pods continue to use the default. On clusters set up with kubeadm, the default scheduler's manifest is found at /etc/kubernetes/manifests/kube-scheduler.yaml on the control plane (master) node.
scheduler-config.yaml
Steps to set up and use multiple schedulers
Create a new scheduler configuration file
/etc/kubernetes/my-new-scheduler.yaml
- leaderElect: Ensures only one instance of the scheduler is active at a time. Required for high-availability setups with multiple master nodes.
- resourceName: The name of the resource object used for leader election. Prevents conflicts between multiple scheduler instances.
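A configuration file using these fields might look like the following minimal sketch. The scheduler name my-new-scheduler, the lock name, and the kubeconfig path are illustrative assumptions, not values taken from a particular cluster.

```yaml
# /etc/kubernetes/my-new-scheduler.yaml (illustrative sketch)
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Name that pods reference through spec.schedulerName (assumed name)
  - schedulerName: my-new-scheduler
clientConnection:
  # Assumption: kubeadm-style kubeconfig for the scheduler on the control plane node
  kubeconfig: /etc/kubernetes/scheduler.conf
leaderElection:
  # Only one instance holding the lock is active at a time
  leaderElect: true
  resourceNamespace: kube-system
  # Assumed lock name; it should differ from the lock used by the default scheduler
  resourceName: lock-object-my-new-scheduler
```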
Deploy the additional scheduler
You may deploy the scheduler as a Pod or as a Deployment.
- Deploy as a Pod
- Deploy as a Deployment
my-new-scheduler.yaml
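As a rough sketch of the Pod option, the manifest below runs a second kube-scheduler and points it at the custom configuration file. It assumes a kubeadm-style control plane node, that /etc/kubernetes/my-new-scheduler.yaml sets clientConnection.kubeconfig to /etc/kubernetes/scheduler.conf, and an image tag chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-new-scheduler
  namespace: kube-system
spec:
  # Assumption: run on a control plane node, where the hostPath files below exist
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: kube-scheduler
      # Assumed image tag; match it to the cluster's Kubernetes version
      image: registry.k8s.io/kube-scheduler:v1.29.0
      command:
        - kube-scheduler
        - --config=/etc/kubernetes/my-new-scheduler.yaml
      volumeMounts:
        - name: scheduler-config
          mountPath: /etc/kubernetes/my-new-scheduler.yaml
          readOnly: true
        - name: kubeconfig
          mountPath: /etc/kubernetes/scheduler.conf
          readOnly: true
  volumes:
    # The scheduler configuration file created in the previous step
    - name: scheduler-config
      hostPath:
        path: /etc/kubernetes/my-new-scheduler.yaml
        type: File
    # Credentials the scheduler uses to talk to the API server (assumed path)
    - name: kubeconfig
      hostPath:
        path: /etc/kubernetes/scheduler.conf
        type: File
```

After applying the manifest with kubectl apply -f, the new scheduler should appear next to the default one in kubectl get pods -n kube-system.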
Scheduler priority and plugins
Pods are sorted in the scheduling queue based on their priority. To set a priority, create a PriorityClass and reference it in the pod spec.
priority-class.yaml
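A PriorityClass could be defined along these lines. The name high-priority and the value are placeholders chosen for illustration; pods with a higher value are sorted ahead of pods with a lower one.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority        # hypothetical name
value: 1000000               # higher value means higher priority
globalDefault: false         # do not apply to pods that omit priorityClassName
description: "Priority class for pods that should be scheduled first."
```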
sample-pod.yaml
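A pod that uses both a priority class and a custom scheduler might look like the sketch below. It assumes a PriorityClass named high-priority and a scheduler named my-new-scheduler already exist in the cluster; both names are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  # Assumed scheduler name; omit this field to use the default scheduler
  schedulerName: my-new-scheduler
  # Assumed PriorityClass name; it must exist before the pod is created
  priorityClassName: high-priority
  containers:
    - name: nginx
      image: nginx
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
```

If no running scheduler matches schedulerName, the pod stays in the Pending state.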
Scheduling phases
Every pod goes through four phases when being scheduled: scheduling queue, filtering, scoring, and binding.
Filtering
The scheduler checks nodes against node selector and affinity rules, and verifies that each node has sufficient CPU and memory.
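To make the filtering inputs concrete, the following sketch shows a pod spec that carries all three kinds of constraint mentioned above. The pod name and the disktype label are hypothetical; only nodes that carry the label, satisfy the affinity rule, and have the requested CPU and memory free would pass filtering.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: filtered-pod          # hypothetical name
spec:
  nodeSelector:
    disktype: ssd             # hypothetical node label
  affinity:
    nodeAffinity:
      # Hard requirement: nodes that do not match are filtered out
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                  - linux
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "1"            # nodes with less than 1 free CPU are filtered out
          memory: 512Mi
```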
Scoring
The scheduler scores the remaining nodes based on available resources. The node with the highest score is selected.
Scheduler plugins
Each scheduling phase has its own plugins:
Scheduling queue plugins
- PrioritySort — Sorts pods based on their priority value.
Filtering plugins
- NodeResourcesFit — Identifies nodes with enough resources to run the pod.
- NodeName — Checks whether the pod specifies a particular node name in its spec.
- NodeUnschedulable — Filters out nodes marked as unschedulable.
Scoring plugins
- NodeResourcesFit — Scores nodes based on available resources. A single plugin can be used in multiple phases.
- ImageLocality — Scores nodes higher if they already have the container image cached. If no node has the image cached, the pod is still placed on an available node.
Binding plugins
- DefaultBinder — Binds the pod to the selected node.
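Plugins can be switched on or off per phase in the scheduler configuration. As a minimal sketch, the fragment below disables the ImageLocality plugin in the scoring phase of the default profile; the choice of plugin is only an example.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      # Each phase (extension point) has its own enabled/disabled lists
      score:
        disabled:
          # Stop favoring nodes that already have the image cached
          - name: ImageLocality
```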
Scheduling profiles
Running separate schedulers as separate processes can lead to a race condition where schedulers make conflicting decisions. Use scheduling profiles to configure multiple schedulers within a single process.
scheduler-config.yaml
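A configuration with two profiles in one scheduler process might look like the sketch below. The second profile name, no-scoring-scheduler, is an illustrative choice; it disables all scoring plugins, while the first profile keeps the defaults.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Profile 1: default behavior
  - schedulerName: default-scheduler
  # Profile 2: same process, but with scoring turned off (assumed name)
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
          - name: '*'
      score:
        disabled:
          - name: '*'
```

A pod selects a profile the same way it selects a separate scheduler, by setting schedulerName in its spec to the profile's name.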