Overview
Spark schedules resources at two levels:
- Across applications - at the level of the cluster manager
- Within applications - when multiple jobs run concurrently in the same SparkContext
Within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network.
Scheduling Across Applications
When running on a cluster, each Spark application gets an independent set of executor JVMs that only run tasks and store data for that application.

Static Partitioning
The simplest option, available on all cluster managers, is static partitioning of resources. With this approach, each application is given a maximum amount of resources it can use and holds onto them for its whole duration.

- Standalone Mode
- YARN Mode
- Kubernetes Mode
By default, applications run in FIFO (first-in-first-out) order, and each application tries to use all available nodes.

Configuration:
- spark.cores.max - maximum cores for this application
- spark.deploy.defaultCores - default cores for applications that don’t set spark.cores.max
- spark.executor.memory - memory per executor
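In standalone mode, for example, these limits can be passed at submit time. A minimal sketch; the core and memory values are illustrative and the script name is a placeholder:

```shell
# Cap the application at 24 cores cluster-wide, 4 GiB per executor.
# Tune these numbers for your own cluster.
spark-submit \
  --conf spark.cores.max=24 \
  --conf spark.executor.memory=4g \
  my_app.py
```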
Dynamic Resource Allocation
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. Your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.
Enabling Dynamic Allocation
To use this feature, your application must set spark.dynamicAllocation.enabled to true. Additionally, you must configure one of the following:
- Set up an external shuffle service and enable spark.shuffle.service.enabled
- Enable shuffle tracking with spark.dynamicAllocation.shuffleTracking.enabled
- Enable decommissioning with both spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled
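For example, the shuffle-tracking variant can be enabled in spark-defaults.conf. The executor bounds below are illustrative:

```properties
spark.dynamicAllocation.enabled                  true
spark.dynamicAllocation.shuffleTracking.enabled  true
spark.dynamicAllocation.minExecutors             1
spark.dynamicAllocation.maxExecutors             20
```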
Resource Allocation Policy
Request Policy
Spark requests additional executors when it has pending tasks waiting to be scheduled.

Initial Request

When there have been pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, Spark requests 1 additional executor.

Exponential Growth

Subsequent requests are triggered every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds, requesting 2, 4, 8, and so on executors.

Remove Policy

Spark removes an executor when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds.
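The ramp-up schedule above can be modeled in a few lines of plain Python. This is a toy model of the doubling behavior described here, not Spark's actual implementation:

```python
def executors_requested(intervals):
    """Cumulative executors requested after `intervals` backlog triggers:
    the first trigger asks for 1 executor, and each subsequent
    sustained-backlog trigger doubles the request (1, 2, 4, 8, ...)."""
    return sum(2 ** i for i in range(intervals))

# After 4 triggers: 1 + 2 + 4 + 8 = 15 executors requested in total.
print(executors_requested(4))
```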
External Shuffle Service
The external shuffle service is a long-running process that preserves shuffle files written by executors beyond their lifetime. Setup depends on the cluster manager:
- Standalone: start workers with spark.shuffle.service.enabled set to true
- YARN: follow the YARN-specific configuration instructions
- Kubernetes: the external shuffle service is not currently supported; use shuffle tracking instead
Scheduling Within an Application
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads.

FIFO Scheduler (Default)

By default, Spark’s scheduler runs jobs in FIFO fashion:
- Each job is divided into “stages” (e.g., map and reduce phases)
- The first job gets priority on all available resources
- If the jobs at the head of the queue don’t need the whole cluster, later jobs can start right away
Fair Scheduler
Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources.

The fair scheduler is best for multi-user settings: short jobs submitted while a long job is running can start receiving resources right away and still achieve good response times.
Enable Fair Scheduler
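Fair scheduling is controlled by the spark.scheduler.mode property, which can be set when submitting the application. A minimal sketch; the script name is a placeholder:

```shell
spark-submit --conf spark.scheduler.mode=FAIR my_app.py
```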
Fair Scheduler Pools
The fair scheduler supports grouping jobs into pools and setting different scheduling options for each pool.

Assigning Jobs to Pools
Set the spark.scheduler.pool local property in the thread submitting jobs:
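For example, in PySpark. A minimal sketch assuming a running SparkContext bound to `sc`; the pool name is illustrative:

```python
# All jobs submitted from this thread will go to the "production" pool
sc.setLocalProperty("spark.scheduler.pool", "production")

# ... submit jobs ...

# Clear the property so later jobs return to the default pool
sc.setLocalProperty("spark.scheduler.pool", None)
```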
Configuring Pool Properties
Create an XML configuration file (fairscheduler.xml) to define pool properties:
| Property | Description |
|---|---|
| schedulingMode | FIFO or FAIR - controls whether jobs within the pool queue up behind each other (the default) or share the pool’s resources fairly. |
| weight | Controls the pool’s share of the cluster relative to other pools. By default, all pools have a weight of 1; a pool with weight 2 gets twice the resources of a weight-1 pool. |
| minShare | The minimum share (as a number of CPU cores) that the administrator would like the pool to have. The fair scheduler always attempts to meet all active pools’ minimum shares before redistributing extra resources. |
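For example, a file defining two pools. The pool names and values are illustrative:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```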
Load Configuration File
Place a file named fairscheduler.xml on the classpath, or set the spark.scheduler.allocation.file property to specify a custom location.

Default Pool Behavior
Without any intervention, newly submitted jobs go into a default pool:
- Each pool gets an equal share of the cluster
- Jobs within the default pool run in FIFO order
- If you create one pool per user, each user gets an equal share of the cluster
Scheduling for JDBC Connections
For JDBC client sessions, you can set the Fair Scheduler pool per session.

Concurrent Jobs in PySpark

Use pyspark.InheritableThread for a Python thread to inherit inheritable attributes, such as local properties, in a JVM thread:
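The pitfall this addresses can be illustrated with plain Python (an analogy, not Spark code): state attached to the parent's thread-local storage is invisible to an ordinary child thread, which is the kind of gap InheritableThread closes for JVM local properties such as the scheduler pool:

```python
import threading

local = threading.local()
local.pool = "production"  # set in the parent thread

seen = {}

def child():
    # A plain threading.Thread starts with empty thread-local storage,
    # so the parent's value is not visible here.
    seen["pool"] = getattr(local, "pool", None)

t = threading.Thread(target=child)
t.start()
t.join()
print(seen["pool"])  # None
```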
Best Practices
Choose the Right Scheduler
- Use FIFO for batch processing and single-user applications
- Use Fair scheduler for multi-user environments and interactive workloads
Configure Dynamic Allocation
Enable dynamic allocation on shared clusters to improve resource utilization. Use shuffle tracking for a simpler setup without an external shuffle service.
Set Appropriate Pool Weights
Configure pool weights based on priority: high-priority pools should have higher weights (e.g., 1000) to ensure they get resources first.
Monitor Resource Usage
Use the Spark UI to monitor job execution, resource allocation, and identify bottlenecks.
Example: Multi-Tenant Application
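A sketch of what such an application might look like, assuming a running SparkSession bound to `spark` and a fairscheduler.xml defining one pool per tenant. The tenant names and the workload are hypothetical:

```python
from pyspark import InheritableThread

def run_for_tenant(pool_name):
    sc = spark.sparkContext
    # Route every job submitted from this thread to the tenant's pool
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    result = spark.range(1_000_000).groupBy().sum("id").collect()
    print(pool_name, result)

# InheritableThread propagates local properties (like the pool)
# from the parent thread, unlike a plain threading.Thread.
threads = [InheritableThread(target=run_for_tenant, args=(name,))
           for name in ("tenant_a", "tenant_b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```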
Next Steps
- Cluster Overview - learn about cluster deployment modes
- Performance Tuning - optimize Spark application performance
