Apache Spark provides a suite of web user interfaces (UIs) that you can use to monitor the status and resource consumption of your Spark cluster.

Accessing the Web UI

Every SparkContext launches a Web UI, by default on port 4040, that displays useful information about the application. You can access this interface by opening http://<driver-node>:4040 in a web browser. If multiple SparkContexts are running on the same host, they bind to successive ports beginning with 4040 (4041, 4042, etc.).
This information is only available for the duration of the application by default. To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application, so that the events can be replayed later (for example by the Spark History Server).
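As a minimal sketch (the application name and event log directory below are assumptions; the directory must exist and be writable by the driver), event logging can be enabled when the SparkContext is created:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Persist the application's events so the web UI can be reconstructed
// after the application finishes (e.g. by the Spark History Server).
val conf = new SparkConf()
  .setAppName("event-log-demo")                          // hypothetical app name
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "file:///tmp/spark-events") // assumed directory

val sc = new SparkContext(conf)
```

To browse the logs afterwards, point spark.history.fs.logDirectory at the same directory and start the history server (./sbin/start-history-server.sh).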

Jobs Tab

The Jobs tab displays a summary page of all jobs in the Spark application and a details page for each job. The summary page shows high-level information, such as the status, duration, and progress of all jobs and the overall event timeline.

Summary Information

The Jobs tab displays:
  • User: Current Spark user
  • Started At: The startup time of the Spark application
  • Total uptime: Time since Spark application started
  • Scheduling mode: See job scheduling configuration
  • Number of jobs per status: Active, Completed, Failed
  • Event timeline: Displays in chronological order the events related to the executors (added, removed) and the jobs
  • Details of jobs grouped by status: Displays detailed information of the jobs including Job ID, description (with a link to detailed job page), submitted time, duration, stages summary and tasks progress bar

Job Details

When you click on a specific job, you can see detailed information including:
  • Job Status: Running, succeeded, or failed
  • Number of stages per status: Active, pending, completed, skipped, failed
  • Associated SQL Query: Link to the SQL tab for this job
  • Event timeline: Chronological display of executor and stage events
  • DAG visualization: Visual representation of the directed acyclic graph where vertices represent the RDDs or DataFrames and edges represent operations
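As an illustration, a small job like the following produces two stages (split at the reduceByKey shuffle), so its DAG visualization on the job details page shows two stage boxes; the snippet assumes a running SparkContext sc:

```scala
// map runs in the first stage; reduceByKey forces a shuffle,
// so the final aggregation runs in a second stage.
sc.parallelize(1 to 1000, 4)
  .map(x => (x % 10, 1))
  .reduceByKey(_ + _)
  .collect()
```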

Stage Information

For each stage, you can view:
  • Stage ID
  • Description of the stage
  • Submitted timestamp
  • Duration of the stage
  • Tasks progress bar
  • Input: Bytes read from storage in this stage
  • Output: Bytes written in storage in this stage
  • Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
  • Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage

Stages Tab

The Stages tab displays a summary page that shows the current state of all stages of all jobs in the Spark application. At the beginning of the page is the summary with the count of all stages by status (active, pending, completed, skipped, and failed).

Stage Details

The stage detail page begins with information like:
  • Total time across all tasks
  • Locality level summary
  • Shuffle Read Size / Records
  • Associated Job IDs
  • DAG visualization of the stage

Summary Metrics

Summary metrics for all tasks are represented in a table and timeline:
  • Task deserialization time: Time to deserialize task data
  • Duration of tasks: Total execution time
  • GC time: Total JVM garbage collection time
  • Result serialization time: Time spent serializing the task result on an executor
  • Getting result time: Time the driver spends fetching task results from workers
  • Scheduler delay: Time the task waits to be scheduled for execution
  • Peak execution memory: Maximum memory used by internal data structures created during shuffles, aggregations and joins
  • Shuffle Read Size / Records: Total shuffle bytes read, includes both data read locally and remotely
  • Shuffle Read Fetch Wait Time: Time tasks spent blocked waiting for shuffle data
  • Shuffle Remote Reads: Total shuffle bytes read from remote executors
  • Shuffle Write Time: Time tasks spent writing shuffle data
  • Shuffle spill (memory): Size of the deserialized form of the shuffled data in memory
  • Shuffle spill (disk): Size of the serialized form of the data on disk
Aggregated metrics by executor show the same information aggregated by executor.

Accumulators

Accumulators are shared variables that can be updated from inside a variety of transformations and actions. You can create accumulators with or without a name, but only named accumulators are displayed in the web UI.
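A brief sketch, assuming a running SparkContext sc (the accumulator name is arbitrary):

```scala
// Only the named accumulator shows up in the web UI task table.
val named   = sc.longAccumulator("records seen") // displayed per task and in total
val unnamed = sc.longAccumulator                 // works, but not displayed

// foreach is an action, so the updates are applied reliably.
sc.range(0, 100).foreach(_ => named.add(1))
println(named.value) // 100
```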

Storage Tab

The Storage tab displays the persisted RDDs and DataFrames, if any, in the application. The summary page shows the storage levels, sizes and partitions of all RDDs, and the details page shows the sizes and the executors used for all partitions in an RDD or DataFrame.
import org.apache.spark.storage.StorageLevel._

// Persist an RDD with a serialized in-memory storage level;
// count is the action that materializes it.
val rdd = sc.range(0, 100, 1, 5).setName("rdd")
rdd.persist(MEMORY_ONLY_SER)
rdd.count

// Persist a DataFrame on disk only; again, an action triggers materialization.
val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
df.persist(DISK_ONLY)
df.count
Newly persisted RDDs or DataFrames do not appear in the tab until they are materialized, so to monitor a specific RDD or DataFrame, make sure an action has been triggered on it.
Basic information like storage level, number of partitions, and memory overhead is provided. You can click the RDD name to obtain details of the data persistence, such as the data distribution on the cluster.

Environment Tab

The Environment tab displays the values for the different environment and configuration variables, including JVM, Spark, and system properties. This environment page has five parts:
  1. Runtime Information: Runtime properties like versions of Java and Scala
  2. Spark Properties: Application properties like spark.app.name and spark.driver.memory
  3. Hadoop Properties: Properties relative to Hadoop and YARN
  4. System Properties: More details about the JVM
  5. Classpath Entries: Lists the classes loaded from different sources, which is very useful to resolve class conflicts
The Environment tab is a useful place to check whether your properties have been set correctly.
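For example, a property set programmatically when building the session appears under "Spark Properties" in this tab; the snippet below is a sketch (the app name and property value are arbitrary choices):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("env-demo") // hypothetical app name
  .config("spark.sql.shuffle.partitions", "64")
  .getOrCreate()

// The same value can be read back programmatically, and it is also
// listed in the Environment tab under "Spark Properties".
println(spark.conf.get("spark.sql.shuffle.partitions")) // 64
```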

Executors Tab

The Executors tab displays summary information about the executors that were created for the application, including memory and disk usage and task and shuffle information.

Executor Information

The Executors tab provides:
  • Resource information: Amount of memory, disk, and cores used by each executor
  • Performance information: GC time and shuffle information
  • Storage Memory: Amount of memory used and reserved for caching data
  • Log access: Links to stderr, stdout logs
  • Thread dumps: View JVM thread dump for performance analysis

SQL Tab

If your application executes Spark SQL queries, the SQL tab displays information such as the duration, jobs, and physical and logical plans for the queries.

Query Information

For each query, you can view:
  • Query execution time and duration
  • List of associated jobs
  • Query execution DAG
  • Physical and logical plans
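As a sketch (assuming an active SparkSession spark), an aggregation like the following shows up in the SQL tab once it completes, with its physical plan and per-operator metrics:

```scala
// A grouped aggregation: the plan includes Range, HashAggregate and
// an Exchange (shuffle), each annotated with metrics in the SQL tab.
val df = spark.range(0, 1000).selectExpr("id % 10 AS key")
df.groupBy("key").count().collect()
```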

SQL Metrics

The metrics of SQL operators are shown in the block of physical operators. The SQL metrics can be useful when you want to dive into the execution details of each operator.

Output Metrics

  • Number of output rows
  • Data size

Scan Metrics

  • Scan time
  • Metadata time

Shuffle Metrics

  • Shuffle bytes written
  • Shuffle records written
  • Remote blocks read
  • Fetch wait time

Performance Metrics

  • Sort time
  • Peak memory
  • Spill size
Key SQL metrics include:
  Metric                | Description                                        | Operators
  number of output rows | The number of output rows of the operator          | Aggregate operators, Join operators, Sample, Range, Scan operators, Filter
  data size             | The size of broadcast/shuffled/collected data      | BroadcastExchange, ShuffleExchange, Subquery
  scan time             | The time spent on scanning data                    | ColumnarBatchScan, FileSourceScan
  shuffle bytes written | The number of bytes written                        | CollectLimit, TakeOrderedAndProject, ShuffleExchange
  fetch wait time       | The time spent on fetching data (local and remote) | CollectLimit, TakeOrderedAndProject, ShuffleExchange
  peak memory           | The peak memory usage in the operator              | Sort, HashAggregate
  spill size            | Number of bytes spilled to disk from memory        | Sort, HashAggregate

Structured Streaming Tab

When running Structured Streaming jobs in micro-batch mode, a Structured Streaming tab will be available on the Web UI. The overview page displays brief statistics for running and completed queries.

Streaming Metrics

  • Input Rate: The aggregate (across all sources) rate of data arriving
  • Process Rate: The aggregate rate at which Spark is processing data
  • Input Rows: The aggregate number of records processed in a trigger
  • Batch Duration: The process duration of each batch
  • Operation Duration: The amount of time taken to perform various operations (addBatch, getBatch, latestOffset, queryPlanning, walCommit)
  • Global Watermark Gap: The gap between batch timestamp and global watermark
  • State Metrics: Aggregated state rows and memory usage
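A minimal micro-batch query using the built-in rate source (shown as a sketch, assuming an active SparkSession spark; rowsPerSecond and the trigger interval are arbitrary choices) will populate these charts while it runs:

```scala
import org.apache.spark.sql.streaming.Trigger

// Generates rows at a fixed rate; while the query runs, the Structured
// Streaming tab charts its input rate, process rate and batch duration.
val stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

val query = stream.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

// query.stop() ends the run; the query then moves to the completed list.
```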

JDBC/ODBC Server Tab

You can see this tab when Spark is running as a distributed SQL engine. It shows information about sessions and submitted SQL operations.

Session Information

  • User and IP of the connection
  • Session ID link to access session info
  • Start time, finish time and duration of the session
  • Total execute: Number of operations submitted in this session

SQL Statistics

  • User that submitted the operation
  • Job ID link to Jobs tab
  • Group ID of the query that groups all jobs together
  • Start time, finish time, close time
  • Execution time and duration
  • Statement being executed
  • State of the process (Started, Compiled, Failed, Canceled, Finished, Closed)
  • Detail of the execution plan or errors
