Spark History Server
It is possible to construct the UI of an application through Spark’s history server, provided that the application’s event logs exist.
Starting the History Server
You can start the history server by executing ./sbin/start-history-server.sh. This serves a web interface at http://<server-url>:18080 by default, listing incomplete and completed applications and attempts.
Configuring Event Logging
To view the web UI after an application has finished, you need to configure event logging: set spark.eventLog.enabled to true and spark.eventLog.dir to the directory where logs are written. The base logging directory must be supplied in the spark.history.fs.logDirectory configuration option, and it should contain sub-directories that each represent an application’s event logs.
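As a sketch, event logging might be enabled in spark-defaults.conf like this (the HDFS path is an illustrative assumption; any directory readable by both the applications and the history server works):

```properties
# spark-defaults.conf — minimal event-logging setup for the history server
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://namenode/shared/spark-logs
spark.history.fs.logDirectory    hdfs://namenode/shared/spark-logs
```

The application and the history server can use different paths in practice, as long as spark.history.fs.logDirectory points at the location where the event logs end up.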
Environment Variables
You can configure the history server using these environment variables:

| Environment Variable | Description |
|---|---|
| SPARK_DAEMON_MEMORY | Memory to allocate to the history server (default: 1g) |
| SPARK_DAEMON_JAVA_OPTS | JVM options for the history server (default: none) |
| SPARK_DAEMON_CLASSPATH | Classpath for the history server (default: none) |
| SPARK_PUBLIC_DNS | The public address for the history server |
| SPARK_HISTORY_OPTS | spark.history.* configuration options for the history server |
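For example, the history server’s heap size and log directory could be set through these variables before launching (the memory value and path are illustrative assumptions):

```sh
# Size the history server JVM and point it at the shared event-log directory
export SPARK_DAEMON_MEMORY=2g
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://namenode/shared/spark-logs"
./sbin/start-history-server.sh
```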
Configuration Options
Basic Configuration
| Property | Default | Description |
|---|---|---|
| spark.history.provider | org.apache.spark.deploy.history.FsHistoryProvider | Name of the class implementing the application history backend |
| spark.history.fs.logDirectory | file:/tmp/spark-events | Directory containing application event logs to load |
| spark.history.fs.update.interval | 10s | Period at which the filesystem history provider checks for new or updated logs |
| spark.history.ui.port | 18080 | Port to which the web interface binds |
| spark.history.retainedApplications | 50 | Number of applications to retain UI data for in the cache |
Kerberos Configuration
| Property | Default | Description |
|---|---|---|
| spark.history.kerberos.enabled | false | Whether the history server should use kerberos to login |
| spark.history.kerberos.principal | (none) | Kerberos principal name for the History Server |
| spark.history.kerberos.keytab | (none) | Location of the kerberos keytab file |
Cleaner Configuration
| Property | Default | Description |
|---|---|---|
| spark.history.fs.cleaner.enabled | false | Whether the History Server should periodically clean up event logs |
| spark.history.fs.cleaner.interval | 1d | How often the filesystem job history cleaner checks for files to delete |
| spark.history.fs.cleaner.maxAge | 7d | Job history files older than this will be deleted |
| spark.history.fs.cleaner.maxNum | Int.MaxValue | Maximum number of files in the event log directory |
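For instance, to keep two weeks of history and check for deletable files once a day (retention values here are illustrative, not recommendations):

```properties
spark.history.fs.cleaner.enabled    true
spark.history.fs.cleaner.interval   1d
spark.history.fs.cleaner.maxAge     14d
```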
Storage Configuration
| Property | Default | Description |
|---|---|---|
| spark.history.store.maxDiskUsage | 10g | Maximum disk usage for the local directory where cached application history information is stored |
| spark.history.store.path | (none) | Local directory in which to cache application history data |
| spark.history.store.serializer | JSON | Serializer for writing/reading in-memory UI objects (JSON or PROTOBUF) |
Event Log Compaction
A long-running application can generate a huge single event log file which may cost a lot to maintain. Enabling spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize creates rolling event log files instead.
The History Server can apply compaction on rolling event log files to reduce the overall size of logs via spark.history.fs.eventLog.rolling.maxFilesToRetain.
The compaction tries to exclude events which point to outdated data:
- Events for jobs that have finished, and related stage/tasks events
- Events for executors that have terminated
- Events for SQL executions that have finished, and related job/stage/tasks events
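Putting these together, a long-running streaming job might use settings like the following (the file size and retain count are illustrative assumptions):

```properties
# Application side: write rolling event logs instead of one large file
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m

# History server side: compact older files, keeping the most recent ones
spark.history.fs.eventLog.rolling.maxFilesToRetain=2
```

Compaction is lossy by design: once older files are compacted away, details of the excluded events above are no longer visible in the rebuilt UI.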
REST API
In addition to viewing the metrics in the UI, they are also available as JSON. This gives you an easy way to create new visualizations and monitoring tools for Spark.
API Access
The endpoints are mounted at /api/v1. For example:
- History server: http://<server-url>:18080/api/v1
- Running application: http://localhost:4040/api/v1
API Endpoints
Applications
- /applications - List all applications
- /applications/[app-id]/jobs - All jobs for an application
- /applications/[app-id]/stages - All stages for an application
Executors
- /applications/[app-id]/executors - Active executors
- /applications/[app-id]/allexecutors - All executors
- /applications/[app-id]/executors/[executor-id]/threads - Thread dumps
Storage
- /applications/[app-id]/storage/rdd - Stored RDDs
- /applications/[app-id]/storage/rdd/[rdd-id] - RDD storage status
SQL
- /applications/[app-id]/sql - All queries
- /applications/[app-id]/sql/[execution-id] - Query details
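Because the list endpoints return JSON arrays, a monitoring script can consume them directly. The sketch below parses a sample /applications response inline; in practice you would fetch it with urllib.request.urlopen from http://<server-url>:18080/api/v1/applications. The ids and field values are illustrative, following the documented response shape:

```python
import json

# Hypothetical sample of a GET /api/v1/applications response body;
# ids, names, and durations are made up for illustration.
sample = """
[
  {"id": "app-20240101120000-0001", "name": "WordCount",
   "attempts": [{"completed": true, "duration": 52000}]},
  {"id": "app-20240101130000-0002", "name": "PageRank",
   "attempts": [{"completed": false, "duration": 0}]}
]
"""

apps = json.loads(sample)
# Keep only applications whose attempts have all completed
completed = [a["id"] for a in apps
             if all(att["completed"] for att in a["attempts"])]
print(completed)  # → ['app-20240101120000-0001']
```

The same pattern works for the per-application endpoints, e.g. appending /jobs or /stages to an application id from this list.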
Executor Task Metrics
The REST API exposes Task Metrics collected by Spark executors with the granularity of task execution:

| Metric | Description |
|---|---|
| executorRunTime | Elapsed time the executor spent running this task (milliseconds) |
| executorCpuTime | CPU time the executor spent running this task (nanoseconds) |
| executorDeserializeTime | Elapsed time spent to deserialize this task (milliseconds) |
| jvmGCTime | Elapsed time the JVM spent in garbage collection (milliseconds) |
| resultSerializationTime | Elapsed time spent serializing the task result (milliseconds) |
| memoryBytesSpilled | Number of in-memory bytes spilled by this task |
| diskBytesSpilled | Number of on-disk bytes spilled by this task |
| peakExecutionMemory | Peak memory used by internal data structures during shuffles, aggregations and joins |
| inputMetrics.bytesRead | Total number of bytes read |
| inputMetrics.recordsRead | Total number of records read |
| outputMetrics.bytesWritten | Total number of bytes written |
| shuffleReadMetrics.remoteBytesRead | Number of remote bytes read in shuffle operations |
| shuffleReadMetrics.fetchWaitTime | Time spent waiting for remote shuffle blocks (milliseconds) |
| shuffleWriteMetrics.bytesWritten | Number of bytes written in shuffle operations |
| shuffleWriteMetrics.writeTime | Time spent blocking on writes to disk or buffer cache (nanoseconds) |
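A script can aggregate these per-task metrics into stage-level health numbers, for example to flag GC-heavy or spilling stages. The fragment below uses an inline sample of a task list; the metric values are made up, and the exact JSON field casing (here jvmGcTime) is an assumption that may differ by Spark version:

```python
import json

# Illustrative fragment of a per-stage task list from the REST API;
# metric names mirror the table above, values are invented.
tasks_json = """
[
  {"taskMetrics": {"executorRunTime": 1200, "jvmGcTime": 100,
                   "memoryBytesSpilled": 0, "diskBytesSpilled": 0}},
  {"taskMetrics": {"executorRunTime": 800, "jvmGcTime": 300,
                   "memoryBytesSpilled": 1048576, "diskBytesSpilled": 524288}}
]
"""

tasks = json.loads(tasks_json)
run = sum(t["taskMetrics"]["executorRunTime"] for t in tasks)    # total ms running
gc = sum(t["taskMetrics"]["jvmGcTime"] for t in tasks)           # total ms in GC
spill = sum(t["taskMetrics"]["diskBytesSpilled"] for t in tasks) # total bytes spilled
print(f"run={run} ms, gc fraction={gc / run:.0%}, disk spill={spill} bytes")
```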
Executor Metrics
Executor-level metrics are sent from each executor to the driver as part of the Heartbeat:
- Memory Metrics
- Task Metrics
- GC Metrics
- Peak Memory Metrics
The memory metrics include:
- memoryUsed - Storage memory used by this executor
- diskUsed - Disk space used for RDD storage
- maxMemory - Total amount of memory available for storage
- usedOnHeapStorageMemory - Used on-heap memory for storage
- usedOffHeapStorageMemory - Used off-heap memory for storage
Metrics System
Spark has a configurable metrics system based on the Dropwizard Metrics Library. This allows you to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV files.
Configuration
The metrics system is configured via a configuration file at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.
Instead of using the configuration file, you can use configuration parameters with the prefix spark.metrics.conf.
Namespace Configuration
By default, the root namespace used for driver or executor metrics is the value of spark.app.id. For tracking metrics across apps, use spark.metrics.namespace.
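For example, keying metrics by application name rather than by run-specific ID lets metrics from successive runs of the same app line up in a dashboard:

```properties
# Group metrics from all runs of the same application under its name
spark.metrics.namespace=${spark.app.name}
```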
Metrics Instances
Spark’s metrics are decoupled into different instances:
- master - The Spark standalone master process
- applications - A component within the master which reports on various applications
- worker - A Spark standalone worker process
- executor - A Spark executor
- driver - The Spark driver process
- shuffleService - The Spark shuffle service
- applicationMaster - The Spark ApplicationMaster when running on YARN
Available Sinks
Each instance can report to zero or more sinks:
- ConsoleSink - Logs metrics information to the console
- CSVSink - Exports metrics data to CSV files at regular intervals
- JmxSink - Registers metrics for viewing in a JMX console
- MetricsServlet - Adds a servlet within the existing Spark UI to serve metrics data as JSON
- PrometheusServlet - Adds a servlet to serve metrics data in Prometheus format (Experimental)
- GraphiteSink - Sends metrics to a Graphite node
- Slf4jSink - Sends metrics to slf4j as log entries
- StatsdSink - Sends metrics to a StatsD node
Prometheus Endpoints
The Prometheus Servlet mirrors the JSON data in time-series format:

| Component | Port | JSON Endpoint | Prometheus Endpoint |
|---|---|---|---|
| Master | 8080 | /metrics/master/json/ | /metrics/master/prometheus/ |
| Master | 8080 | /metrics/applications/json/ | /metrics/applications/prometheus/ |
| Worker | 8081 | /metrics/json/ | /metrics/prometheus/ |
| Driver | 4040 | /metrics/json/ | /metrics/prometheus/ |
| Driver | 4040 | /api/v1/applications/{id}/executors/ | /metrics/executors/prometheus/ |
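As a sketch, the Prometheus servlet might be enabled in metrics.properties like this; the class and path below follow the convention in Spark’s metrics.properties.template, so treat them as assumptions to verify against your Spark version:

```properties
# Expose instance metrics in Prometheus format (Experimental)
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
```

The driver’s /metrics/executors/prometheus endpoint is additionally gated by the spark.ui.prometheus.enabled flag.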
Example Configuration
Sink properties, such as those for a Graphite sink, can also be supplied as Spark configuration parameters (for example via --conf on spark-submit) by prefixing each metrics.properties key with spark.metrics.conf.; this avoids editing the configuration file on every node.
Advanced Instrumentation
Custom Metrics
You can define custom metrics sources to instrument your Spark applications with application-specific metrics.
External Monitoring
Spark’s metrics can be integrated with external monitoring systems through:
- Prometheus: Use the PrometheusServlet for scraping metrics
- Graphite: Configure GraphiteSink to push metrics
- JMX: Use JmxSink for JMX-based monitoring tools
- StatsD: Configure StatsdSink for StatsD integration
