Spark provides comprehensive security features to protect your cluster and data. However, security features like authentication are not enabled by default. When deploying a cluster that is open to the internet or an untrusted network, it’s critical to secure access properly.
Spark is not secure by default. You must explicitly enable and configure security features based on your deployment environment.

Security Overview

Spark supports multiple security mechanisms:

  • RPC Authentication: authenticate communication between Spark processes
  • Network Encryption: encrypt data in transit using TLS or AES
  • Web UI Security: control access to Spark web interfaces
  • Local Storage Encryption: encrypt temporary data written to disk
  • Kerberos Integration: integrate with enterprise authentication systems
  • ACLs: control who can view and modify applications

Not all deployment types support all security features. Check your deployment-specific documentation for supported security options.

RPC Authentication

Spark supports authentication for RPC channels using a shared secret. This ensures only authorized processes can communicate within your Spark cluster.

Enabling Authentication

Enable authentication by setting:
spark.authenticate=true
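On deployments where Spark does not generate the secret automatically (e.g. a standalone cluster), you must also distribute a shared secret yourself via `spark.authenticate.secret`. A minimal sketch; the secret value, class name, and jar are placeholders:

```shell
# Hypothetical submission against a standalone cluster; every daemon
# and application must be configured with the same shared secret.
spark-submit \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=my-shared-secret \
  --class com.example.MyApp \
  myapp.jar
```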

Deployment-Specific Authentication

For Spark on YARN, Spark automatically handles generating and distributing the shared secret. Each application uses a unique shared secret.
This feature relies on YARN RPC encryption being enabled for secure distribution of secrets.
YARN-specific configuration:
| Property | Default | Description |
| --- | --- | --- |
| spark.yarn.shuffle.server.recovery.disabled | false | Set to true for applications with higher security requirements. The secret is not saved in the database, but shuffle data is not recovered after the External Shuffle Service restarts. |

REST Submission Server Authentication

The REST Submission Server supports JWT-based authentication via the HTTP Authorization header (JSON Web Token):
# On Spark Master
spark.master.rest.filters=org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY
Clients must provide an HTTP Authorization header containing a JWT signed by the shared secret key.
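The signing key must be base64url-encoded, and for HMAC-SHA256 it should be at least 256 bits (32 bytes). One way to generate such a key, assuming `openssl` is available:

```shell
# Generate 32 random bytes, base64-encode them, then translate the
# standard base64 alphabet to the URL-safe one and strip padding.
SECRET_KEY=$(openssl rand -base64 32 | tr '+/' '-_' | tr -d '=')
echo "$SECRET_KEY"
```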

Network Encryption

Spark supports two mutually exclusive forms of encryption for RPC connections: AES-based encryption (recommended) and the deprecated SASL-based encryption.

AES Encryption Configuration

| Property | Default | Description |
| --- | --- | --- |
| spark.network.crypto.enabled | false | Enable AES-based RPC encryption, including the new authentication protocol added in 2.2.0. |
| spark.network.crypto.cipher | AES/CTR/NoPadding | Cipher mode to use. Recommended: AES/GCM/NoPadding (authenticated encryption). |
| spark.network.crypto.authEngineVersion | 1 | Version of the AES-based RPC encryption protocol (1 or 2). Version 2 is recommended. |
| spark.network.crypto.saslFallback | true | Whether to fall back to SASL authentication for old shuffle services. |
| spark.authenticate.enableSaslEncryption | false | Enable SASL-based encrypted communication (deprecated). |
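Putting the recommended options together, a hardened RPC encryption setup might look like the following sketch; verify each option against your Spark version:

```properties
spark.authenticate=true
spark.network.crypto.enabled=true
spark.network.crypto.cipher=AES/GCM/NoPadding
spark.network.crypto.authEngineVersion=2
# Disable the SASL fallback once all shuffle services are upgraded
spark.network.crypto.saslFallback=false
```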

Local Storage Encryption

Spark supports encrypting temporary data written to local disks, including shuffle files, shuffle spills, and cached data blocks.
Local storage encryption does not cover output data generated by applications with APIs like saveAsHadoopFile or saveAsTable.

Enabling Storage Encryption

spark.io.encryption.enabled=true
spark.io.encryption.keySizeBits=256
spark.io.encryption.keygen.algorithm=HmacSHA1
It’s strongly recommended that RPC encryption be enabled when using local storage encryption.

Storage Encryption Configuration

| Property | Default | Description |
| --- | --- | --- |
| spark.io.encryption.enabled | false | Enable local disk I/O encryption. Supported by all deployment modes. |
| spark.io.encryption.keySizeBits | 128 | I/O encryption key size in bits. Supported values: 128, 192, 256. |
| spark.io.encryption.keygen.algorithm | HmacSHA1 | Algorithm to use when generating the I/O encryption key. |

Web UI Security

Authentication and Authorization

Spark supports access control to the Web UI when an authentication filter is present.
Spark does not provide built-in authentication filters. You must implement or use a third-party filter.

Access Control Lists (ACLs)

Spark differentiates between two types of permissions:
  • View permissions: Who can see the application’s UI
  • Modify permissions: Who can kill jobs in a running application
Configuration:
# Enable ACLs
spark.acls.enable=true

# Admin users (view and modify access)
spark.admin.acls=user1,user2
spark.admin.acls.groups=admin-group

# View-only access
spark.ui.view.acls=viewer1,viewer2
spark.ui.view.acls.groups=viewer-group

# Modify access
spark.modify.acls=operator1,operator2
spark.modify.acls.groups=operator-group
Use a wildcard (*) to give all users the respective privilege. By default, only the user submitting the application is added to the ACLs.

Web UI ACL Configuration

| Property | Default | Description |
| --- | --- | --- |
| spark.acls.enable | false | Whether UI ACLs should be enabled. Requires an authentication filter to be installed. |
| spark.admin.acls | None | Comma-separated list of users with view and modify access. |
| spark.admin.acls.groups | None | Comma-separated list of groups with view and modify access. |
| spark.ui.view.acls | None | Comma-separated list of users with view access. |
| spark.ui.view.acls.groups | None | Comma-separated list of groups with view access. |
| spark.modify.acls | None | Comma-separated list of users with modify access. |
| spark.modify.acls.groups | None | Comma-separated list of groups with modify access. |
| spark.user.groups.mapping | ShellBasedGroupsMappingProvider | Group mapping service for determining user groups. |

History Server ACLs

spark.history.ui.acls.enable=true
spark.history.ui.admin.acls=admin1,admin2
spark.history.ui.admin.acls.groups=admin-group

SSL Configuration

SSL configuration is organized hierarchically. You can configure default SSL settings that apply to all protocols unless overridden.

SSL Namespaces

| Namespace | Component |
| --- | --- |
| spark.ssl | Default SSL configuration for all namespaces |
| spark.ssl.ui | Spark application Web UI |
| spark.ssl.standalone | Standalone Master / Worker Web UI |
| spark.ssl.historyServer | History Server Web UI |
| spark.ssl.rpc | Spark RPC communication |
All settings are inherited from the default namespace, except for spark.ssl.rpc.enabled which must be explicitly set.

Common SSL Configuration

| Property | Default | Description | Namespaces |
| --- | --- | --- | --- |
| ${ns}.enabled | false | Enables SSL. Requires ${ns}.protocol to be set. | ui, standalone, historyServer, rpc |
| ${ns}.protocol | None | TLS protocol to use (e.g., TLSv1.2, TLSv1.3). | ui, standalone, historyServer, rpc |
| ${ns}.keyStore | None | Path to the key store file. | ui, standalone, historyServer, rpc |
| ${ns}.keyStorePassword | None | Password to the key store. | ui, standalone, historyServer, rpc |
| ${ns}.keyPassword | None | Password to the private key in the key store. | ui, standalone, historyServer, rpc |
| ${ns}.trustStore | None | Path to the trust store file. | ui, standalone, historyServer, rpc |
| ${ns}.trustStorePassword | None | Password for the trust store. | ui, standalone, historyServer, rpc |

SSL Configuration Example

# Default SSL settings
spark.ssl.enabled=true
spark.ssl.protocol=TLSv1.3
spark.ssl.keyStore=/path/to/keystore.jks
spark.ssl.keyStorePassword=changeit
spark.ssl.trustStore=/path/to/truststore.jks
spark.ssl.trustStorePassword=changeit

# RPC-specific (must be explicitly enabled)
spark.ssl.rpc.enabled=true

Generating Key Stores

Generate key stores using the keytool program:
1. Generate a key pair:

keytool -genkeypair -alias spark -keyalg RSA -keysize 2048 \
  -keystore keystore.jks -storepass changeit
2. Export the public certificate:

keytool -exportcert -alias spark -keystore keystore.jks \
  -file spark.cer -storepass changeit
3. Import the certificate into a trust store:

keytool -importcert -alias spark -file spark.cer \
  -keystore truststore.jks -storepass changeit
4. Distribute the key stores and trust stores to all nodes in your cluster.

HTTP Security Headers

Spark can include HTTP headers to prevent Cross-Site Scripting (XSS) and other attacks:
# XSS Protection
spark.ui.xXssProtection=1; mode=block

# Content Type Options
spark.ui.xContentTypeOptions.enabled=true

# Strict Transport Security (HTTPS only)
spark.ui.strictTransportSecurity=max-age=31536000; includeSubDomains
| Property | Default | Description |
| --- | --- | --- |
| spark.ui.xXssProtection | 1; mode=block | Value for the HTTP X-XSS-Protection response header. |
| spark.ui.xContentTypeOptions.enabled | true | When enabled, the X-Content-Type-Options header is set to "nosniff". |
| spark.ui.strictTransportSecurity | None | Value for the HTTP Strict-Transport-Security (HSTS) header. Only used when SSL/TLS is enabled. |

Kerberos Integration

Spark supports Kerberos authentication for deployments in secure Hadoop environments.

Kerberos Overview

1. Login to the KDC: use kinit to obtain credentials from the configured KDC.
2. Delegation tokens: Spark automatically obtains delegation tokens for HDFS, Hive, and HBase (if on the classpath and properly configured).
3. Token renewal: for long-running applications, Spark can automatically renew tokens using a keytab or ticket cache.

Supported Services

Spark ships with delegation token support for:
  • HDFS and other Hadoop filesystems
  • Hive (when hive.metastore.uris is not empty)
  • HBase (when hbase.security.authentication=kerberos)
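Delegation token support for an individual service can be switched off through the corresponding `spark.security.credentials.*` flag; for example, if your application does not talk to HBase:

```properties
# Skip obtaining HBase delegation tokens for this application
spark.security.credentials.hbase.enabled=false
```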

Kerberos Configuration

| Property | Default | Description |
| --- | --- | --- |
| spark.security.credentials.${service}.enabled | true | Controls whether to obtain credentials for the service when security is enabled. |
| spark.kerberos.access.hadoopFileSystems | (none) | Comma-separated list of secure Hadoop filesystems your application will access. |
Example:
spark.kerberos.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,webhdfs://nn3.com:50070

Long-Running Applications

For applications that run longer than the maximum delegation token lifetime:
Provide Spark with a principal and keytab:
spark-submit \
  --principal user@REALM \
  --keytab /path/to/user.keytab \
  --class com.example.MyApp \
  myapp.jar
In YARN cluster mode, the keytab is copied to the machine running the driver, so ensure both YARN and HDFS are secured with encryption.

Kubernetes with Kerberos

There are three methods to submit Kerberos jobs on Kubernetes: using a local ticket cache obtained with kinit, supplying a local keytab and principal, or using pre-populated secrets containing the Kerberos credentials. The example below uses a local ticket cache:
kinit -kt <keytab_file> <username>/<krb5 realm>
spark-submit \
  --deploy-mode cluster \
  --master k8s://<endpoint> \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  local:///opt/spark/examples/jars/spark-examples.jar
In all cases, define HADOOP_CONF_DIR or spark.kubernetes.hadoop.configMapName and ensure the KDC is visible from inside containers.
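As an alternative to a local ticket cache, the submission can carry a local keytab and principal directly; a sketch, where the endpoint, paths, and principal are placeholders:

```shell
spark-submit \
  --deploy-mode cluster \
  --master k8s://<endpoint> \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --conf spark.kerberos.keytab=/path/to/user.keytab \
  --conf spark.kerberos.principal=user@REALM \
  local:///opt/spark/examples/jars/spark-examples.jar
```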

Configuring Network Ports

Generally, Spark services are private and should only be accessible within your organization’s network. Limit access to hosts and ports used by Spark services.

Standalone Mode Ports

| From | To | Default Port | Purpose | Configuration |
| --- | --- | --- | --- | --- |
| Browser | Standalone Master | 8080 | Web UI | spark.master.ui.port |
| Browser | Standalone Worker | 8081 | Web UI | spark.worker.ui.port |
| Driver / Worker | Standalone Master | 7077 | Submit job / Join cluster | SPARK_MASTER_PORT |
| External Service | Standalone Master | 6066 | REST API | spark.master.rest.port |
| Master | Worker | (random) | Schedule executors | SPARK_WORKER_PORT |

All Cluster Managers

| From | To | Default Port | Purpose | Configuration |
| --- | --- | --- | --- | --- |
| Browser | Application | 4040 | Web UI | spark.ui.port |
| Browser | History Server | 18080 | Web UI | spark.history.ui.port |
| Executor / Master | Driver | (random) | Connect to application | spark.driver.port |
| Executor / Driver | Executor / Driver | (random) | Block Manager port | spark.blockManager.port |
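The randomly assigned ports can be pinned so that firewall rules stay manageable; for example (the port numbers here are arbitrary placeholders):

```properties
spark.driver.port=7078
spark.blockManager.port=7079
# If a port is taken, Spark retries port+1, port+2, ... up to this many times
spark.port.maxRetries=16
```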

Securing Ports with JWT

Spark can also protect all UI ports with JWT-based authentication via the HTTP Authorization header:
spark.ui.filters=org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.secretKey=BASE64URL-ENCODED-KEY

Security Best Practices

  • Always enable RPC authentication (spark.authenticate=true) in production environments, especially on multi-tenant clusters.
  • Enable TLS/SSL encryption for RPC and Web UI communication. If TLS is not feasible, use version 2 of the AES-based encryption.
  • Enable local storage encryption (spark.io.encryption.enabled=true) for sensitive data.
  • Configure access control lists to restrict who can view and modify applications.
  • Set event log directory permissions to drwxrwxrwxt to prevent unauthorized modification.
  • Integrate with Kerberos in enterprise environments for centralized authentication.
  • Regularly review security configurations and keep Spark updated with the latest security patches.

Event Logging Security

If your applications use event logging, secure the log directory:
# Create directory with proper permissions
mkdir -p /var/log/spark-events
chmod 1777 /var/log/spark-events
chown spark:spark /var/log/spark-events
Permissions drwxrwxrwxt (mode 1777, with the sticky bit set) allow all users to write to the directory but prevent unprivileged users from removing or renaming files they don't own.
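With the directory in place, point event logging at it. This sketch assumes the local path created above; in production the log directory is typically a shared location such as an HDFS path:

```properties
spark.eventLog.enabled=true
spark.eventLog.dir=file:///var/log/spark-events
spark.history.fs.logDirectory=file:///var/log/spark-events
```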

Next Steps

  • Spark Configuration: explore general Spark configuration options
  • Performance Tuning: optimize your secured Spark applications
