Security Overview
Spark supports multiple security mechanisms:
- RPC Authentication: authenticate communication between Spark processes
- Network Encryption: encrypt data in transit using TLS or AES
- Web UI Security: control access to Spark web interfaces
- Local Storage Encryption: encrypt temporary data written to disk
- Kerberos Integration: integrate with enterprise authentication systems
- ACLs: control who can view and modify applications
Not all deployment types support all security features. Check your deployment-specific documentation for supported security options.
RPC Authentication
Spark supports authentication for RPC channels using a shared secret. This ensures that only authorized processes can communicate within your Spark cluster.
Enabling Authentication
Enable authentication by setting `spark.authenticate` to `true`. On deployments that do not manage the secret automatically (such as standalone mode), the shared secret must also be configured via `spark.authenticate.secret`.
Deployment-Specific Authentication
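For a standalone or otherwise self-managed deployment, a minimal configuration sketch might look like the following (the secret value is a placeholder and must be identical on all nodes):

```properties
# spark-defaults.conf (sketch; the secret value is a placeholder)
spark.authenticate        true
spark.authenticate.secret my-shared-secret
```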
- YARN
- Kubernetes
- Standalone
For Spark on YARN, Spark automatically handles generating and distributing the shared secret, and each application uses a unique shared secret.
YARN-specific configuration:
This feature relies on YARN RPC encryption being enabled for secure distribution of secrets.
| Property | Default | Description |
|---|---|---|
| spark.yarn.shuffle.server.recovery.disabled | false | Set to true for applications with higher security requirements. The secret won’t be saved in the database, but shuffle data won’t be recovered after the External Shuffle Service restarts. |
REST Submission Server Authentication
The REST Submission Server supports authentication via an HTTP `Authorization` header containing a JSON Web Token (JWT) signed by a shared secret key.
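A sketch, assuming the `JWSFilter` servlet filter shipped with recent Spark releases (the secret key is a placeholder and must be base64url-encoded):

```properties
# Sketch: protect the REST Submission Server with a JWS filter.
spark.master.rest.filters                            org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.secretKey  <base64url-encoded-key>
```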
Network Encryption
Spark supports two mutually exclusive forms of encryption for RPC connections:
- TLS/SSL Encryption (Recommended)
- AES-based Encryption (Legacy)
The preferred method uses TLS encryption via Netty’s support for SSL, which is standardized and considered more secure.
Key benefits:
- Industry-standard protocol
- Better security library
- Compliance with security policies
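Drawing on the SSL namespace settings described later in this document, enabling TLS for RPC might look like the following sketch (paths and passwords are placeholders):

```properties
spark.ssl.rpc.enabled             true
spark.ssl.rpc.protocol            TLSv1.3
spark.ssl.rpc.keyStore            /path/to/keystore.jks
spark.ssl.rpc.keyStorePassword    <keystore-password>
spark.ssl.rpc.trustStore          /path/to/truststore.jks
spark.ssl.rpc.trustStorePassword  <truststore-password>
```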
RPC SSL is not automatically enabled even if `spark.ssl.enabled` is set; you must explicitly set `spark.ssl.rpc.enabled` to enable it for RPC.
AES Encryption Configuration
| Property | Default | Description |
|---|---|---|
| spark.network.crypto.enabled | false | Enable AES-based RPC encryption, including the new authentication protocol added in 2.2.0. |
| spark.network.crypto.cipher | AES/CTR/NoPadding | Cipher mode to use. Recommended: AES/GCM/NoPadding (authenticated encryption). |
| spark.network.crypto.authEngineVersion | 1 | Version of AES-based RPC encryption (1 or 2). Version 2 is recommended. |
| spark.network.crypto.saslFallback | true | Whether to fall back to SASL authentication for old shuffle services. |
| spark.authenticate.enableSaslEncryption | false | Enable SASL-based encrypted communication (deprecated). |
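If TLS is not feasible, the legacy AES-based encryption from the table above can be enabled with a sketch like this (it also requires RPC authentication to be on):

```properties
# AES-based RPC encryption requires spark.authenticate to be enabled.
spark.authenticate                      true
spark.network.crypto.enabled            true
spark.network.crypto.cipher             AES/GCM/NoPadding
spark.network.crypto.authEngineVersion  2
```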
Local Storage Encryption
Spark supports encrypting temporary data written to local disks, including shuffle files, shuffle spills, and cached data blocks.
Local storage encryption does not cover output data written by applications through APIs such as `saveAsHadoopFile` or `saveAsTable`.
Enabling Storage Encryption
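Using the properties from the table below, a sketch of enabling local disk I/O encryption with a stronger key size:

```properties
spark.io.encryption.enabled      true
spark.io.encryption.keySizeBits  256
```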
Storage Encryption Configuration
| Property | Default | Description |
|---|---|---|
| spark.io.encryption.enabled | false | Enable local disk I/O encryption. Supported by all deployment modes. |
| spark.io.encryption.keySizeBits | 128 | IO encryption key size in bits. Supported values: 128, 192, 256. |
| spark.io.encryption.keygen.algorithm | HmacSHA1 | Algorithm to use when generating the IO encryption key. |
Web UI Security
Authentication and Authorization
Spark supports access control to the Web UI when an authentication filter is present. Spark does not provide built-in authentication filters; you must implement one or use a third-party filter.
Access Control Lists (ACLs)
Spark differentiates between two types of permissions:
- View permissions: who can see the application’s UI
- Modify permissions: who can kill jobs in a running application
Use a wildcard (`*`) to grant all users the respective privilege. By default, only the user submitting the application is added to the ACLs.
Web UI ACL Configuration
| Property | Default | Description |
|---|---|---|
| spark.acls.enable | false | Whether UI ACLs should be enabled. Requires an authentication filter to be installed. |
| spark.admin.acls | None | Comma-separated list of users with view and modify access. |
| spark.admin.acls.groups | None | Comma-separated list of groups with view and modify access. |
| spark.ui.view.acls | None | Comma-separated list of users with view access. |
| spark.ui.view.acls.groups | None | Comma-separated list of groups with view access. |
| spark.modify.acls | None | Comma-separated list of users with modify access. |
| spark.modify.acls.groups | None | Comma-separated list of groups with modify access. |
| spark.user.groups.mapping | ShellBasedGroupsMappingProvider | Group mapping service for determining user groups. |
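Using the properties above, a sketch that lets one admin group manage every application while two named users may only view (user and group names are placeholders):

```properties
spark.acls.enable        true
spark.admin.acls.groups  spark-admins
spark.ui.view.acls       alice,bob
```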
History Server ACLs
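The History Server has analogous settings. A sketch, assuming the `spark.history.ui.*` ACL properties available in current Spark releases (user list is a placeholder):

```properties
spark.history.ui.acls.enable  true
spark.history.ui.admin.acls   alice,bob
```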
SSL Configuration
SSL configuration is organized hierarchically: you can configure default SSL settings that apply to all protocols unless overridden by a more specific namespace.
SSL Namespaces
| Namespace | Component |
|---|---|
| spark.ssl | Default SSL configuration for all namespaces |
| spark.ssl.ui | Spark application Web UI |
| spark.ssl.standalone | Standalone Master / Worker Web UI |
| spark.ssl.historyServer | History Server Web UI |
| spark.ssl.rpc | Spark RPC communication |
All settings are inherited from the default namespace except `spark.ssl.rpc.enabled`, which must be set explicitly.
Common SSL Configuration
| Property | Default | Description | Namespaces |
|---|---|---|---|
| ${ns}.enabled | false | Enables SSL. Requires ${ns}.protocol to be set. | ui,standalone,historyServer,rpc |
| ${ns}.protocol | None | TLS protocol to use (e.g., TLSv1.2, TLSv1.3). | ui,standalone,historyServer,rpc |
| ${ns}.keyStore | None | Path to the key store file. | ui,standalone,historyServer,rpc |
| ${ns}.keyStorePassword | None | Password to the key store. | ui,standalone,historyServer,rpc |
| ${ns}.keyPassword | None | Password to the private key in the key store. | ui,standalone,historyServer,rpc |
| ${ns}.trustStore | None | Path to the trust store file. | ui,standalone,historyServer,rpc |
| ${ns}.trustStorePassword | None | Password for the trust store. | ui,standalone,historyServer,rpc |
SSL Configuration Example
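Combining defaults with per-namespace enablement, a configuration sketch might look like this (paths and passwords are placeholders):

```properties
# Defaults inherited by all namespaces
spark.ssl.protocol            TLSv1.3
spark.ssl.keyStore            /path/to/keystore.jks
spark.ssl.keyStorePassword    <keystore-password>
spark.ssl.trustStore          /path/to/truststore.jks
spark.ssl.trustStorePassword  <truststore-password>

# SSL must still be enabled per component; inheritance alone does not enable it
spark.ssl.ui.enabled   true
spark.ssl.rpc.enabled  true
```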
Generating Key Stores
Generate key stores using the `keytool` program shipped with the JDK:
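A sketch of creating a self-signed key store and importing its certificate into a trust store (the alias, passwords, and distinguished name are placeholders):

```shell
# Generate a key pair in a new key store
keytool -genkeypair -alias spark -keyalg RSA -keysize 4096 \
  -dname "CN=spark.example.com" \
  -keystore keystore.jks -storepass <password> -keypass <password>

# Export the certificate
keytool -exportcert -alias spark -keystore keystore.jks \
  -storepass <password> -file spark.cer

# Import the certificate into a trust store
keytool -importcert -alias spark -file spark.cer -noprompt \
  -keystore truststore.jks -storepass <password>
```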
HTTP Security Headers
Spark can include HTTP headers to prevent Cross-Site Scripting (XSS) and other attacks:
| Property | Default | Description |
|---|---|---|
| spark.ui.xXssProtection | 1; mode=block | Value for the HTTP X-XSS-Protection response header. |
| spark.ui.xContentTypeOptions.enabled | true | When enabled, X-Content-Type-Options is set to “nosniff”. |
| spark.ui.strictTransportSecurity | None | Value for the HTTP Strict Transport Security (HSTS) header. Only used when SSL/TLS is enabled. |
Kerberos Integration
Spark supports Kerberos authentication for deployments in secure Hadoop environments.
Kerberos Overview
Delegation Tokens
Spark automatically obtains delegation tokens for HDFS, Hive, and HBase (if in classpath and properly configured).
Supported Services
Spark ships with delegation token support for:
- HDFS and other Hadoop filesystems
- Hive (when `hive.metastore.uris` is not empty)
- HBase (when `hbase.security.authentication` is set to `kerberos`)
Kerberos Configuration
| Property | Default | Description |
|---|---|---|
| spark.security.credentials.${service}.enabled | true | Controls whether to obtain credentials for services when security is enabled. |
| spark.kerberos.access.hadoopFileSystems | (none) | Comma-separated list of secure Hadoop filesystems your application will access. |
Long-Running Applications
For applications that run longer than the maximum delegation token lifetime, Spark offers two mechanisms:
- Using a Keytab
- Using Ticket Cache
With a keytab, provide Spark with a principal and keytab:
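A submission sketch using the `--principal` and `--keytab` options (the principal, paths, and application names are placeholders):

```shell
spark-submit \
  --master yarn \
  --principal alice@EXAMPLE.COM \
  --keytab /path/to/alice.keytab \
  --class com.example.App app.jar
```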
Kubernetes with Kerberos
Three methods to submit Kerberos jobs on Kubernetes:
- Local Ticket Cache
- Local Keytab
- Pre-populated Secrets
In all cases, define `HADOOP_CONF_DIR` or `spark.kubernetes.hadoop.configMapName`, and ensure the KDC is visible from inside the containers.
Configuring Network Ports
Generally, Spark services are private and should only be accessible within your organization’s network. Limit access to the hosts and ports used by Spark services.
Standalone Mode Ports
| From | To | Default Port | Purpose | Configuration |
|---|---|---|---|---|
| Browser | Standalone Master | 8080 | Web UI | spark.master.ui.port |
| Browser | Standalone Worker | 8081 | Web UI | spark.worker.ui.port |
| Driver/Worker | Standalone Master | 7077 | Submit job/Join cluster | SPARK_MASTER_PORT |
| External Service | Standalone Master | 6066 | REST API | spark.master.rest.port |
| Master | Worker | (random) | Schedule executors | SPARK_WORKER_PORT |
All Cluster Managers
| From | To | Default Port | Purpose | Configuration |
|---|---|---|---|---|
| Browser | Application | 4040 | Web UI | spark.ui.port |
| Browser | History Server | 18080 | Web UI | spark.history.ui.port |
| Executor/Master | Driver | (random) | Connect to application | spark.driver.port |
| Executor/Driver | Executor/Driver | (random) | Block Manager port | spark.blockManager.port |
Securing Ports with JWT
Spark supports an HTTP `Authorization` header with JWT for all UI ports:
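A sketch, assuming the `JWSFilter` servlet filter available in recent Spark releases (the secret key is a placeholder and must be base64url-encoded):

```properties
spark.ui.filters                                     org.apache.spark.ui.JWSFilter
spark.org.apache.spark.ui.JWSFilter.param.secretKey  <base64url-encoded-key>
```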
Security Best Practices
Enable Authentication
Always enable RPC authentication (`spark.authenticate=true`) in production environments, especially on multi-tenant clusters.
Use TLS Encryption
Enable TLS/SSL encryption for RPC and Web UI communication. Use version 2 of AES encryption if TLS is not feasible.
Encrypt Local Storage
Enable local storage encryption (`spark.io.encryption.enabled=true`) for sensitive data.
Configure ACLs
Set up proper access control lists to restrict who can view and modify applications.
Secure Event Logs
Set event log directory permissions to `drwxrwxrwxt` so that unprivileged users cannot remove or rename log files.
Use Kerberos
Integrate with Kerberos in enterprise environments for centralized authentication.
Regular Security Audits
Regularly review security configurations and update to the latest Spark versions with security patches.
Event Logging Security
If your applications use event logging, secure the log directory. Permissions of `drwxrwxrwxt` allow all users to write, while the sticky bit prevents unprivileged users from removing or renaming files they do not own.
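A sketch of creating such a directory (the path is a placeholder; mode 1777 sets world-writable permissions with the sticky bit):

```shell
mkdir -p /var/log/spark-events
chmod 1777 /var/log/spark-events
```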
Next Steps
- Spark Configuration: explore general Spark configuration options
- Performance Tuning: optimize your secured Spark applications
