This section provides comprehensive migration guides for each Apache Spark component to help you effectively migrate your applications across versions. Each guide covers breaking changes, deprecations, and behavior modifications.

Available Migration Guides

Spark’s migration guides are organized by component:

Core Components

Spark Core

RDD APIs, scheduling, storage, and runtime behavior changes

Spark SQL

SQL, DataFrames, and Dataset API modifications

Machine Learning

MLlib

Machine learning algorithms, pipelines, and model changes

Language-Specific APIs

PySpark

Python API changes and behavior updates

SparkR

R API changes and behavior updates

Migration Strategy

When upgrading Spark versions, follow this recommended approach:

1. Review Breaking Changes

Start by reviewing the breaking changes for your target version. These are incompatible modifications: you must update your application code before upgrading, or your application may fail at runtime.
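As a concrete illustration of the kind of break to look for: ANSI SQL mode (on by default in Spark 4.0) turns invalid casts from silent NULLs into runtime errors. The snippet below is a plain-Python sketch of that behavioral difference, not actual Spark code:

```python
def legacy_cast_int(value: str):
    """Pre-ANSI Spark behavior: an invalid cast yields NULL (None here)."""
    try:
        return int(value)
    except ValueError:
        return None

def ansi_cast_int(value: str) -> int:
    """ANSI mode: an invalid cast raises an error instead of returning NULL."""
    return int(value)  # raises ValueError for non-numeric input

print(legacy_cast_int("abc"))  # None -- the query keeps running
# ansi_cast_int("abc")         # would raise, analogous to an ANSI runtime error
```

Code that depended on invalid casts silently producing NULL is exactly the kind of code a breaking-changes review should flag.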

2. Check Deprecations

Identify deprecated APIs and plan to replace them with recommended alternatives. While deprecated features still work in the current version, they will be removed in future releases.
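One practical way to plan replacements is a simple scan of your codebase for deprecated names. The mapping below is illustrative, not exhaustive (`registerTempTable` and `unionAll` are real DataFrame deprecations with the listed replacements); consult the component guides for the full list for your target version:

```python
import re

# Illustrative deprecated DataFrame APIs and their replacements
# (not exhaustive -- see the component-specific migration guides).
DEPRECATED_APIS = {
    "registerTempTable": "createOrReplaceTempView",
    "unionAll": "union",
}

def find_deprecated_calls(source: str):
    """Return (deprecated, replacement) pairs that appear in source code."""
    return [(old, new)
            for old, new in DEPRECATED_APIS.items()
            if re.search(rf"\b{old}\b", source)]

print(find_deprecated_calls("df.registerTempTable('people')"))
```

Running a scan like this over your project gives you a concrete replacement checklist before you start the upgrade.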

3. Test Behavior Changes

Some changes modify existing behavior without breaking APIs. Test your application thoroughly to ensure results remain consistent or adjust your code accordingly.
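A lightweight way to test a behavior change is to run the same query under the old and new configuration and compare the collected rows. The helper below is a sketch; `rows_a` and `rows_b` stand in for the lists returned by `DataFrame.collect()`:

```python
def results_match(rows_a, rows_b) -> bool:
    """Compare two result sets, ignoring row order (rows given as tuples)."""
    return sorted(map(tuple, rows_a)) == sorted(map(tuple, rows_b))

# Intended use with Spark (not executed here):
#   spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
#   legacy_rows = df.collect()
#   spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
#   new_rows = df.collect()
#   assert results_match(legacy_rows, new_rows)
print(results_match([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))  # True
```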

4. Update Dependencies

Ensure all external libraries and connectors are compatible with your target Spark version.

Version-Specific Considerations

Upgrading to Spark 4.0

Spark 4.0 includes several major changes:
  • ANSI SQL mode is enabled by default - Set spark.sql.ansi.enabled=false to restore previous behavior
  • Default table provider changed - CREATE TABLE without a USING clause now creates a data source table based on spark.sql.sources.default instead of a Hive SerDe table
  • Java 17 is required - JDK 8 and 11 are no longer supported
  • Hadoop 3.3.6+ is required - Earlier Hadoop versions are not supported
Spark 4.0 removes support for Apache Mesos as a resource manager. If you’re using Mesos, plan to migrate to YARN, Kubernetes, or Standalone mode.
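If you need breathing room during the upgrade, legacy flags such as the ANSI setting above can also be applied cluster-wide in spark-defaults.conf rather than per-session. A minimal fragment, using only the flag discussed above:

```
# spark-defaults.conf -- temporarily restore pre-4.0 behavior while migrating
spark.sql.ansi.enabled    false
```

Treat this as a stopgap: plan to remove the flag once your code handles ANSI semantics.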

Upgrading to Spark 3.0

Spark 3.0 was a major release with significant changes:
  • Adaptive Query Execution (AQE) is enabled by default in 3.2+
  • Proleptic Gregorian calendar replaced the hybrid calendar for date/timestamp operations
  • Built-in Hive upgraded from 1.2 to 2.3
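Python's datetime module also uses the proleptic Gregorian calendar, so it can illustrate the calendar change: under the old hybrid (Julian/Gregorian) calendar, the dates 1582-10-05 through 1582-10-14 do not exist, while the proleptic calendar used by Spark 3.0+ accepts them. A small illustration:

```python
from datetime import date

# The proleptic Gregorian calendar extends Gregorian rules backwards in
# time, so a date inside the 1582 "gap" of the hybrid calendar is valid.
d = date(1582, 10, 10)
print(d.isoformat())  # 1582-10-10
```

This is why dates and timestamps before the Gregorian switchover can read back differently after upgrading to Spark 3.0+.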

Configuration Changes

Many behavior changes can be reverted using legacy configuration flags. However, we recommend adapting to new behaviors rather than relying on legacy modes, as these flags may be removed in future versions.

Example: Restoring Legacy Behavior

# Spark 4.0: Disable ANSI mode if needed
spark.conf.set("spark.sql.ansi.enabled", "false")

# Spark 3.0: Use legacy datetime parsing
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

Cross-Version Compatibility

If you need to maintain code that works across multiple Spark versions:
  1. Check Spark version at runtime
val sparkVersion = spark.version
if (sparkVersion.startsWith("3.")) {
  // Use Spark 3.x APIs
} else if (sparkVersion.startsWith("4.")) {
  // Use Spark 4.x APIs
}
  2. Use configuration flags judiciously - Set legacy flags only when necessary
  3. Test thoroughly - Run your test suite against all supported Spark versions
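The runtime check shown above in Scala can be factored into a small helper; the Python equivalent below parses version strings like the one returned by spark.version, so the parsing itself is testable without a Spark session:

```python
def major_version(spark_version: str) -> int:
    """Extract the major version from a string like '3.5.1' or '4.0.0'."""
    return int(spark_version.split(".")[0])

# In application code (sketch):
#   if major_version(spark.version) >= 4:
#       ...  # use Spark 4.x APIs
#   else:
#       ...  # use Spark 3.x APIs
print(major_version("3.5.1"))  # 3
```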

Additional Resources

Release Notes

Detailed release notes for each Spark version

API Documentation

Complete API reference for all Spark APIs

Getting Help

If you encounter migration issues, start with the component-specific guide for the affected API, then search the Spark issue tracker and user mailing list for your error message.
