API Documentation
Access the complete Scala API documentation (Scaladoc) at: Spark Scala API (Scaladoc)
Core Packages
org.apache.spark.sql
The main package for working with structured data. You’ll use this package for most DataFrame and Dataset operations.
Key Classes:
- SparkSession - Entry point for Spark functionality. Use this to create DataFrames, read data, and configure Spark.
- Dataset[T] - Strongly-typed distributed collection of data. Provides type-safe operations.
- DataFrame - Type alias for Dataset[Row]. Use for semi-structured data.
- Column - Represents a column in a DataFrame.
- Row - Represents a row of data.
- functions - Built-in functions for DataFrame operations.
org.apache.spark.sql.types
Data types for Spark SQL schemas.
Key Classes:
- DataType - Base type for all data types
- StructType - Schema definition for DataFrames
- StructField - Field in a StructType schema
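As an illustration of how these classes fit together, here is a minimal sketch that builds a schema by hand. The object name, column names, and types are hypothetical and chosen only for the example.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaExample {
  // Hypothetical two-column schema; the field names are illustrative only.
  val peopleSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // Print the schema as an indented tree.
    peopleSchema.printTreeString()
    // Look up a single field by name and print its DataType.
    println(peopleSchema("age").dataType)
  }
}
```

A schema built this way can be passed to a reader, for example with `spark.read.schema(SchemaExample.peopleSchema)`, so that Spark does not have to infer column types.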
org.apache.spark.sql.streaming
Structured Streaming API for processing real-time data streams.
Key Classes:
- DataStreamReader - Read streaming data sources
- DataStreamWriter - Write streaming data to sinks
- StreamingQuery - Handle to a running streaming query
- StreamingQueryListener - Monitor streaming query events
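The sketch below shows one way the reader, writer, and query handle can be combined. It assumes a local run (`master("local[*]")`) and uses the built-in `rate` test source and `console` sink; the object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

object StreamingExample {
  // Starts a query that reads from the "rate" source and prints each batch.
  def startRateToConsole(spark: SparkSession): StreamingQuery =
    spark.readStream                 // DataStreamReader
      .format("rate")                // generates (timestamp, value) rows
      .option("rowsPerSecond", "5")
      .load()
      .writeStream                   // DataStreamWriter
      .format("console")
      .outputMode("append")
      .start()                       // returns a StreamingQuery handle

  def main(args: Array[String]): Unit = {
    // Assumption: local execution for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("StreamingExample")
      .master("local[*]")
      .getOrCreate()

    val query = startRateToConsole(spark)
    // Let the query run for up to 10 seconds, then stop it.
    query.awaitTermination(10000L)
    query.stop()
    spark.stop()
  }
}
```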
org.apache.spark.sql.catalog
Manage metadata for databases, tables, functions, and views.
Key Classes:
- Catalog - Interface for catalog operations
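The catalog is reached through `spark.catalog`. The following sketch, which assumes a local session and a hypothetical temporary view named `numbers`, lists the temporary views known to the session.

```scala
import org.apache.spark.sql.SparkSession

object CatalogExample {
  // Returns the names of all temporary views visible in the current session.
  def tempViewNames(spark: SparkSession): Seq[String] =
    spark.catalog.listTables().collect().filter(_.isTemporary).map(_.name).toSeq

  def main(args: Array[String]): Unit = {
    // Assumption: local execution for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("CatalogExample")
      .master("local[*]")
      .getOrCreate()

    // Register a temporary view so the catalog has something to report.
    spark.range(5).createOrReplaceTempView("numbers")

    println(spark.catalog.currentDatabase)
    println(tempViewNames(spark))

    spark.catalog.dropTempView("numbers")
    spark.stop()
  }
}
```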
org.apache.spark
Core Spark functionality (Note: SparkContext and RDD are not supported in Spark Connect).
Key Classes:
- SparkContext - Main entry point for Spark Classic (not available in Spark Connect)
- SparkConf - Configuration for Spark applications
Quick Start Example
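A minimal sketch of a first application might look like the following. It assumes a local run (`master("local[*]")`) and uses made-up sample data; the object name, method name, and column names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col}

object QuickStart {
  // Average age per department, counting only people older than 25.
  def averageAgeByDept(people: DataFrame): DataFrame =
    people
      .filter(col("age") > 25)
      .groupBy("dept")
      .agg(avg("age").as("avg_age"))

  def main(args: Array[String]): Unit = {
    // Assumption: local execution for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("QuickStart")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Illustrative sample data.
    val people = Seq(
      ("Alice", 34, "eng"),
      ("Bob", 23, "eng"),
      ("Carol", 45, "ops")
    ).toDF("name", "age", "dept")

    averageAgeByDept(people).show()
    spark.stop()
  }
}
```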
A simple example is the quickest way to get started with the Scala API.
Working with Datasets
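The sketch below illustrates a typed Dataset backed by a case class. The `Person` class, the sample rows, and the local master setting are assumptions made for the example.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes used as Dataset element types are defined at the top level
// so that Spark can derive an encoder for them.
case class Person(name: String, age: Int)

object DatasetExample {
  // The lambda is checked at compile time: p.age must exist and be numeric.
  def adults(people: Dataset[Person]): Dataset[Person] =
    people.filter(p => p.age >= 18)

  def main(args: Array[String]): Unit = {
    // Assumption: local execution for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("DatasetExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides encoders and toDS

    val people = Seq(Person("Alice", 34), Person("Bob", 12)).toDS()
    adults(people).map(_.name).show()
    spark.stop()
  }
}
```

Misspelling a field, for example `p.agee`, fails at compile time here, whereas a misspelled column name in a DataFrame expression is only reported when the query is analyzed.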
Datasets provide type-safe operations with compile-time checking.
User-Defined Functions (UDFs)
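As a hedged illustration, the following sketch wraps a plain Scala function with `functions.udf` and applies it to a column. The UDF itself, the column names, and the local master setting are hypothetical choices for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfExample {
  // Upper-cased first letter of a string; null or empty input yields null.
  val initial = udf((s: String) =>
    if (s == null || s.isEmpty) null else s.take(1).toUpperCase)

  // Adds an "initial" column computed from the "name" column.
  def withInitial(df: DataFrame): DataFrame =
    df.withColumn("initial", initial(col("name")))

  def main(args: Array[String]): Unit = {
    // Assumption: local execution for demonstration purposes.
    val spark = SparkSession.builder()
      .appName("UdfExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("alice", "bob").toDF("name")
    withInitial(df).show()
    spark.stop()
  }
}
```

Where a built-in function from `org.apache.spark.sql.functions` covers the same logic, it is usually preferable, because Spark can optimize built-ins but treats a UDF as opaque.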
UDFs let you create custom functions for your transformations.
Spark Connect Support
Since Spark 3.5, most Scala APIs are supported in Spark Connect, including Dataset, functions, Column, Catalog, and streaming APIs. However, SparkContext and RDD are not supported.
To use Spark Connect, set the remote parameter when creating the SparkSession.
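A minimal sketch of a Spark Connect session is shown below. It assumes that the Spark Connect Scala client is on the classpath and that a Spark Connect server is reachable at `sc://localhost:15002`; both the address and the object name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ConnectExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a Spark Connect server is listening at this address.
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002")
      .getOrCreate()

    // Dataset and DataFrame operations are sent to the remote server.
    spark.range(5).show()

    spark.close()
  }
}
```

Code written against Dataset, Column, and functions should need no further changes, while anything that depends on SparkContext or RDD has to be rewritten first.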
Additional Resources
For the most up-to-date API documentation, always refer to the official Scaladoc linked at the top of this page.
