Apache Spark™ Connect for Swift is a subproject of Apache Spark that provides a modern Swift library, enabling Swift developers to leverage the power of Apache Spark for distributed data processing, machine learning, and analytical workloads directly from their Swift applications. For example, a user can develop and ship a lightweight Swift-based SparkPi app.
Docker Image Size
| Name | Image Size |
|---|---|
| `apache/spark:4.1.1-python3`-based SparkPi | |
| `pyspark-connect`-based SparkPi | |
| Swift-based SparkPi | |
- Apache Spark 4.1.1 (January 2026)
- Swift 6.2 (September 2025)
- gRPC Swift 2.3 (March 2026)
- gRPC Swift Protobuf 2.2.1 (March 2026)
- gRPC Swift NIO Transport 2.6.0 (March 2026)
- FlatBuffers v25.12.19 (February 2026)
- Apache Arrow Swift
So far, this library tracks the upstream changes of the Apache Arrow project's Swift support.
Create a Swift project.
mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable

Add the `SparkConnect` package as a dependency, like the following.
$ cat Package.swift
import PackageDescription
let package = Package(
name: "SparkConnectSwiftApp",
platforms: [
.macOS(.v15)
],
dependencies: [
.package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
],
targets: [
.executableTarget(
name: "SparkConnectSwiftApp",
dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
)
]
)

Use `SparkSession` of the `SparkConnect` module in Swift.
$ cat Sources/main.swift
import SparkConnect
let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")
let statements = [
"DROP TABLE IF EXISTS t",
"CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
"INSERT INTO t VALUES (1), (2), (3)",
]
for s in statements {
print("EXECUTE: \(s)")
_ = try await spark.sql(s).count()
}
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()
try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()
await spark.stop()

Run your Swift application.
$ swift run
...
Connected to Apache Spark 4.1.1 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
| a|
+---+
| 1|
| 3|
| 2|
+---+
+---+
| id|
+---+
| 6|
| 8|
| 4|
| 2|
| 0|
+---+

You can find more complete examples, including Spark SQL REPL, Web Server, and Streaming applications, in the `Examples` directory.
This library also supports the `SPARK_REMOTE` environment variable, which specifies the Spark Connect connection string and provides more connection options.
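For example, assuming a Spark Connect server is reachable at a given host and port (the endpoint below is a placeholder; `localhost:15002` is the conventional default for a local server), you could launch the app against it like this:

```shell
# Point the app at a specific Spark Connect server via the
# standard sc://host:port connection URI, then run it.
# The endpoint here is an example; substitute your own server.
SPARK_REMOTE=sc://localhost:15002 swift run
```

Because the connection string is read from the environment, the same binary can target different Spark clusters without recompiling.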