Apache Spark Connect Client for Swift

Apache Spark™ Connect for Swift is a subproject of Apache Spark. It aims to provide a modern Swift library that lets Swift developers use Apache Spark for distributed data processing, machine learning, and analytical workloads directly from their Swift applications. For example, a user can develop and ship a lightweight Swift-based SparkPi app.

Docker Image Size

Name	Image Size
`apache/spark:4.1.1-python3`-based SparkPi
`pyspark-connect`-based SparkPi
`Swift`-based SparkPi

Resources

Requirement

So far, this library tracks upstream changes to the Apache Arrow project's Swift support.

How to use in your apps

Create a Swift project.

mkdir SparkConnectSwiftApp
cd SparkConnectSwiftApp
swift package init --name SparkConnectSwiftApp --type executable

Add the SparkConnect package as a dependency, as follows.

$ cat Package.swift
// swift-tools-version: 6.0
// (Assumed tools version; .macOS(.v15) requires 6.0 or later.)
import PackageDescription

let package = Package(
  name: "SparkConnectSwiftApp",
  platforms: [
    .macOS(.v15)
  ],
  dependencies: [
    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "SparkConnectSwiftApp",
      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
    )
  ]
)

Use `SparkSession` from the `SparkConnect` module in Swift.

$ cat Sources/main.swift

import SparkConnect

let spark = try await SparkSession.builder.getOrCreate()
print("Connected to Apache Spark \(await spark.version) Server")

let statements = [
  "DROP TABLE IF EXISTS t",
  "CREATE TABLE IF NOT EXISTS t(a INT) USING ORC",
  "INSERT INTO t VALUES (1), (2), (3)",
]

for s in statements {
  print("EXECUTE: \(s)")
  _ = try await spark.sql(s).count()
}
print("SELECT * FROM t")
try await spark.sql("SELECT * FROM t").cache().show()

try await spark.range(10).filter("id % 2 == 0").write.mode("overwrite").orc("/tmp/orc")
try await spark.read.orc("/tmp/orc").show()

await spark.stop()

Run your Swift application.

$ swift run
...
Connected to Apache Spark 4.1.1 Server
EXECUTE: DROP TABLE IF EXISTS t
EXECUTE: CREATE TABLE IF NOT EXISTS t(a INT) USING ORC
EXECUTE: INSERT INTO t VALUES (1), (2), (3)
SELECT * FROM t
+---+
|  a|
+---+
|  1|
|  3|
|  2|
+---+

+---+
| id|
+---+
|  6|
|  8|
|  4|
|  2|
|  0|
+---+

More complete examples, including a Spark SQL REPL, a web server, and streaming applications, are in the Examples directory.

This library also supports the SPARK_REMOTE environment variable, which specifies the Spark Connect connection string and allows additional connection options to be set.
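
To make the shape of a connection string concrete, here is a minimal sketch in plain Swift (Foundation only) that reads SPARK_REMOTE and splits a string of the form `sc://host:port/;key=value;key=value`. `ConnectEndpoint` and `parseConnectionString` are hypothetical names used only for illustration; they are not part of the SparkConnect API, and the library's own parsing may differ. The fallback `sc://localhost:15002` and the default port 15002 are assumptions based on the usual Spark Connect defaults. IPv6 hosts are not handled.

```swift
import Foundation

/// Hypothetical value type for illustration only; not a SparkConnect type.
struct ConnectEndpoint: Equatable {
  var host: String
  var port: Int
  var params: [String: String]
}

/// Parses "sc://host[:port][/;key=value;...]" into its parts.
/// Returns nil when the scheme, host, or port is malformed.
func parseConnectionString(_ s: String) -> ConnectEndpoint? {
  let scheme = "sc://"
  guard s.hasPrefix(scheme) else { return nil }
  let rest = String(s.dropFirst(scheme.count))

  // Separate "host:port" from the ";"-delimited options after the first "/".
  let parts = rest.split(separator: "/", maxSplits: 1, omittingEmptySubsequences: false)
  let hostPort = parts[0].split(separator: ":", maxSplits: 1)
  guard let host = hostPort.first, !host.isEmpty else { return nil }

  // Assumed default port when none is given.
  var port = 15002
  if hostPort.count == 2 {
    guard let p = Int(hostPort[1]) else { return nil }
    port = p
  }

  // Collect key=value options; entries without "=" are ignored.
  var params: [String: String] = [:]
  if parts.count == 2 {
    for item in parts[1].split(separator: ";") {
      let kv = item.split(separator: "=", maxSplits: 1)
      if kv.count == 2 { params[String(kv[0])] = String(kv[1]) }
    }
  }
  return ConnectEndpoint(host: String(host), port: port, params: params)
}

// Read SPARK_REMOTE, falling back to an assumed local default.
let remote = ProcessInfo.processInfo.environment["SPARK_REMOTE"] ?? "sc://localhost:15002"
if let endpoint = parseConnectionString(remote) {
  print("host=\(endpoint.host) port=\(endpoint.port) options=\(endpoint.params)")
} else {
  print("Not a valid sc:// connection string: \(remote)")
}
```

With the library itself, no parsing is needed in application code: setting the variable before launching, for example `SPARK_REMOTE=sc://localhost:15002 swift run`, is enough for the session to pick it up.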
