A space to collaborate on experiences of using the Apache Iceberg format with Synapse Spark pools.
You need to download the Iceberg Spark runtime JAR and add it to your Synapse Spark pool as a workspace package.
Follow the instructions here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-pool-packages
I'm using this file, which matches a Spark 3.2 / Scala 2.12 pool: iceberg-spark-runtime-3.2_2.12-0.13.2.jar
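The runtime JAR is published to Maven Central, and its coordinates follow a fixed pattern (artifact `iceberg-spark-runtime-<spark>_<scala>`, then version). A small sketch of how the download URL is assembled, so you can adjust the versions to match your own pool:

```python
# Build the Maven Central URL for the Iceberg Spark runtime JAR.
# Adjust both versions to match your Spark pool (here: Spark 3.2, Scala 2.12).
ICEBERG_VERSION = "0.13.2"
SPARK_SCALA = "3.2_2.12"

jar = f"iceberg-spark-runtime-{SPARK_SCALA}-{ICEBERG_VERSION}.jar"
url = (
    "https://repo1.maven.org/maven2/org/apache/iceberg/"
    f"iceberg-spark-runtime-{SPARK_SCALA}/{ICEBERG_VERSION}/{jar}"
)
print(url)
```

Download that URL with your browser (or wget/curl) and upload the file as a workspace package.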
You can find some guidance here: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop#create-and-query-external-tables-from-a-file-in-azure-data-lake
Basically you need your Storage Account (Azure Data Lake Storage Gen2) set up and accessible from the Synapse workspace, as described in the article above.

Following the steps from https://iceberg.apache.org/docs/latest/getting-started/, create the following cells in a Synapse notebook.
%%configure -f
{
  "conf": {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
    "spark.sql.catalog.local2": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.local2.type": "hadoop",
    "spark.sql.catalog.local2.warehouse": "abfss://[email protected]/db"
  }
}
Remember to replace container@storage and the folder /db with your own values. Then you can create a table:
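If you define several catalogs, the pattern of those keys repeats with only the catalog name and warehouse path changing. A hedged sketch of that pattern, with `iceberg_hadoop_catalog_conf` being a hypothetical helper (not part of any Synapse or Iceberg API):

```python
import json

def iceberg_hadoop_catalog_conf(catalog: str, warehouse: str) -> dict:
    # Generate the "conf" entries for one path-based (Hadoop) Iceberg catalog.
    # `catalog` becomes the prefix you use in SQL, e.g. local2.db.table.
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }

conf = iceberg_hadoop_catalog_conf(
    "local2", "abfss://[email protected]/db")
print(json.dumps({"conf": conf}, indent=2))
```

The printed JSON matches the shape the %%configure cell expects.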
%%sql
-- local2 is the path-based (Hadoop) catalog defined above
CREATE TABLE local2.db.table (id bigint, data string) USING iceberg
Then insert some records:
%%sql
INSERT INTO local2.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');
And finally check the contents of the table:
%%sql
SELECT * FROM local2.db.table;
This will create the table in the Storage Account under the warehouse location, at abfss://[email protected]/db/db/table (the warehouse folder, then the db namespace, then the table name).
There you will find two folders: data, holding the Parquet data files, and metadata, holding the table metadata JSON files, manifest lists, and manifests. You can guess what is inside, right? ;)
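For a Hadoop catalog, the metadata folder also contains a version-hint.text file that names the current table metadata version. An illustrative sketch (not Synapse code; the folder layout and file contents here are mocked up locally just to show the idea):

```python
import json
import pathlib
import tempfile

# Mimic the folder layout an Iceberg Hadoop catalog writes under the warehouse.
root = pathlib.Path(tempfile.mkdtemp()) / "db" / "table"
(root / "data").mkdir(parents=True)   # Parquet data files live here
meta = root / "metadata"
meta.mkdir()

# Mocked current table metadata plus the version hint pointing at it.
(meta / "v1.metadata.json").write_text(json.dumps({"format-version": 1}))
(meta / "version-hint.text").write_text("1")

# A reader resolves the current metadata file via the hint.
version = (meta / "version-hint.text").read_text().strip()
current = meta / f"v{version}.metadata.json"
print(current.name)  # -> v1.metadata.json
```

You can browse the real equivalents of these files in the Storage Account with Storage Explorer.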
Feel free to contribute with any ideas!